Re: git merge banch w/ different submodule revision

2018-05-04 Thread Heiko Voigt
Hi,

On Fri, May 04, 2018 at 08:29:32AM +, Middelschulte, Leif wrote:
> Am Donnerstag, den 03.05.2018, 18:42 +0200 schrieb Heiko Voigt:
> > I still do not understand how the current behaviour is mismatching with
> > users expectations. Let's assume that you directly tracked the files of
> > L in your product repository P, without any submodule boundary. How
> > would the behavior be different? Would it be? If D started on an older
> > revision and gets merged into a newer revision, there can always be
> > regressions even without submodules.
> > 
> > Why would the core developer need to be informed about mismatching
> > revisions if he himself advanced the submodule?
> In that case you'd be right. I should have picked my example more wisely.
> Assume right here that not a core developer, but another developer advanced
> the submodule (also via feature branch + merge).
> > 
> > It seems to me that you do not want to mix integration testing and
> > testing of the feature itself. 
> That's on point. That's why it would be nice if git *at least* warned
> about the different revisions wrt submodules.
> 
> But, I guess, I learned something about submodules:
> I used to think of submodules as means to pin down a specific revision like: 
> `ver == x`.
> Now I'm learning that submodules are treated as `ver >= x` during a merge.

Well a submodule version is pinned down as long a you do not change it
and commit it. The same as files and the goal is to make submodules
behave as close to normal files as possible. And git "warns" about
changed submodules by displaying them in the diff.

Actually the use case you are describing is not even involving a real
merge for submodules. It is just changing the pointer to another
revision.

> > How about just testing/reviewing on the
> > branch then? You would still get the submodule revision D was working on
> > and then in a later stage check if integration with everything else
> > works.
> Sure. But if the behavior deviates after a merge the merging developer is 
> currently not
> aware that it *might* have to do with different submodule revisions used, not 
> the "actual" code merged.
> 
> Like not even "beware: the (feature) branch you've merged used an 'older' 
> revision of X"

The submodule is part of the "actual" code and should be reviewed the
same. Maybe you want to set the diff.submodule option to 'diff' ? Then
git shows the actual diff of the changed contents in the submodule and
it would be more obvious how the code changed.

At the moment it seems to me that you want submodules to behave
differently than we handle normal files/directories which is the
opposite direction we have been trying to get git into. My feeling
though is that this should be covered by the review process instead of a
failing merge. Another option would be that you could write a hook that
warns reviewers that they are merging a submodule update.

Cheers Heiko


Re: git merge banch w/ different submodule revision

2018-05-03 Thread Heiko Voigt
Hi,

On Wed, May 02, 2018 at 07:30:25AM +, Middelschulte, Leif wrote:
> Am Montag, den 30.04.2018, 19:02 +0200 schrieb Heiko Voigt:
> > On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> > > Stefan wrote:
> > > > See 
> > > > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > > > (68d03e4a6e (Implement automatic fast-forward merge for submodules, 
> > > > 2010-07-07)
> > > > to explain the situation you encounter. (specifically merge_submodule
> > > > at the end of the diff)
> > > 
> > > +cc Heiko, author of that commit.
> > 
> > In that commit we tried to be very careful about. I do not understand
> > the situation in which the current strategy would be wrong by default.
> > 
> > We only merge if the following applies:
> > 
> >  * The changes in the superproject on both sides point forward in the
> >submodule.
> > 
> >  * One side is contained in the other. Contained from the submodule
> >perspective. Sides from the superproject merge perspective.
> > 
> > So in case of the mentioned rewind of a submodule: Only one side of the
> > 3-way merge would point forward and the merge would fail.
> > 
> > I can imagine, that in case of a temporary revert of a commit in the
> > submodule that you would not want that merged into some other branch.
> > But that would be the same without submodules. If you merge a temporary
> > revert from another branch you will not get any conflict.
> > 
> > So maybe someone can explain the use case in which one would get the
> > results that seem wrong?
> In an ideal world, where there are no regressions between revisions, a
> fast-forward is appropriate. However, we might have regressions within
> submodules.
> 
> So the usecase is the following:
> 
> Environment:
> - We have a base library L that is developed by some team (Team B).
> - Another team (Team A) developes a product P based on those libraries using 
> git-flow.
> 
> Case:
> The problem occurs, when a developer (D) of Team A tries to have a feature
> that he developed on a branch accepted by a core developer of P:
> If a core developer of P advanced the reference of L within P (linear 
> history), he might
> deem the work D insufficient. Not because of the actual work by D, but 
> regressions
> that snuck into L. The core developer will not be informed about the 
> missmatching
> revisions of L.
> 
> So it would be nice if there was some kind of switch or at least some trigger.

I still do not understand how the current behaviour is mismatching with
users expectations. Let's assume that you directly tracked the files of
L in your product repository P, without any submodule boundary. How
would the behavior be different? Would it be? If D started on an older
revision and gets merged into a newer revision, there can always be
regressions even without submodules.

Why would the core developer need to be informed about mismatching
revisions if he himself advanced the submodule?

It seems to me that you do not want to mix integration testing and
testing of the feature itself. How about just testing/reviewing on the
branch then? You would still get the submodule revision D was working on
and then in a later stage check if integration with everything else
works.

Cheers Heiko


Re: git merge banch w/ different submodule revision

2018-04-30 Thread Heiko Voigt
On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> Stefan wrote:
> > See 
> > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > (68d03e4a6e (Implement automatic fast-forward merge for submodules, 
> > 2010-07-07)
> > to explain the situation you encounter. (specifically merge_submodule
> > at the end of the diff)
> 
> +cc Heiko, author of that commit.

In that commit we tried to be very careful about. I do not understand
the situation in which the current strategy would be wrong by default.

We only merge if the following applies:

 * The changes in the superproject on both sides point forward in the
   submodule.

 * One side is contained in the other. Contained from the submodule
   perspective. Sides from the superproject merge perspective.

So in case of the mentioned rewind of a submodule: Only one side of the
3-way merge would point forward and the merge would fail.

I can imagine, that in case of a temporary revert of a commit in the
submodule that you would not want that merged into some other branch.
But that would be the same without submodules. If you merge a temporary
revert from another branch you will not get any conflict.

So maybe someone can explain the use case in which one would get the
results that seem wrong?

Cheers Heiko


Re: submodule..ignore vs git add -u

2018-03-15 Thread Heiko Voigt
Hi,

On Mon, Mar 12, 2018 at 04:59:25PM +0100, Miklos Vajna wrote:
> Let's say I have a fairly simple submodule setup where I do 'git
> checkout' inside the submodule to check out a different commit, so the
> outer repo 'git diff' shows a submodule update.
> 
> In that case
> 
> git config submodule..ignore all
> 
> makes 'git diff' or 'git commit -a' ignore the change in the outer repo,
> but not 'git add -u'.
> 
> Reading the git-config documentation if this is intentional behavior,
> I'm a bit confused. It specifies that:
> 
> - "git status" and the diff family: handle this setting
> - git submodule commands: ignore this setting
> 
> So that about 'git add -u', is it expected that it ignores this setting
> as well?
> 
> I guess either the doc should say 'git add -u' doesn't handle this
> setting or 'git add -u' should handle it. Happy to try to make a patch
> that does the later, but I though better ask first. :-)

Have a look here for a previous discussion.

https://public-inbox.org/git/20131204221659.GA7326@sandbox-ub/

I think I never got around finishing those patches, because the
discussion died and there was no reply from the original poster asking
for this.

Maybe you could have a look at my original branch and whether that would
be the behavior you expect. I had a look into porting those patches to
the current master, but there are still some test failures.

You can see and test my current WIP branch here:

https://github.com/hvoigt/git/commits/hv/fix_ignore_all_submodules_update1

Cheers Heiko


Re: [PATCH V4] config: add --expiry-date

2017-11-30 Thread Heiko Voigt
On Mon, Nov 20, 2017 at 03:37:03PM -0500, Jeff King wrote:
> On Mon, Nov 20, 2017 at 12:28:11PM -0800, Stefan Beller wrote:
> 
> > > +cc Stefan, who added the die(). It may be that we don't care that much
> > > these days about recovering from broken .gitmodules files.
> > 
> > By that you mean commits like 37f52e9344 (submodule-config:
> > keep shallow recommendation around, 2016-05-26) for example?
> > That adds a git_config_bool to the submodule config machinery.
> 
> I actually meant ea2fa5a338 (submodule-config: keep update strategy
> around, 2016-02-29), which adds an actual die() into parse_config(). But
> yeah, I think the end result is the same.
> 
> > I agree that we'd want to be more careful, but for now I'd put it to the
> > #leftoverbits.
> 
> Fine by me. While I think the original intent was to be more lenient to
> malformed .gitmodules, it's not like we're seeing bug reports about it.

My original intent was not about being more lenient about malformed
.gitmodules but having a way to deal with repository history that might
have a malformed .gitmodules in its history. Since depending on the
branch it is on it might be quite carved in stone.
On an active project it would not be that easy to rewrite history to get
out of that situation.

When a .gitmodules file in the worktree is malformed it is easy to fix.
That is not the case when we are reading configurations from blobs.

My guess why there are no reports is that maybe not too many users are
using this infrastructure yet, plus it is probably seldom that someone
edits the .gitmodules file by hand which could lead to such a situation.
But if such an error occurs it will be very annoying if we die while
parsing submodule configurations. The only solution I see currently is
to turn submodule recursion off completely.

But maybe I am being overly cautious here.

Cheers Heiko


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Heiko Voigt
On Wed, Oct 25, 2017 at 10:28:25AM +0900, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> >> Johannes Schindelin <johannes.schinde...@gmx.de> writes:
> >> 
> >> > We meticulously pass the `exclude` flag to the `treat_directory()`
> >> > function so that we can indicate that files in it are excluded rather
> >> > than untracked when recursing.
> >> >
> >> > But we did not yet treat submodules the same way.
> >> 
> >> ... "because of that, we ended up showing < >> what situation>>" would be a nice thing to have here, so that it can
> >> be copied to the release notes for the bugfix.  
> >
> > Yes I agree that would be nice here. It was not immediately obvious that
> > this only applies when using both flags: -u and --ignored.
> 
> Does any of you care to fill in the <> then? ;-)

How about:

Because of that, we ended up showing the submodule as untracked and its
content as ignored files when using the --ignored and -u flags with git
status.

? But maybe Dscho also has some more information to add about his
situation?

Cheers Heiko


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Heiko Voigt
On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
> > We meticulously pass the `exclude` flag to the `treat_directory()`
> > function so that we can indicate that files in it are excluded rather
> > than untracked when recursing.
> >
> > But we did not yet treat submodules the same way.
> 
> ... "because of that, we ended up showing < what situation>>" would be a nice thing to have here, so that it can
> be copied to the release notes for the bugfix.  

Yes I agree that would be nice here. It was not immediately obvious that
this only applies when using both flags: -u and --ignored.

Seems to be a corner that not many people are using. At first I thought
a plain 'git status' would show that behavior...

> How far back a release do we want to make this fix applicable?  It
> seems that it applies cleanly to maint-2.13 without breaking from my
> quick test, so that is probably where I'll queue this, even though
> we may no longer issue further maintenance releases on that track.
> 
> Any comment from submodule folks?
> 
> Sorry that I didn't notice this was left unattended by anybody til
> now.  Will queue while waiting for those who are into submodules to
> respond.

Looks good to me.

Cheers Heiko


Re: [PATCH 1/2] t5526: check for name/path collision in submodule fetch

2017-10-23 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 11:11:08AM -0700, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
> 
> This is just to test the corner case we're discussing.
> Applies on top of origin/hv/fetch-moved-submodules-on-demand.
> 
> 
>  t/t5526-fetch-submodules.sh | 42 ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index a552ad4ead..c82d519e06 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -571,6 +571,7 @@ test_expect_success 'fetching submodule into a broken 
> repository' '
>  '
>  
>  test_expect_success "fetch new commits when submodule got renamed" '
> + test_when_finished "rm -rf downstream_rename" &&
>   git clone . downstream_rename &&
>   (
>   cd downstream_rename &&
> @@ -605,4 +606,45 @@ test_expect_success "fetch new commits when submodule 
> got renamed" '
>   test_cmp expect actual
>  '
>  
> +test_expect_success "warn on submodule name/path clash, but new commits 
> fetched in renamed" '
> + test_when_finished "rm -rf downstream_rename" &&
> + git clone . downstream_rename &&
> + (
> + cd downstream_rename &&
> + git submodule update --init &&
> +# NEEDSWORK: we omitted --recursive for the submodule update here since
> +# that does not work. See test 7001 for mv "moving nested submodules"
> +# for details. Once that is fixed we should add the --recursive option
> +# here.
> + git checkout -b rename &&
> + git mv submodule submodule_renamed &&
> + (
> + cd submodule_renamed &&
> + git checkout -b rename_sub &&
> + echo a >a &&
> + git add a &&
> + git commit -ma &&
> + git push origin rename_sub &&
> + git rev-parse HEAD >../../expect
> + ) &&
> + git add submodule_renamed &&
> + git commit -m "update renamed submodule" &&
> + # produce collision, note that we use no submodule command
> + git clone ../submodule submodule &&
> + git add submodule &&

A small note even though this is not meant for inclusion: This would
break when I start working on teaching 'git add' to set default values
in .gitmodules when available.

But I guess I will discover a few other places, when starting that, that
will break in the tests anyway.

Cheers Heiko


Re: [PATCH 2/2] fetch, push: keep separate lists of submodules and gitlinks

2017-10-23 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 11:11:09AM -0700, Stefan Beller wrote:
> Currently when fetching we collect the names of submodules to be fetched
> in a list. As we also want to support fetching 'gitlinks, that happen to
> have a repo checked out at the right place', we'll just pretend that these
> are submodules. We do that by assuming their path is their name. This in
> turn can yield collisions between the name-namespace and the
> path-namespace. (See the previous test for a demonstration.)
> 
> This patch rewrites the code such that we treat the 'real submodule' case
> differently from the 'gitlink, but ok' case. This introduces a bit
> of code duplication, but gets rid of the confusing mapping between names
> and paths.
> 
> The test is incomplete as the long term vision is not achieved yet.
> (which would be fetching both the renamed submodule as well as
> the gitlink thing, putting them in place via e.g. git-pull)
> 
> Signed-off-by: Stefan Beller 
> ---
> 
>  Heiko,
>  Junio,
> 
>  I assumed the code would ease up a lot more, but now I am undecided if
>  I want to keep arguing as the code is not stopping to be ugly. :)

So we are basically coming to the same conclusion? :) My previous
fallback approach basically did the same but with the old architecture
(without parallel fetch, ...) and was already ugly.

With the fallback on submodule default names approach we can keep most
of the old functionality and keep the code that handles that minimal.

Since there is only a small (IMO quite unlikely) cornercase that could
break peoples expectations I would like to have a look whether anyone
even notices the behavioral change on next or master. If there are
complaints we can still extend and add the two lists.

>  The idea is to treat submodule and gitlinks separately, with submodules
>  supporting renames, and gitlinks as a historic artefact.
>  
>  Sorry for the noise about code ugliness.

Why sorry? For me it is actually interesting to see you basically coming
to the same conclusions.

Cheers Heiko


Re: [PATCH v4 3/3] submodule: simplify decision tree whether to or not to fetch

2017-10-19 Thread Heiko Voigt
On Thu, Oct 19, 2017 at 09:36:47AM +0900, Junio C Hamano wrote:
> Brandon Williams <bmw...@google.com> writes:
> 
> > On 10/16, Heiko Voigt wrote:
> >> To make extending this logic later easier.
> >
> > This makes things so much clearer, thanks!
> 
> I agree that it is clear to see what the code after the patch does,
> but the code before the patch is so convoluted to follow that it is
> a bit hard to see if the code before and after are doing the same
> thing, though ;-)

That is why I would appreciate some extra pairs of eyes on this :) I
tried to be as careful as possible when refactoring this, but since it
is quite convoluted something might have slipped through. The testsuite
does not show anything, but there might be corner cases that are not
tested I guess.

Will hopefully have time to look into the comments to the main patch of
this series tomorrow. Did not get around to properly do that yet.

Cheers Heiko

> >> Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> >> ---
> >>  submodule.c | 74 
> >> ++---
> >>  1 file changed, 37 insertions(+), 37 deletions(-)
> >> 
> >> diff --git a/submodule.c b/submodule.c
> >> index 71d1773e2e..82d206eb65 100644
> >> --- a/submodule.c
> >> +++ b/submodule.c
> >> @@ -1187,6 +1187,31 @@ struct submodule_parallel_fetch {
> >>  };
> >>  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
> >>  
> >> +static int get_fetch_recurse_config(const struct submodule *submodule,
> >> +  struct submodule_parallel_fetch *spf)
> >> +{
> >> +  if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
> >> +  return spf->command_line_option;
> >> +
> >> +  if (submodule) {
> >> +  char *key;
> >> +  const char *value;
> >> +
> >> +  int fetch_recurse = submodule->fetch_recurse;
> >> +  key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
> >> submodule->name);
> >> +  if (!repo_config_get_string_const(the_repository, key, )) 
> >> {
> >> +  fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
> >> value);
> >> +  }
> >> +  free(key);
> >> +
> >> +  if (fetch_recurse != RECURSE_SUBMODULES_NONE)
> >> +  /* local config overrules everything except commandline 
> >> */
> >> +  return fetch_recurse;
> >> +  }
> >> +
> >> +  return spf->default_option;
> >> +}
> >> +
> >>  static int get_next_submodule(struct child_process *cp,
> >>  struct strbuf *err, void *data, void **task_cb)
> >>  {
> >> @@ -1214,46 +1239,21 @@ static int get_next_submodule(struct child_process 
> >> *cp,
> >>}
> >>}
> >>  
> >> -  default_argv = "yes";
> >> -  if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> >> -  int fetch_recurse = RECURSE_SUBMODULES_NONE;
> >> -
> >> -  if (submodule) {
> >> -  char *key;
> >> -  const char *value;
> >> -
> >> -  fetch_recurse = submodule->fetch_recurse;
> >> -  key = 
> >> xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> >> -  if 
> >> (!repo_config_get_string_const(the_repository, key, )) {
> >> -  fetch_recurse = 
> >> parse_fetch_recurse_submodules_arg(key, value);
> >> -  }
> >> -  free(key);
> >> -  }
> >> -
> >> -  if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
> >> -  if (fetch_recurse == RECURSE_SUBMODULES_OFF)
> >> -  continue;
> >> -  if (fetch_recurse == 
> >> RECURSE_SUBMODULES_ON_DEMAND) {
> >> -  if 
> >> (!unsorted_string_list_lookup(_submodule_names,
> >> -   
> >> submodule->name))
> >> -  continue;
>

[PATCH v4 2/3] implement fetching of moved submodules

2017-10-16 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases. If we do not have a
configuration for a gitlink we rely on constructing a default name from
the path if a git repository can be found at its path. We skip
non-configured gitlinks whose default name collides with a configured
one.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule-config.h  |   3 +
 submodule.c | 138 
 t/t5526-fetch-submodules.sh |  35 +++
 3 files changed, 138 insertions(+), 38 deletions(-)

diff --git a/submodule-config.h b/submodule-config.h
index e3845831f6..a5503a5d17 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -22,6 +22,9 @@ struct submodule {
int recommend_shallow;
 };
 
+#define SUBMODULE_INIT { NULL, NULL, NULL, RECURSE_SUBMODULES_NONE, \
+   NULL, NULL, SUBMODULE_UPDATE_STRATEGY_INIT, {0}, -1 };
+
 struct submodule_cache;
 struct repository;
 
diff --git a/submodule.c b/submodule.c
index 63e7094e16..71d1773e2e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -674,11 +674,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -687,39 +687,67 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
+/*
+ * this would normally be two functions: default_name_from_path() and
+ * path_from_default_name(). Since the default name is the same as
+ * the submodule path we can get away with just one function which only
+ * checks whether there is a submodule in the working directory at that
+ * location.
+ */
+static const char *default_name_or_path(const char *path_or_name)
+{
+   int error_code;
+
+   if (!is_submodule_populated_gently(path_or_name, _code))
+   return NULL;
+
+   return path_or_name;
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+   const char *name;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too w

[PATCH v4 3/3] submodule: simplify decision tree whether to or not to fetch

2017-10-16 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 71d1773e2e..82d206eb65 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1187,6 +1187,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, )) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1214,46 +1239,21 @@ static int get_next_submodule(struct child_process *cp,
}
}
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, )) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.14.1.145.gb3622a4



[PATCH v4 1/3] fetch: add test to make sure we stay backwards compatible

2017-10-16 Thread Heiko Voigt
The current implementation of submodules supports on-demand fetch if
there is no .gitmodules entry for a submodule. Let's add a test to
document this behavior.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 t/t5526-fetch-submodules.sh | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42251f7f3a..43a22f680f 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -478,7 +478,47 @@ test_expect_success "don't fetch submodule when newly 
recorded commits are alrea
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
-   test_i18ncmp expect.err actual.err
+   test_i18ncmp expect.err actual.err &&
+   (
+   cd submodule &&
+   git checkout -q master
+   )
+'
+
+test_expect_success "'fetch.recurseSubmodules=on-demand' works also without 
.gitmodule entry" '
+   (
+   cd downstream &&
+   git fetch --recurse-submodules
+   ) &&
+   add_upstream_commit &&
+   head1=$(git rev-parse --short HEAD) &&
+   git add submodule &&
+   git rm .gitmodules &&
+   git commit -m "new submodule without .gitmodules" &&
+   printf "" >expect.out &&
+   head2=$(git rev-parse --short HEAD) &&
+   echo "From $pwd/." >expect.err.2 &&
+   echo "   $head1..$head2  master -> origin/master" >>expect.err.2 &&
+   head -3 expect.err >>expect.err.2 &&
+   (
+   cd downstream &&
+   rm .gitmodules &&
+   git config fetch.recurseSubmodules on-demand &&
+   # fake submodule configuration to avoid skipping submodule 
handling
+   git config -f .gitmodules submodule.fake.path fake &&
+   git config -f .gitmodules submodule.fake.url fakeurl &&
+   git add .gitmodules &&
+   git config --unset submodule.submodule.url &&
+   git fetch >../actual.out 2>../actual.err &&
+   # cleanup
+   git config --unset fetch.recurseSubmodules &&
+   git reset --hard
+   ) &&
+   test_i18ncmp expect.out actual.out &&
+   test_i18ncmp expect.err.2 actual.err &&
+   git checkout HEAD^ -- .gitmodules &&
+   git add .gitmodules &&
+   git commit -m "new submodule restored .gitmodules"
 '
 
 test_expect_success 'fetching submodules respects parallel settings' '
-- 
2.14.1.145.gb3622a4



[PATCH v4 0/3] implement fetching of moved submodules

2017-10-16 Thread Heiko Voigt
The previous RFC iteration can be found here:

https://public-inbox.org/git/20171006222544.GA26642@sandbox/

This should now be in a state ready for review for inclusion.

The main difference from last iteration is that we now also support
unconfigured gitlinks for push and fetch for backwards compatibility.

To implement this compatibility we construct a default name for gitlinks
if there is a repository found at their location in the worktree.

Cheers Heiko

Heiko Voigt (3):
  fetch: add test to make sure we stay backwards compatible
  implement fetching of moved submodules
  submodule: simplify decision tree whether to or not to fetch

 submodule-config.h  |   3 +
 submodule.c | 200 +---
 t/t5526-fetch-submodules.sh |  77 -
 3 files changed, 210 insertions(+), 70 deletions(-)

-- 
2.14.1.145.gb3622a4



Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-11 Thread Heiko Voigt
On Wed, Oct 11, 2017 at 08:31:37AM +0900, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > So you propose to make git-add behave like "git submodule add"
> > (i.e. also add the .gitmodules entry for name/path/URL), which I
> > like from a submodule perspective.
> >
> > However other users of gitlinks might be confused[1], which is why
> > I refrained from "making every gitlink into a submodule". Specifically
> > the more powerful a submodule operation is (the more fluff adds),
> > the harder it should be for people to mis-use it.
> 
> A few questions that come to mind are:
> 
>  - Does "git add sub/" have enough information to populate
>.gitmodules?  If we have reasonable "default" values for
>.gitmodules entries (e.g. missing URL means we won't fetch when
>asked to go recursively fetch), perhaps we can leave everything
>other than "submodule.$name.path" undefined.

My suggestion would be: If we do not have them we do not populate them.
We could even go further and say: If we do not have the set "git
submodule add" would populate then we do not add anything to .gitmodules
and warn the user.

>  - Can't we help those who have gitlinks without .gitmodules entries
>exactly the same way as above, i.e. when we see a gitlink and try
>to treat it as a submodule, we'd first try to look it up from
>.gitmodules (by going from path to name and then to
>submodule.$name.$var); the above "'git add sub/' would add an
>entry for .gitmodules" wish is based on the assumption that there
>are reasonable "default" values for each of these $var--so by
>basing on the same assumption, we can "pretend" as if these
>submodule.$name.$var were in .gitmodules file when we see
>gitlinks without .gitmodules entries.  IOW, if "git add sub/" can
>add .gitmodules to help people without having to type "git
>submodule add sub/", then we can give exactly the same degree of
>help without even modifying .gitmodules when "git add sub/" is
>run.

This "default" value thing got me thinking in a different direction. We
could use a scheme like that to get names (and values) for submodules
that are missing from the .gitmodules file. If we decide that we need to
handle them.

Cheers Heiko


Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-11 Thread Heiko Voigt
On Tue, Oct 10, 2017 at 11:39:21AM -0700, Stefan Beller wrote:
> So you propose to make git-add behave like "git submodule add"
> (i.e. also add the .gitmodules entry for name/path/URL), which I
> like from a submodule perspective.

Well more like: clone and add will behave like "git submodule add" but
basically yes.

> However other users of gitlinks might be confused[1], which is why
> I refrained from "making every gitlink into a submodule". Specifically
> the more powerful a submodule operation is (the more fluff adds),
> the harder it should be for people to mis-use it.
> 
> [1] https://github.com/git-series/git-series/blob/master/INTERNALS.md
>  "git-series uses gitlinks to store pointer to commits in its own repo."

But would those users use

git add

to add a gitlink? From the description in that file I read that it
points to commits in its own repository. Will there also be files
checked out like submodules at that location?

Otherwise I would propose that 'git add' could detect whether a gitlink
is a submodule by trying to read its git configuration. If we do not
find that we simply do not do anything.

> > If everyone agrees that submodules are the default way of handling
> > repositories insided repositories, IMO, 'git add' should also alter
> > .gitmodules by default. We could provide a switch to avoid doing that.
> 
> I wonder if that switch should be default-on (i.e. not treat a gitlink as
> a submodule initially, behavior as-is, and then eventually we will
> die() on unconfigured repos, expecting the user to make the decision)
> 
> > An intermediate solution would be to warn
> 
> That is already implemented by Peff.

Ah ok, thanks I suspected so when I realized that this discussion was
older.

> > but in the long run my goal
> > for submodules is and always was: Make them behave as close to files as
> > possible. And why should a 'git add submodule' not magically do
> > everything it can to make submodules just work? I can look into a patch
> > for that if people agree here...
> 
> I'd love to see this implemented. I cc'd Josh (the author of git-series), who
> may disagree with this, or has some good input how to go forward without
> breaking git-series.

Yeah, lets see if, as described above, that actually would break
git-series.

> > Regarding handling of gitlinks with or without .gitmodules:
> >
> > Currently we are actually in some intermediate state:
> >
> >  * If there is no .gitmodules file: No submodule processing on any
> >gitlinks (AFAIK)
> 
> AFAIK this is true.
> 
> >  * If there is a .gitmodules files with some submodule configured: Do
> >recursive fetch and push as far as possible on gitlinks.
> 
> * If submodule.recurse is set, then we also treat submodules like files
>   for checkout, reset, read-tree.

To clarify: If submodule.recurse is set but there is no .gitmodules file
we do submodule processing for the above commands?

> > So I am not sure whether there are actually many users (knowingly)
> > using a mix of some submodules configured and some not and then relying
> > on the submodule infrastructure.
> >
> > I would rather expect two sorts of users:
> >
> >   1. Those that do use .gitmodules
> 
> Those want to reap all benefits of good submodules.
> 
> >
> >   2. Those that do *not* use .gitmodules
> 
> As said above, we don't know if those users are
> "holding submodules wrong" or are using gitlinks for
> magic tricks (unrelated to submodules).

I did not want to say that they are "holding submodules wrong" but
rather that if they do not use .gitmodules they do that knowingly and
thus consistently not use .gitmodules for any gitlink.

> > Users that do not use any .gitmodules file will currently (AFAIK) not
> > get any submodule handling. So the question is are there really many
> > "mixed users"? My guess would be no.
> 
> I hope that there are few (if any) users of these mixed setups.

That sounds promising.

> > Because without those using this mixed we could switch to saying: "You
> > need to have a .gitmodules file for submodule handling" without much
> > fallout from breaking users use cases.
> 
> That seems reasonable to me, actually.

Nice.

> > Maybe we can test this out somehow? My patch series would be ready in
> > that case, just had to drop the first patch and adjust the commit
> > message of this one.
> 
> I wonder how we would test this, though? Do you have any idea
> (even vague) how we'd accomplish such a measurement?
> I fear we'll have to go this way blindly.

One idea would be to expose this somewhere to a limited amount of users.
I remember Jonathan was suggesting, back when Jens was working on the
recursive checkout, that he could add the series to the debian package
and see what happens. Or we could use Junios next branch? Something like
that. If we get complaints we know the assumption was wrong and we need
a fallback.

Cheers Heiko


Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-10 Thread Heiko Voigt
Hi,

On Mon, Oct 09, 2017 at 11:20:51AM -0700, Stefan Beller wrote:
> On Fri, Oct 6, 2017 at 3:32 PM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > NOTE: The argument in this message is not correct, see description in
> > cover letter.
> >
> > The setup of the repositories in this test is using gitlinks without the
> > .gitmodules infrastructure. It is however testing convenience features
> > like --recurse-submodules=on-demand. These features are already not
> > supported by fetch without a .gitmodules file. This leads us to the
> > conclusion that it is not really used here as well.
> >
> > Let's use the usual submodule commands to setup the repository in a
> > typical way. This also has the advantage that we are testing with a
> > repository structure that is more similar to one we could expect on a
> > users setup.
> >
> > Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> > ---
> >
> > As mentioned in the cover letter. This seems to be the only test that
> > ensures that we stay compatible with setups without .gitmodules. Maybe
> > we should add/revive some?
> 
> An interesting discussion covering this topic is found at
> https://public-inbox.org/git/20170606035650.oykbz2uc4xkr3...@sigill.intra.peff.net/

Thanks for that pointer. So in that discussion Junio said that the
recursive operations should succeed if we have everything necessary at
hand. I kind of agree because why should we limit usage when not
necessary. On the other hand we want git to be easy to use. And that
example from Peff is perfect as a demonstration of a incosistency we
currently have:

git clone git://some.where.git/submodule.git
git add submodule

is an operation I remember, I did, when first getting in contact with
submodules (many years back), since that is one intuitive way. And the
thing is: It works, kind of... Only later I discovered that one actually
needs to us a special submodule command to get everything approriately
setup to work together with others.

If everyone agrees that submodules are the default way of handling
repositories insided repositories, IMO, 'git add' should also alter
.gitmodules by default. We could provide a switch to avoid doing that.

An intermediate solution would be to warn but in the long run my goal
for submodules is and always was: Make them behave as close to files as
possible. And why should a 'git add submodule' not magically do
everything it can to make submodules just work? I can look into a patch
for that if people agree here...

Regarding handling of gitlinks with or without .gitmodules:

Currently we are actually in some intermediate state:

 * If there is no .gitmodules file: No submodule processing on any
   gitlinks (AFAIK)
 * If there is a .gitmodules files with some submodule configured: Do
   recursive fetch and push as far as possible on gitlinks.

So I am not sure whether there are actually many users (knowingly)
using a mix of some submodules configured and some not and then relying
on the submodule infrastructure.

I would rather expect two sorts of users:

  1. Those that do use .gitmodules

  2. Those that do *not* use .gitmodules

Users that do not use any .gitmodules file will currently (AFAIK) not
get any submodule handling. So the question is are there really many
"mixed users"? My guess would be no.
Because without those using this mixed we could switch to saying: "You
need to have a .gitmodules file for submodule handling" without much
fallout from breaking users use cases.

Maybe we can test this out somehow? My patch series would be ready in
that case, just had to drop the first patch and adjust the commit
message of this one.

Cheers Heiko


Re: "git rm" seems to do recursive removal even without "-r"

2017-10-10 Thread Heiko Voigt
On Sun, Oct 08, 2017 at 07:56:20AM -0400, Robert P. J. Day wrote:
>   but as i asked in my earlier post, if i wanted to remove *all* files
> with names of "Makefile*", why can't i use:
> 
>   $ git rm 'Makefile*'
> 
> just as i used:
> 
>   $ git rm '*.c'
> 
> are those not both acceptable fileglobs? why does the former clearly
> only match the top-level Makefile, and refuse to cross directory
> boundaries?

Maybe think about it this way: The only difference between git's
globbing and the default shell globbing is that the '/' in a path has a
special meaning. The shells expansion stops at a '/' but git does not.

So with *.c the shell matches: blabla.c, blub.c, ...  but not
subdir/bla.c, subdir/blub.c, ... since it only considers files in the
current directory. A little different for Makefile* that will also match
Makefile.bla, Makefile/bla or Makefile_bla/blub in shell but not
subdir/Makefile or bla.Makefile. Basically anything directly in *this*
directory that *starts* with 'Makefile'.

Git on the other hand does not consider '/' to be special. So *.c
matches all of the path above: bla.c, blub.c, subdir/bla.c,
subdir/blub.c. Basically any file below the current directory with a
path that ends in '.c'. With Makefile* it is the opposite: Every file
below the current directory that *starts* with 'Makefile'. So
Makefile.bla, Makefile/bla, ... but also not subdir/Makefile or
bla.Makefile.

Maybe that helps...

Cheers Heiko



[RFC PATCH v3 3/4] implement fetching of moved submodules

2017-10-06 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: We now require a name for a submodule repository.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---

No changes from the previous version here.

 submodule.c | 91 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 63e7094..0c586a0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -674,11 +674,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -687,39 +687,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   oid_array_append(commits, >two->oid);
}
 }
 
@@ -742,11 +737,14 @@ static void collect_changed_submodules(struct string_list 
*changed,
 
while ((commit = get_revision())) {
struct rev_info diff_rev;
+   struct collect_changed_submodules_cb_data data;
+   data.changed = changed;
+   data.commit_oid = >object.oid;
 
init_revisions(_rev, NULL);
diff_rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
diff_rev.diffopt.format_callback = 
collect_changed_submodules_cb;
-   diff_rev.diffopt.for

[RFC PATCH v3 4/4] submodule: simplify decision tree whether to or not to fetch

2017-10-06 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---

This should also be the same as in the previous version.

 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0c586a0..c7b32c6 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1142,6 +1142,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, )) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1161,46 +1186,21 @@ static int get_next_submodule(struct child_process *cp,
 
submodule = submodule_from_path(_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, )) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.10.0.129.g35f6318



[RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-06 Thread Heiko Voigt
NOTE: The argument in this message is not correct, see description in
cover letter.

The setup of the repositories in this test is using gitlinks without the
.gitmodules infrastructure. It is however testing convenience features
like --recurse-submodules=on-demand. These features are already not
supported by fetch without a .gitmodules file. This leads us to the
conclusion that it is not really used here as well.

Let's use the usual submodule commands to setup the repository in a
typical way. This also has the advantage that we are testing with a
repository structure that is more similar to one we could expect on a
users setup.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---

As mentioned in the cover letter. This seems to be the only test that
ensures that we stay compatible with setups without .gitmodules. Maybe
we should add/revive some?

Cheers Heiko

 t/t5531-deep-submodule-push.sh | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
index 39cb2c1..a4a2c6a 100755
--- a/t/t5531-deep-submodule-push.sh
+++ b/t/t5531-deep-submodule-push.sh
@@ -8,22 +8,26 @@ test_expect_success setup '
mkdir pub.git &&
GIT_DIR=pub.git git init --bare &&
GIT_DIR=pub.git git config receive.fsckobjects true &&
+   mkdir submodule &&
+   (
+   cd submodule &&
+   git init &&
+   git config push.default matching &&
+   >junk &&
+   git add junk &&
+   git commit -m "Initial junk"
+   ) &&
+   git clone --bare submodule submodule.git &&
mkdir work &&
(
cd work &&
git init &&
git config push.default matching &&
-   mkdir -p gar/bage &&
-   (
-   cd gar/bage &&
-   git init &&
-   git config push.default matching &&
-   >junk &&
-   git add junk &&
-   git commit -m "Initial junk"
-   ) &&
-   git add gar/bage &&
+   mkdir gar &&
+   git submodule add ../submodule.git gar/bage &&
git commit -m "Initial superproject"
+   cd gar/bage &&
+   git remote rm origin
)
 '
 
@@ -51,11 +55,10 @@ test_expect_success 'push if submodule has no remote' '
 
 test_expect_success 'push fails if submodule commit not on remote' '
(
-   cd work/gar &&
-   git clone --bare bage ../../submodule.git &&
-   cd bage &&
+   cd work/gar/bage &&
git remote add origin ../../../submodule.git &&
git fetch &&
+   git push --set-upstream origin master &&
>junk3 &&
git add junk3 &&
git commit -m "Third junk"
-- 
2.10.0.129.g35f6318



[RFC PATCH 1/4] fetch: add test to make sure we stay backwards compatible

2017-10-06 Thread Heiko Voigt
The current implementation of submodules supports on-demand fetch if
there is no .gitmodules entry for a submodule. Let's add a test to
document this behavior.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 t/t5526-fetch-submodules.sh | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42251f7..43a22f6 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -478,7 +478,47 @@ test_expect_success "don't fetch submodule when newly 
recorded commits are alrea
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
-   test_i18ncmp expect.err actual.err
+   test_i18ncmp expect.err actual.err &&
+   (
+   cd submodule &&
+   git checkout -q master
+   )
+'
+
+test_expect_success "'fetch.recurseSubmodules=on-demand' works also without 
.gitmodule entry" '
+   (
+   cd downstream &&
+   git fetch --recurse-submodules
+   ) &&
+   add_upstream_commit &&
+   head1=$(git rev-parse --short HEAD) &&
+   git add submodule &&
+   git rm .gitmodules &&
+   git commit -m "new submodule without .gitmodules" &&
+   printf "" >expect.out &&
+   head2=$(git rev-parse --short HEAD) &&
+   echo "From $pwd/." >expect.err.2 &&
+   echo "   $head1..$head2  master -> origin/master" >>expect.err.2 &&
+   head -3 expect.err >>expect.err.2 &&
+   (
+   cd downstream &&
+   rm .gitmodules &&
+   git config fetch.recurseSubmodules on-demand &&
+   # fake submodule configuration to avoid skipping submodule 
handling
+   git config -f .gitmodules submodule.fake.path fake &&
+   git config -f .gitmodules submodule.fake.url fakeurl &&
+   git add .gitmodules &&
+   git config --unset submodule.submodule.url &&
+   git fetch >../actual.out 2>../actual.err &&
+   # cleanup
+   git config --unset fetch.recurseSubmodules &&
+   git reset --hard
+   ) &&
+   test_i18ncmp expect.out actual.out &&
+   test_i18ncmp expect.err.2 actual.err &&
+   git checkout HEAD^ -- .gitmodules &&
+   git add .gitmodules &&
+   git commit -m "new submodule restored .gitmodules"
 '
 
 test_expect_success 'fetching submodules respects parallel settings' '
-- 
2.10.0.129.g35f6318



[RFC PATCH v3 0/4] implement fetching of moved submodules

2017-10-06 Thread Heiko Voigt
The last iteration can be found here:

https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/

This is mainly a status update and to let people know that I am still
working on this.

I struggled quite a bit with reviving my original test for the path
based recursive fetch (first patch). The behavior seems to haved changed
and simply setting the submodule configuration in .git/config without
one in .gitmodules does not work anymore. I did not have time to
investigate whether this was a deliberate change or a maybe a bug?

So the solution for now is that I write my fake configuration (to avoid
skipping submodule handling altogether) into a .gitmodules file.

The second patch (cleanup of a submodule push testcase) was written
because that currently is the only test failing. It is not meant for
inclusion but rather as a demonstration of what might be happening when
we cleanup testcases: Because of the behavioral change above, on first
sight, it seemed like there was a shortcut in fetch and so on-demand
fetch without submodule configuration would not be supported anymore.

IIRC there were a lot more tests failing before when I implemented my
patch without the fallback on paths. So my guess is that some tests have
been cleaned up to use proper (.gitmodules) submodule setup.

So the thing here is: If we want to make sure that we stay backwards
compatible by supporting the setup with gitlinks without configuration.
Then we also should keep tests around that have the plain manual setup
without .gitmodules files. Just something, I think, we should keep in
mind.

Apart from the tests nothing has been added in this iteration. Since I
finally have a working test now I will continue with reviving the
fallback to paths.

Cheers Heiko

Heiko Voigt (4):
  fetch: add test to make sure we stay backwards compatible
  change submodule push test to use proper repository setup
  implement fetching of moved submodules
  submodule: simplify decision tree whether to or not to fetch

 submodule.c| 155 ++---
 t/t5526-fetch-submodules.sh|  77 +++-
 t/t5531-deep-submodule-push.sh |  29 
 3 files changed, 174 insertions(+), 87 deletions(-)

-- 
2.10.0.129.g35f6318



Re: [PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-20 Thread Heiko Voigt
On Mon, Sep 18, 2017 at 01:03:32PM -0700, Stefan Beller wrote:
> >> Took a little while but here is a more clean patch creating individual
> >> submodules for the nesting.
> >>
> >> Cheers Heiko
> 
> Thanks for writing this test!

No worries. :)

> > Thanks.  Stefan, does this look good to you now?
> 
> Yes, though there are nits below.
> 
> > It is not quite clear which step is expected to fail with the
> > current code by reading the test or the proposed log message.  Does
> > "mv" refuse to work and we do not get to run "status", or does
> > "status" report a failure, or do we fail well before that?
> 
> git-mv failing seems like a new possibility without incurring
> another process spawn with the new repository object.
> (Though then we could also just fix the recursed submodule)

It is mv that fails to update everything necessary when using it with
recursively nested submodules. So the git-mv command does not report a
failure here. As an interim fix it could maybe report an error when
encountering nested submodules but the real fix would be to teach it to
recursively spawn the appropriate git-mv commands.

> > The log message that only says "This does not work when ..." is not
> > helpful in figuring it out, either.  Something like "This does not
> > work and fails to update the paths for its configurations" or
> > whatever that describes "what actually happens" (in contrast to
> > "what ought to happen", which you described clearly) should be
> > there.
> >
> > Description on how you happened to have discovered the issue feels a
> > lot less relevant compared to that, and it is totally useless if it
> > is unclear what the issue is in the first place.

Sorry about being a bit brief here. How about dropping that information
how I discovered the bug then and change the commit message to something
like this:

add test for bug in git-mv for recursive submodules

When using git-mv with a submodule it will detect that and update
the paths for its configurations (.gitmodules, worktree and
gitfile). This does not work in case it encounters nested
submodules. In that case it only updates the configurations for the
submodule directly underneath the superproject and fails to update
the paths for the submodules nested more deeply. This in turn leads
to the symptom that git status reports that it can not chdir to the
nested submodule in its old location.

Lets add a test to document.

?

> >>  t/t7001-mv.sh | 25 +
> >>  1 file changed, 25 insertions(+)
> >>
> >> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> >> index e365d1ff77..cbc5fb37fe 100755
> >> --- a/t/t7001-mv.sh
> >> +++ b/t/t7001-mv.sh
> >> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested 
> >> directories' '
> >>   test_cmp actual expect
> >>  '
> >>
> >> +test_expect_failure 'moving nested submodules' '
> >> + git commit -am "cleanup commit" &&
> >> + mkdir sub_nested_nested &&
> >> + (cd sub_nested_nested &&
> 
> We seem to have different styles for nested shell. I prefer
> 
>   outside command &&
>   (
>   first nested command here &&
>   ...
> 
> as that aligns indentation to the nesting level. I have seen
> the style you use a lot in the  test suite, and we do not have
> a guideline in Documentation/CodingGuidelines, so I do not
> complain too loudly. ;)

Yeah we have some different styles it seems ;) So here some reasoning
behind my style:

I actually would agree on your style if 'first nested command' was any
arbitrary command but when I use my style it is always when I use a
nested shell for changing into some directory, doing something there and
then being able to return to the previous directory by closing the nested
shell. So for me the 'cd somewhere' belongs to the brackets similarly
like a condition definition belongs to the if it is used with.

> >> + touch nested_level2 &&
> >> + git init &&
> >> + git add . &&
> >> + git commit -m "nested level 2"
> >> + ) &&
> >> + mkdir sub_nested &&
> >> + (cd sub_nested &&
> >> + touch nested_level1 &&
> >> + git init &&
> >> + git add . &&
> >> + git commit -m "nested level 1"
> >> + git submodule add ../sub_nested_nested &&
> >> + git commit -m "add nested level 2"
> >> + ) &&
> >> + git submodule add ./sub_nested nested_move &&
> >> + git commit -m "add nested_move" &&
> >> + git submodule update --init --recursive &&
> 
> So far a nice setup!

Thanks.

> >> + git mv nested_move sub_nested_moved &&
> 
> This is the offending command that produces the bug,
> as it will break most subsequent commands, such as

Yes.

> >> + git status
> 
> git-status is one of the basic commands. Without
> status to function, I think it is hard to recover your repo without
> a lot of in-depth knowledge of Git (submodules).
> 
> I wonder if git-status 

[RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch

2017-09-15 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/submodule.c b/submodule.c
index 38b9905e43..fa44fc59f2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch {
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
 
+static int get_fetch_recurse_config(const struct submodule *submodule,
+   struct submodule_parallel_fetch *spf)
+{
+   if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return spf->command_line_option;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   int fetch_recurse = submodule->fetch_recurse;
+   key = xstrfmt("submodule.%s.fetchRecurseSubmodules", 
submodule->name);
+   if (!repo_config_get_string_const(the_repository, key, )) 
{
+   fetch_recurse = parse_fetch_recurse_submodules_arg(key, 
value);
+   }
+   free(key);
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline 
*/
+   return fetch_recurse;
+   }
+
+   return spf->default_option;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process *cp,
 
submodule = submodule_from_path(_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   int fetch_recurse = RECURSE_SUBMODULES_NONE;
-
-   if (submodule) {
-   char *key;
-   const char *value;
-
-   fetch_recurse = submodule->fetch_recurse;
-   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
-   if 
(!repo_config_get_string_const(the_repository, key, )) {
-   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
-   }
-   free(key);
-   }
-
-   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
-   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
-   continue;
-   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if (spf->default_option == 
RECURSE_SUBMODULES_OFF)
-   continue;
-   if (spf->default_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
- 
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.14.1.145.gb3622a4



[RFC PATCH v2 1/2] implement fetching of moved submodules

2017-09-15 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

NEEDSWORK: This breaks t5531 and t5545 because they do not use a
.gitmodules file. I will add a fallback to paths to help such users.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
This an update of the previous series[1] to the current master. The
fallback is still missing but now it should not conflict with any topics
in flight anymore (hopefully).

Cheers Heiko

[1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/

 submodule.c | 91 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3cea8221e0..38b9905e43 100644
--- a/submodule.c
+++ b/submodule.c
@@ -21,7 +21,7 @@
 #include "parse-options.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   oid_array_append(commits, >two->oid);
}
 }
 
@@ -735,11 +730,14 @@ static void collect_changed_submodules(struct string_list 
*changed,
 
while ((commit = get_revision())) {
stru

[PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-15 Thread Heiko Voigt
When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for recursive submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Lets add a test to document.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
On Fri, Aug 18, 2017 at 12:04:03PM -0700, Stefan Beller wrote:
> > I just copied the shortcut that they were adding themselfes as submodule
> > in 'setup submodule'. The whole setup of submodules in this test is like
> > this. This way we already had a nested submodule structure which I could
> > just add.
> >
> > I agree that this is unrealistic so I can change that in the test I am
> > adding. But from what I have seen, this shortcut is taken in quite some
> > places when dealing with submodules.
> 
> Please do not make it worse.
> Once upon a time (late '16 IIRC) I had a series floating on the
> list removing all occurrences, but there were issues with the
> series and it did not land.

Took a little while but here is a more clean patch creating individual
submodules for the nesting.

Cheers Heiko

 t/t7001-mv.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1ff77..cbc5fb37fe 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   mkdir sub_nested_nested &&
+   (cd sub_nested_nested &&
+   touch nested_level2 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 2"
+   ) &&
+   mkdir sub_nested &&
+   (cd sub_nested &&
+   touch nested_level1 &&
+   git init &&
+   git add . &&
+   git commit -m "nested level 1"
+   git submodule add ../sub_nested_nested &&
+   git commit -m "add nested level 2"
+   ) &&
+   git submodule add ./sub_nested nested_move &&
+   git commit -m "add nested_move" &&
+   git submodule update --init --recursive &&
+   git mv nested_move sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.14.1.145.gb3622a4



Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits

2017-09-14 Thread Heiko Voigt
On Tue, Sep 12, 2017 at 10:30:27AM -0700, Jonathan Nieder wrote:
> From: Stefan Beller 
> 
> The check_has_commit helper uses resolves a submodule entry to a
> commit, when validating its existence. As a side effect this means
> tolerates a submodule entry pointing to a tag, which is not a valid
> submodule entry that git commands would know how to cope with.
> 
> Tighten the check to require an actual commit, not a tag pointing to a
> commit.
> 
> Also improve the error handling when a submodule entry points to
> non-commit (e.g., a blob) to error out instead of warning and
> pretending the pointed to object doesn't exist.
> 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 

Looks good to me.

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-25 Thread Heiko Voigt
On Tue, Aug 22, 2017 at 11:10:52AM -0700, Stefan Beller wrote:
> On Tue, Aug 22, 2017 at 8:33 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > On Mon, Aug 21, 2017 at 09:42:54AM -0700, Stefan Beller wrote:
> >> On Mon, Aug 21, 2017 at 9:05 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> >> > On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> >> # You feel the superproject is in the way?
> >> git worktree add --for-submodule  ...
> >> # The new submodule worktree puts the
> >> # submodule only UX first. so it feels like its own
> >> # repository, no need for specific flags.
> >
> > I am not sure I understand this one. What would that do? Put a worktree
> > for submodule path/to/sub to ...?
> 
> Yes, and at "..." you would have the UX of the submodule being
> its own repository, no interaction with the superproject.

Are you sure that is what Junio meant? IMO that would be 'git worktree
clone' or 'git worktree checkout', no?  In todays git terminology an
'add' is something that puts data into the repository / the index. That
is why I was/am confused.

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-22 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:48:51AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > So I think it is important that there are commits in the submodule so
> > its history makes sense independently for others.
> 
> I was trying to push the "just like trees" to the logical conclusion
> in order to see see if it leads to an absurd conclusions, or some
> useful scenario.  I do not necessarily subscribed to Jonathan's
> "vision" (I do not object to it as one possible mode of operation,
> either).
> 
> > Or how would you push out the history in the submodule in your idea?
> > Maybe I am missing something? What would be your use case with gitlinks
> > pointing to trees?
> 
> Not my idea.  But Stefan's message I was responding to said that
> object database for the superproject is the same as and mixed with
> object databases for the submodules, so it is plausible that push
> always happens at the superproject, and a mechanism to be invented
> (I mentioned the need for it in the message you are responding to)
> to enumerate all the commits bound from submodules to a range of
> superproject's commits would identify these trees to be pushed out.
> 
> In essense, "just like trees" folks want to use the gitlinks in the
> superproject to only specify the tree from the submodule project
> that should sit there.  And my point is that such a world view would
> lead to no need for branches in the submodule repository, no need
> for commits in the submodule repository, and no need for history in
> the submodule repository.  Which may or may not be a bad thing.

The problem I see here is: How do we seperate the submodule from the
superproject? Without the history (commits and trees) of the
superproject the submodule trees do not make sense anymore. The reason
users have submodules is because the projects are seperate somehow. With
just trees in the submodule it becomes tied quite tightly to the
superproject, IMO.

One step further: Why use gitlinks to point to trees? Let's just use
normal tree links instead, we already have that implemented :)

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-22 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:42:54AM -0700, Stefan Beller wrote:
> On Mon, Aug 21, 2017 at 9:05 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> >> As long as we are talking about idealized future world (well, at
> >> least an idea of somebody's "ideal", not necessarily shared by
> >> everybody), I wonder if there is even any need to have commits in
> >> submodules in such a world.  To realize such a "monorepo" world, you
> >> might be better off allowing a gitlink in the superproject to
> >> directly point at a tree object in a submodule repository (making
> >> them physically a single repository is an optional implementation
> >> detail I choose to ignore in this discussion).
> >
> > IMO this is one step to far. One main use of submodules are shared
> > repositories that are used by many superprojects. The reason you want to
> > have commits in the submodule are so that you can push them
> > independently and all other users can pick up the changes. You could get
> > by by Using the superproject commits for the submodule once you push or
> > something but those do not necessarily make sense in the context of the
> > submodule.
> >
> > So I think it is important that there are commits in the submodule so
> > its history makes sense independently for others.
> >
> > Or how would you push out the history in the submodule in your idea?
> > Maybe I am missing something? What would be your use case with gitlinks
> > pointing to trees?
> 
> Well there are still commits, but in the superproject the UX feels more
> as if the submodules were special trees.

Ah ok then I misunderstood. So they only feel like trees.

> So if you want to interact with
> the submodule specifically, you could do things like
> 
> git add /path/inside/sub
> # works seamlessly from the superproject tree

Would that mean that we need to loosen/keep the requirement loose for a
name from .gitmodules? I am asking because of my series for on-demand
fetch of renamed submodules. For the full functionality I would require
a name.

Would that be in a scenario where the user would then e.g. push the
submodule into the superproject?

Ah wait I misunderstood again. You mean a file in an existing
submodule right? Not adding submodule from a repository a user moved
there?

> git commit --submodule-commit-only
> # When the flag is not give, you may get an editor
> # asking for two commit messages, (sub+super)

Or maybe something like

git commit --submodule path/to/submodule

so the user can specify which submodule she wants. I first wrote it
without the switch but but that collides with listing files which should
be added. IMO this shorter option is also more intuitive to understand
what it does (for this case).

> git fetch --submodule
> # When the flag is not given, we'd fetch superproject and
> # on-demand

Yes like above we should add the path to the submodule right?

> # You feel the superproject is in the way?
> git worktree add --for-submodule  ...
> # The new submodule worktree puts the
> # submodule only UX first. so it feels like its own
> # repository, no need for specific flags.

I am not sure I understand this one. What would that do? Put a worktree
for submodule path/to/sub to ...?

Overall I like the direction of these ideas.

Cheers Heiko


Re: [GSoC][PATCH 3/4] submodule: port set_name_rev() from shell to C

2017-08-21 Thread Heiko Voigt
On Mon, Aug 21, 2017 at 09:45:14PM +0530, Prathamesh Chavan wrote:
> Function set_name_rev() is ported from git-submodule to the
> submodule--helper builtin. The function get_name_rev() generates the
> value of the revision name as required, and the function
> print_name_rev() handles the formating and printing of the obtained
> revision name.
> 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
>  builtin/submodule--helper.c | 63 
> +
>  git-submodule.sh| 16 ++--
>  2 files changed, 65 insertions(+), 14 deletions(-)
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 7803457ba..a4bff3f38 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c

[...]

> @@ -1242,6 +1304,7 @@ static struct cmd_struct commands[] = {
>   {"relative-path", resolve_relative_path, 0},
>   {"resolve-relative-url", resolve_relative_url, 0},
>   {"resolve-relative-url-test", resolve_relative_url_test, 0},
> + {"print-name-rev", print_name_rev, 0},

I see the function in git-submodule.sh was named kind of reverse. How
about we do it more naturally here and call this 'rev-name' instead?
That makes is more clear to me and is also similar to the used variable
name 'revname'.

I would also prefix it differently like 'get' or 'calculate' instead of
'print' since it tries to find a name and is not just a simple lookup.

So in summary I would prefer something like 'get-rev-name' as a name for
the subcommand here.

>   {"init", module_init, SUPPORT_SUPER_PREFIX},
>   {"remote-branch", resolve_remote_submodule_branch, 0},
>   {"push-check", push_check, 0},
> diff --git a/git-submodule.sh b/git-submodule.sh
> index e131760ee..e988167e0 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -759,18 +759,6 @@ cmd_update()
>   }
>  }
>  
> -set_name_rev () {
> - revname=$( (
> - sanitize_submodule_env
> - cd "$1" && {
> - git describe "$2" 2>/dev/null ||
> - git describe --tags "$2" 2>/dev/null ||
> - git describe --contains "$2" 2>/dev/null ||
> - git describe --all --always "$2"
> - }
> - ) )
> - test -z "$revname" || revname=" ($revname)"
> -}
>  #
>  # Show commit summary for submodules in index or working tree
>  #
> @@ -1042,14 +1030,14 @@ cmd_status()
>   fi
>   if git diff-files --ignore-submodules=dirty --quiet -- 
> "$sm_path"
>   then
> - set_name_rev "$sm_path" "$sha1"
> + revname=$(git submodule--helper print-name-rev 
> "$sm_path" "$sha1")
>   say " $sha1 $displaypath$revname"
>   else
>   if test -z "$cached"
>   then
>   sha1=$(sanitize_submodule_env; cd "$sm_path" && 
> git rev-parse --verify HEAD)
>   fi
> - set_name_rev "$sm_path" "$sha1"
> + revname=$(git submodule--helper print-name-rev 
> "$sm_path" "$sha1")
>   say "+$sha1 $displaypath$revname"
>   fi
>  
> -- 
> 2.13.0
> 


Re: [PATCH] pull: respect submodule update configuration

2017-08-21 Thread Heiko Voigt
On Fri, Aug 18, 2017 at 11:24:47PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > From: Lars Schneider 
> >
> > Do not override the submodule configuration in the call to update
> > the submodules, but give a weaker default.
> >
> > Reported-by: Lars Schneider 
> > Signed-off-by: Stefan Beller 
> > ---
> >   
> > Personally I dislike this patch, but I have no better idea for the time
> > being.
> 
> The patch text from a cursory look seems reasonable to me.
> 
> It's not like you have 47 different codepaths that need to pay
> attention to the .update config and they all have to pass the new
> --default-update option, this is merely to fix one of them that
> relates to the problem reported by Lars, and you need a similar fix
> to other 46, right?
> 
> If you want the "--recurse-submodules" thing to always do the
> "weaker default" thing in your project, you can choose not to set
> .update to custom values in any of your submodules, so I do not
> think the reason why you dislike this change is because it would
> affect your use of submodules.
> 
> So I am a bit curious to learn which part of this change you dislike
> and why.

I am also curious. Isn't this the same strategy we are using in other
places?

Cheers Heiko


Re: Submodule regression in 2.14?

2017-08-21 Thread Heiko Voigt
On Fri, Aug 18, 2017 at 11:51:13PM -0700, Junio C Hamano wrote:
> As long as we are talking about idealized future world (well, at
> least an idea of somebody's "ideal", not necessarily shared by
> everybody), I wonder if there is even any need to have commits in
> submodules in such a world.  To realize such a "monorepo" world, you
> might be better off allowing a gitlink in the superproject to
> directly point at a tree object in a submodule repository (making
> them physically a single repository is an optional implementation
> detail I choose to ignore in this discussion).

IMO this is one step to far. One main use of submodules are shared
repositories that are used by many superprojects. The reason you want to
have commits in the submodule are so that you can push them
independently and all other users can pick up the changes. You could get
by by Using the superproject commits for the submodule once you push or
something but those do not necessarily make sense in the context of the
submodule.

So I think it is important that there are commits in the submodule so
its history makes sense independently for others.

Or how would you push out the history in the submodule in your idea?
Maybe I am missing something? What would be your use case with gitlinks
pointing to trees?

Cheers Heiko


Re: [PATCH] add test for bug in git-mv with nested submodules

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 12:05:56PM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > When using git-mv with a submodule it will detect that and update the
> > paths for its configurations (.gitmodules, worktree and gitfile). This
> > does not work for nested submodules where a user renames the root
> > submodule.
> >
> > We discovered this fact when working on on-demand fetch for renamed
> > submodules. Lets add a test to document.
> >
> > Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> > ---
> >  t/t7001-mv.sh | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> > index e365d1f..39f8aed 100755
> > --- a/t/t7001-mv.sh
> > +++ b/t/t7001-mv.sh
> > @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
> > directories' '
> > test_cmp actual expect
> >  '
> >
> > +test_expect_failure 'moving nested submodules' '
> > +   git commit -am "cleanup commit" &&
> > +   git submodule add ./. sub_nested &&
> 
> If possible, I would avoid adding the repo itself
> as a submodule as it is unrealistic in the wild.
> 
> While it may be ok for the test here, later down the road
> other tests making use of it it may become an issue with
> the URL of the submodule.

I just copied the shortcut that they were adding themselfes as submodule
in 'setup submodule'. The whole setup of submodules in this test is like
this. This way we already had a nested submodule structure which I could
just add.

I agree that this is unrealistic so I can change that in the test I am
adding. But from what I have seen, this shortcut is taken in quite some
places when dealing with submodules.

Cheers Heiko


Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > We store the changed submodules paths to calculate which submodule needs
> > fetching. This does not work for moved submodules since their paths do
> > not stay the same in case of a moved submodules. In case of new
> > submodules we do not have a path in the current checkout, since they
> > just appeared in this fetch.
> >
> > It is more general to collect the submodule names for changes instead of
> > their paths to include the above cases.
> >
> > With the change described above we implement 'on-demand' fetching of
> > changes in moved submodules.
> 
> This sounds as if this would also enable fetching new submodules
> eventually?

Yes that was the goal when starting with these changes back then. But it
took more time than I had back then. So instead of letting these changes
sit bitrot again lets see if we can get them integrated.

For new submodules we need to change the iteration somehow. Currently we
are iterating through the index. But new submodules obviously do not
have an index entry (otherwise they would not be new). So instead of the
index we will need to create another list that contains "all"
submodules. Maybe something like: all submodules from the index plus all
submodules that changed / are new? We could also go further and inspect
all submodules from all ref tips to handle submodules on other branches
configured to 'yes'. But I think we should leave that for later if need
arises.

Some merge of index and additional submodules is needed, because for
--recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes
we always need to run fetch inside the submodule. That would break if we
only looked at submodules that are collected as changed.

> > Note: This does only work when repositories have a .gitmodules file. In
> > other words: It breaks if we do not get a name for a repository.
> > IIRC, consensus was that this is a requirement to get nice submodule
> > handling these days?
> 
> I think that should have been the consensus since ~1.7.8 (since the
> submodules git dir can live inside the superprojects
> /module/).

I agree but since we started without it, we kind of have a mixed state.

> A gitlink entry without corresponding .gitmodules entry is just a gitlink.
> If we happen to have a repository at that path of the gitlink, we can
> be nice and pretend like it is a functional submodule, but it really is
> not. It's just another repo inside the superproject that happen to live
> at the path of a gitlink.

Yeah but at the moment we are handling 'on-demand' fetches and stuff for
such just gitlink submodules. If we were firm on that requirement we
would just skip those but that is not the case with the current
implementation.

> > Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> > ---
> >
> > I updated the leftover code from my series implementing recursive fetch
> > for moved submodules[1] to the current master.
> >
> > This breaks t5531 and t5545 because they do not use a .gitmodules file.
> >
> > I also have some code leftover that does fallback on paths in case no
> > submodule names can be found. But I do not really like it. The question
> > here is how far do we support not using .gitmodules. Is it e.g.
> > reasonable to say: "For --recurse-submodules=on-demand you need a
> > .gitmodules file?"
> 
> I would not intentionally break users here, but any new functionality can
> safely assume (a) we have a proper .gitmodules entry or (b) it is not a
> submodule, so do nothing/be extra careful.
> 
> For example in recursive diff sort of makes sense to also handle
> non-submodule gitlinks, but fetch is harder to tell.

Well we have a few different cases for gitlinks without .gitmodule
entry:

 1. New gitlink: We can not handle since we do not know where to clone
 from.

 2. Removed gitlink: No need to do anything in fetch

 3. Changed (but same name) gitlink: We can / and currently do run fetch
 in it

 4. Renamed: We currently skip those. We could probably do something to
 track the rename and run fetch in case of gitlink changes.
 In my current approach only the ones with a name are
 handled.

So I guess I will add a fallback to paths for 3. so we do not
unnecessarily break users using the current implementation.

> (just last night I was rereading
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> which I think is a super cute application of gitlinks. If you happen
> to checkout such
> a tree, you don't want to fetch all of the fake subm

Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 10:50:07AM -0700, Brandon Williams wrote:
> On 08/17, Stefan Beller wrote:
> > On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > > To make extending this logic later easier.
> > >
> > > Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> > > ---
> > > I am quite sure I replicated the same logic but a few more eyes would be
> > > appreciated.
> > 
> > A code cleanup is appreciated!
> > 
> > I thought Brandon had a series in flight doing a very similar cleanup here,
> > but in master..pu there is nothing to be found.
> 
> Yeah there are 2 series in flight which will probably conflict here.
> bw/grep-recurse-submodules and bw/submodule-config-cleanup

Ok then I will wait until those are in and then see if I can base the
cleanup on top. I think it is only necessary as a preparation for the
fully fledged fetch configuration logic mess we will get into once we
get to the full recursive submodule fetch implementation. Not
necessarily needed for the moved submodules.

> > 
> > The code looks good to me.

Thanks.

Cheers Heiko


[RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-17 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
I am quite sure I replicated the same logic but a few more eyes would be
appreciated.

Cheers Heiko

 submodule.c | 55 +++
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3ed78ac..a1011f4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
return ret;
 }
 
+static int get_fetch_recurse_config(const struct submodule *submodule, int 
command_line_option)
+{
+   if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return command_line_option;
+
+   if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline */
+   return submodule->fetch_recurse;
+
+   if (gitmodules_is_unmerged)
+   return RECURSE_SUBMODULES_OFF;
+
+   return config_fetch_recurse_submodules;
+}
+
 struct submodule_parallel_fetch {
int count;
struct argv_array args;
@@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule)
submodule = submodule_from_name(_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   if (submodule &&
-   submodule->fetch_recurse !=
-   RECURSE_SUBMODULES_NONE) {
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_OFF)
-   continue;
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if ((config_fetch_recurse_submodules == 
RECURSE_SUBMODULES_OFF) ||
-   gitmodules_is_unmerged)
-   continue;
-   if (config_fetch_recurse_submodules == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
-
submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == 
RECURSE_SUBMODULES_ON_DEMAND) {
-   if 
(!unsorted_string_list_lookup(_submodule_names,
+   switch (get_fetch_recurse_config(submodule, 
spf->command_line_option))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
 submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.0.0.274.g6b2cd91



[RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---

I updated the leftover code from my series implementing recursive fetch
for moved submodules[1] to the current master.

This breaks t5531 and t5545 because they do not use a .gitmodules file.

I also have some code leftover that does fallback on paths in case no
submodule names can be found. But I do not really like it. The question
here is how far do we support not using .gitmodules. Is it e.g.
reasonable to say: "For --recurse-submodules=on-demand you need a
.gitmodules file?"

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

 submodule.c | 92 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 27de65a..3ed78ac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -23,7 +23,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodu

[PATCH] t5526: fix some broken && chains

2017-08-17 Thread Heiko Voigt
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 t/t5526-fetch-submodules.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ce788e9..22a7358 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates into 
submodules" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch >../actual.out 2>../actual.err
) &&
test_must_be_empty actual.out &&
@@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides 
config setting" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch --no-recurse-submodules >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new 
commits are fetched in
cd submodule &&
git config --unset fetch.recurseSubmodules
) &&
-   git config --unset fetch.recurseSubmodules
+   git config --unset fetch.recurseSubmodules &&
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules when 
necessary" '
) &&
head1=$(git rev-parse --short HEAD^) &&
git add subdir/deepsubmodule &&
-   git commit -m "new deepsubmodule"
+   git commit -m "new deepsubmodule" &&
head2=$(git rev-parse --short HEAD) &&
echo "Fetching submodule submodule" > ../expect.err.sub &&
echo "From $pwd/submodule" >> ../expect.err.sub &&
-- 
2.0.0.274.g6b2cd91



[PATCH] add test for bug in git-mv with nested submodules

2017-08-17 Thread Heiko Voigt
When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for nested submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Lets add a test to document.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 t/t7001-mv.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1f..39f8aed 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   git submodule add ./. sub_nested &&
+   git commit -m "add sub_nested" &&
+   git submodule update --init --recursive &&
+   git mv sub_nested sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.0.0.274.g6b2cd91



Re: [PATCH] push: do not add submodule odb as an alternate when recursing on demand

2017-08-16 Thread Heiko Voigt
Hi,

was about to write that we are maybe overly cautious here. Because the
current way a submodule ends up in the list to be pushed is through:

find_unpushed_submodules()

that itself collects all changed submodules when submodule_needs_pushing() is
true. In there we have this:

if (!submodule_has_commits(path, commits))
/*
 * NOTE: We do consider it safe to return "no" here. The
 * correct answer would be "We do not know" instead of
 * "No push needed", but it is quite hard to change
 * the submodule pointer without having the submodule
 * around. If a user did however change the submodules
 * without having the submodule around, this indicates
 * an expert who knows what they are doing or a
 * maintainer integrating work from other people. In
 * both cases it should be safe to skip this check.
 */
return 0;

So if the check, whether a submodule has commits, fails for any reason it will
not end up in the list to be pushed.

As a side note: inside submodule_has_commits() there is an add_submodule_odb()
followed by a process to really make sure that the commits are in the
submodule.

So IMO at this point we can be sure that the *database* exists and this extra
check could be dropped if we said that a caller to push_submodule() should make
sure that the submodule exists. The current ones are doing it already (if I did
not miss anything).

On Tue, Aug 15, 2017 at 06:05:25PM -0700, Stefan Beller wrote:
> On Tue, Aug 15, 2017 at 5:11 PM, Junio C Hamano  wrote:
> > Stefan Beller  writes:
> >
> >>> Is "is it populated" a good thing to check here, though?  IIRC,
> >>> add-submodule-odb allows you to add the object database of an
> >>> inactivated submodule, so this seems to change the behaviour.  I do
> >>> not know if the behaviour change is a good thing (i.e. bugfix) or
> >>> not (i.e. regression) offhand, though.
> >>
> >> Good point, we should be able to push non-populated, even inactive(?)
> >> submodules. For that we strictly need add_submodule_odb here
> >> (or the repo object of the submodule, eventually).
> >>
> >> So let's retract this patch for now.
> >
> > Not so fast.
> 
> Ok, I took another look at the code.
> 
> While we may desire that un-populated submodules can be pushed
> (due to checking out another revision where the submodule
> doesn't exist, before pushing), this is not supported currently, because
> the call to run the push in the submodule assumes there is a
> "/.git" on which the child process can operate.
> So for now we HAVE to have the submodule populated.

That is a good point though. In the current form of push_submodule() we need to
have a populated submodule. So IMO to check whether the submodule is actually
*populated* instead of adding the odb is correct and a possible bug fix.

Cheers Heiko


Re: [PATCH v2 02/15] submodule: don't use submodule_from_name

2017-08-11 Thread Heiko Voigt
On Fri, Aug 04, 2017 at 02:53:11PM -0700, Brandon Williams wrote:
> On 08/03, Stefan Beller wrote:
> > On Thu, Aug 3, 2017 at 11:19 AM, Brandon Williams  wrote:
> > > The function 'submodule_from_name()' is being used incorrectly here as a
> > > submodule path is being used instead of a submodule name.  Since the
> > > correct function to use with a path to a submodule is already being used
> > > ('submodule_from_path()') let's remove the call to
> > > 'submodule_from_name()'.
> > >
> > > Signed-off-by: Brandon Williams 
> > 
> > In case a reroll is needed, you could incorperate Jens feedback
> > stating that 851e18c385 should have done it.
> 
> K I'll add that into the commit message.

Well, thats not 100% correct... IMO, it should have been a follow up patch
which I never got to implement. See my other reply to the v1 of this
patch I just sent out.

As stated there I will have a look into where it makes sense to pass a
commit id and behave more correctly.

Cheers Heiko


Re: [PATCH v2 14/15] unpack-trees: improve loading of .gitmodules

2017-08-11 Thread Heiko Voigt
On Thu, Aug 03, 2017 at 11:19:59AM -0700, Brandon Williams wrote:
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 5dce7ff7d..3c7f464fa 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1,5 +1,6 @@
>  #define NO_THE_INDEX_COMPATIBILITY_MACROS
>  #include "cache.h"
> +#include "repository.h"
>  #include "config.h"
>  #include "dir.h"
>  #include "tree.h"
> @@ -268,22 +269,28 @@ static int check_submodule_move_head(const struct 
> cache_entry *ce,
>   return 0;
>  }
>  
> -static void reload_gitmodules_file(struct index_state *index,
> -struct checkout *state)
> +/*
> + * Preform the loading of the repository's gitmodules file.  This function is

s/Preform/Perform/

and a nit: There is some extra space after the end of this sentence.

Cheers Heiko


Re: [PATCH 02/15] submodule: don't use submodule_from_name

2017-08-11 Thread Heiko Voigt
Hi,

sorry for the late reply, just stumpled upon this.

On Mon, Jul 31, 2017 at 01:43:04PM -0700, Stefan Beller wrote:
> On Sun, Jul 30, 2017 at 6:43 AM, Jens Lehmann  wrote:
> > Am 26.07.2017 um 23:06 schrieb Junio C Hamano:
> >>
> >> Stefan Beller  writes:
> >>
> >>> Rereading the archives, there was quite some discussion on the design
> >>> of these patches, but these lines of code did not get any attention
> >>>
> >>>  https://public-inbox.org/git/4cdb3063.5010...@web.de/
> >>>
> >>> I cc'd Jens in the hope of him having a good memory why he
> >>> wrote the code that way. :)
> >>
> >>
> >> Thanks for digging.  I wouldn't be surprised if this were a fallback
> >> to help a broken entry in .gitmodules that lack .path variable, but
> >> we shouldn't be sweeping the problem under the rug like that.
> >
> >
> > Sorry to disappoint you ;-) I added this in 7dce19d374 because
> > submodule by path lookup back then only parsed the checked out
> > .gitmodules file.
> 
> This is still the case AFAICT, as we never ask for a specific .gitmodules
> file identified by sha1 of the commit.

This was actually part of my original approach[1] but it seems I never got
around to implement that last part for which I originally started the
submodule config API: Proper recursive fetch.

I still have a patch for moved submodules lying around which pass a commit id
for a gitmodules file. That particular patch is quite simple and finished but
I was planning to include that in the finished fetch series. So I can have a
look if I can quickly update that to the current state, so we can at least have
one proper user of the submodule config API.

> > So looking for it by name was a good guess to
> > fetch a new submodule that wasn't present in the current HEAD's
> > .gitmodules, as the path is used as the default name in "git
> > submodule add".

I will have a look whether we can easily replace this hack with the proper
lookup now. Lets see how many low hanging fruits we have lying around
for recursive fetch. The full blown implementation including cloning of
new submodules might still take some time...

Cheers Heiko

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/


Re: [PATCH v4 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
On Wed, Nov 16, 2016 at 11:18:07AM -0800, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> > ---
> 
> Needs retitle ;-)  Here is what I tentatively queued.

Thanks ;-) Missed that one.

> submodule_needs_pushing(): explain the behaviour when we cannot answer
> 
> When we do not have commits that are involved in the update of the
> superproject in our copy of submodule, we cannot tell if the remote
> end needs to acquire these commits to be able to check out the
> superproject tree.  Explain why we answer "no there is no need/point
> in pushing from our submodule repository" in this case.
> 
> Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
> Signed-off-by: Junio C Hamano <gits...@pobox.com>

Sound fine to me.

Cheers Heiko


[PATCH v4 3/4] batch check whether submodule needs pushing into one call

2016-11-16 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 62 -
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 12ac1ea..11391fa 100644
--- a/submodule.c
+++ b/submodule.c
@@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_commit(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_commit = data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_commit = 0;
+
+   return 0;
+}
+
+static int submodule_has_commits(const char *path, struct sha1_array *commits)
+{
+   int has_commit = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(commits, check_has_commit, _commit);
+   return has_commit;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
+{
+   if (!submodule_has_commits(path, commits))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(, "rev-list");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, 
);
+   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command())
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(, cp.out, 41))
needs_pushing = 1;
finish_command();
@@ -582,22 +604,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static int collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me = data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-
-   return 0;
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
struct string_list_item *item;
@@ -634,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits,
argv_array_clear();
 
for_each_string_list_item(submodule, ) {
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = submodule->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
-   collect_submodules_from_sha1s,
-   );
+   struct sha1_array *commits = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, commits))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s();
 
-- 
2.10.1.386.gc503e45



[PATCH v4 2/4] serialize collection of refs that contain submodule changes

2016-11-16 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 35 ---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index b2908fe..12ac1ea 100644
--- a/submodule.c
+++ b/submodule.c
@@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -599,25 +606,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *commits,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
-   char *sha1_copy;
struct string_list submodules = STRING_LIST_INIT_DUP;
struct string_list_item *submodule;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(_arg, "--remotes=%s", remotes_name);
init_revisions(, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, , NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(, "find_unpushed_submodules");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, );
+   argv_array_push(, "--not");
+   argv_array_pushf(, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
die("revision walk setup failed");
 
@@ -625,8 +631,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(_arg);
+   argv_array_clear();
 
for_each_string_list_item(submodule, ) {
struct collect_submodule_from_sha1s_data data;
@@ -663,12 +668,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, _pushing))
+   if (!find_unpushed_submodules(commits, remotes_name, _pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a..9454806 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index d57e8de..f482869 100644
--- a/transport.c
+++ b/transport.c
@@ -949,23 +949,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+   struct sha1_array commits = SHA1_ARRAY_INIT;
+
for (; ref; ref = ref->next)
-   i

[PATCH v4 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/submodule.c b/submodule.c
index 11391fa..00dd655 100644
--- a/submodule.c
+++ b/submodule.c
@@ -531,6 +531,17 @@ static int submodule_has_commits(const char *path, struct 
sha1_array *commits)
 static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
 {
if (!submodule_has_commits(path, commits))
+   /*
+* NOTE: We do consider it safe to return "no" here. The
+* correct answer would be "We do not know" instead of
+* "No push needed", but it is quite hard to change
+* the submodule pointer without having the submodule
+* around. If a user did however change the submodules
+* without having the submodule around, this indicates
+* an expert who knows what they are doing or a
+* maintainer integrating work from other people. In
+* both cases it should be safe to skip this check.
+*/
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.10.1.386.gc503e45



[PATCH v4 1/4] serialize collection of changed submodules

2016-11-16 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 59 +++
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6f7d883..b2908fe 100644
--- a/submodule.c
+++ b/submodule.c
@@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *submodule_commits(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *commits;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   commits = submodule_commits(submodules, p->two->path);
+   sha1_array_append(commits, p->two->oid.hash);
}
 }
 
@@ -560,6 +575,30 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static int collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me = data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+
+   return 0;
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   struct string_list_item *item;
+   for_each_string_list_item(item, submodules)
+   sha1_array_clear((struct sha1_array *) item->util);
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
@@ -568,6 +607,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
int argc = ARRAY_SIZE(argv) - 1;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct string_list_item *submodule;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -581,12 +622,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision()) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(_arg);
 
+   for_each_string_list_item(submodule, ) {
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = submodule->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
+   collect_submodules_from_sha1s,
+   );
+   }
+   free_submodules_sha1s();
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.386.gc503e45



[PATCH v4 0/4] Speedup finding of unpushed submodules

2016-11-16 Thread Heiko Voigt
You can find the third iteration of this series here:

http://public-inbox.org/git/cover.1479221071.git.hvo...@hvoigt.net/

All comments from the last iteration should be addressed.

Cheers Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this
question

 submodule.c | 123 +++-
 submodule.h |   5 ++-
 transport.c |  29 ++
 3 files changed, 121 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45



Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 10:31:59AM -0500, Jeff King wrote:
> On Tue, Nov 15, 2016 at 01:07:18PM +0100, Heiko Voigt wrote:
> 
> > On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> > > To all macOS users on the list:
> > > Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?
> > 
> > Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
> > 5550, 5551, 5561, 5812.
> 
> Failing how? Does apache fail to start up? Do tests fails? What does
> "-v" say? Is there anything interesting in httpd/error.log in the trash
> directory?

This is what I see for 5539:

$ GIT_TEST_HTTPD=1 ./t5539-fetch-http-shallow.sh -v
Initialized empty Git repository in /Users/hvoigt/Repository/git4/t/trash 
directory.t5539-fetch-http-shallow/.git/
checking prerequisite: NOT_ROOT

mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
(
cd "$TRASH_DIRECTORY/prereq-test-dir" &&
uid=$(id -u) &&
test "$uid" != 0

)
prerequisite NOT_ROOT ok
httpd: Syntax error on line 65 of 
/Users/hvoigt/Repository/git4/t/lib-httpd/apache.conf: Cannot load 
modules/mod_mpm_prefork.so into server: 
dlopen(/Users/hvoigt/Repository/git4/t/trash 
directory.t5539-fetch-http-shallow/httpd/modules/mod_mpm_prefork.so, 10): image 
not found
error: web server setup failed


It seems the other failures have the same cause.

Cheers Heiko


Re: [PATCH v3 3/4] batch check whether submodule needs pushing into one call

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 02:28:31PM -0800, Stefan Beller wrote:
> On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char 
> > sha1[20])
> > +static int check_has_commit(const unsigned char sha1[20], void *data)
> >  {
> > -   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +   int *has_commit = (int *) data;
> 
> nit: just as prior patches ;) void* can be cast implicitly.

Even though its just a nit: Will remove all the void casts. :)

Cheers Heiko


Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-16 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 04:13:51PM -0800, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> >> "We do not know" ...
> >
> > ... because there is no way to check for us as we don't have the
> > submodule commits.
> >
> > " We do consider it safe as no one in their sane mind would
> > have changed the submodule pointers without having the
> > submodule around. If a user did however change the submodules
> > without having the submodule commits around, this indicates an
> > expert who knows what they were doing."
> 
> I didn't think it through myself to arrive at such a conclusion, but
> to me the above sounds like a sensible reasoning [*1*].

I think you have a point here. If I rephrase it like this: "We do
consider it safe as no one in their sane mind *could* have changed the
submodule pointers without having the submodule around..."

Since its actually hard to create such a situation without the submodule
commit around I agree here.

> *1* My version was more like "we do not know if they would get into
> a situation where they do not have enough submodule commits if
> we pushed our superproject, but more importantly, we DO KNOW
> that it would not help an iota if we pushed our submodule to
> them, so there is no point stopping the push of superproject
> saying 'no, no, no, you must push the submodule first'".

Yes saying that would be wrong. I was rather suggesting that we tell the
user that we could not find the submodule commits to and that if he
wants to proceed he should either pass --recurse-submodules=no or
initialize the submodule.

But I think the above reasoning obsoletes my suggestion. I would adjust
the comment accordingly but still keep the patch so we have
documentation that this behavior is on purpose.

Cheers Heiko


Re: Git status takes too long- How to improve the performance of git

2016-11-15 Thread Heiko Voigt
On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
> Number of files - 63883

Since you also posted this to the "Git for Windows" mailinglist I assume
that you are using Windows. Reduce the number of files. For example
split the repository into two one for documentation and one for source.
Thats what I did with a converted repository that had to many files.

Windows is unfortunately very slow when it comes to handling many files
and if I recall correctly ~3 files was in a nicely handleable range
for a Git repository on Windows, but that might have changed...

Cheers Heiko


[PATCH v3 3/4] batch check whether submodule needs pushing into one call

2016-11-15 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 63 -
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/submodule.c b/submodule.c
index 769d666..e1196fd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_commit(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_commit = (int *) data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_commit = 0;
+
+   return 0;
+}
+
+static int submodule_has_commits(const char *path, struct sha1_array *commits)
+{
+   int has_commit = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(commits, check_has_commit, _commit);
+   return has_commit;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
+{
+   if (!submodule_has_commits(path, commits))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(, "rev-list");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, 
);
+   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command())
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(, cp.out, 41))
needs_pushing = 1;
finish_command();
@@ -582,23 +604,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static int collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me =
-   (struct collect_submodule_from_sha1s_data *) data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-
-   return 0;
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
struct string_list_item *item;
@@ -635,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits,
argv_array_clear();
 
for_each_string_list_item(submodule, ) {
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = submodule->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
-   collect_submodules_from_sha1s,
-   );
+   struct sha1_array *commits = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, commits))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s();
 
-- 
2.10.1.386.gc503e45



[PATCH v3 2/4] serialize collection of refs that contain submodule changes

2016-11-15 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 35 ---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index b91585e..769d666 100644
--- a/submodule.c
+++ b/submodule.c
@@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -600,25 +607,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *commits,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
-   char *sha1_copy;
struct string_list submodules = STRING_LIST_INIT_DUP;
struct string_list_item *submodule;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(_arg, "--remotes=%s", remotes_name);
init_revisions(, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, , NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(, "find_unpushed_submodules");
+   sha1_array_for_each_unique(commits, append_sha1_to_argv, );
+   argv_array_push(, "--not");
+   argv_array_pushf(, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
die("revision walk setup failed");
 
@@ -626,8 +632,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(_arg);
+   argv_array_clear();
 
for_each_string_list_item(submodule, ) {
struct collect_submodule_from_sha1s_data data;
@@ -664,12 +669,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, _pushing))
+   if (!find_unpushed_submodules(commits, remotes_name, _pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a..9454806 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *commits, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index d57e8de..f482869 100644
--- a/transport.c
+++ b/transport.c
@@ -949,23 +949,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+   struct sha1_array commits = SHA1_ARRAY_INIT;
+
   

[PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question

2016-11-15 Thread Heiko Voigt
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/submodule.c b/submodule.c
index e1196fd..29efee9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct 
sha1_array *commits)
 static int submodule_needs_pushing(const char *path, struct sha1_array 
*commits)
 {
if (!submodule_has_commits(path, commits))
+   /* NEEDSWORK: The correct answer here is "We do not
+* know" instead of "No push needed". We currently
+* proceed pushing here as if the submodules commits are
+* available on a remote. Since we can not check the
+* remote availability for this submodule we should
+* consider changing this behavior to: Stop here and
+* tell the user how to skip this check if wanted.
+*/
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
-- 
2.10.1.386.gc503e45



[PATCH v3 1/4] serialize collection of changed submodules

2016-11-15 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 60 
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6f7d883..b91585e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *submodule_commits(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *commits;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   commits = submodule_commits(submodules, p->two->path);
+   sha1_array_append(commits, p->two->oid.hash);
}
 }
 
@@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static int collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me =
+   (struct collect_submodule_from_sha1s_data *) data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+
+   return 0;
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   struct string_list_item *item;
+   for_each_string_list_item(item, submodules)
+   sha1_array_clear((struct sha1_array *) item->util);
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
@@ -568,6 +608,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
int argc = ARRAY_SIZE(argv) - 1;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct string_list_item *submodule;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -581,12 +623,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision()) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(_arg);
 
+   for_each_string_list_item(submodule, ) {
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = submodule->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) 
submodule->util,
+   collect_submodules_from_sha1s,
+   );
+   }
+   free_submodules_sha1s();
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.386.gc503e45



[PATCH v3 0/4] Speedup finding of unpushed submodules

2016-11-15 Thread Heiko Voigt
You can find the second iteration of this series here:

http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/

All mentioned issues should be fixed. I put the NEEDSWORK comment in a
seperate patch since it seemed to me as if we did not fully agree on
that. So in case we decide against it we can just drop that patch.

Cheers Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this
question

 submodule.c | 120 +++-
 submodule.h |   5 ++-
 transport.c |  29 +++
 3 files changed, 118 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45



Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS

2016-11-15 Thread Heiko Voigt
On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> To all macOS users on the list:
> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?

Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
5550, 5551, 5561, 5812.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-17 Thread Heiko Voigt
On Fri, Oct 14, 2016 at 09:48:16AM -0700, Junio C Hamano wrote:
> Kevin Daudt <m...@ikke.info> writes:
> 
> > On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> >> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.  We would like to consider having them be
> >> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> >> > filesystem which retrieves files on demand.
> >> 
> >> How about portability? This feature would only work on Unix like
> >> operating systems. You have to be careful to not break Windows since
> >> they do not have symlinks.
> >
> > NTFS does have symlinks, but you need admin right to create them though
> > (unless you change the policy).
> 
> That sounds like saying "It has, but it practically is not usable by
> Git as a mechanism to achieve this goal" to me.

Yes and that is why Git for Windows does not use them and I simplified
to: "Windows does not have symlinks". For a normal user there is no such
thing as symlinks on Windows, unfortunately.

Cheers Heiko


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a but subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > submodule_has_commits()
> 
> Nice.

Ok will use that. And while I am at it: I will also rename all the
'hashes' variables to commits because that makes the code way clearer I
think.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know". 
> 
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes you could also rephrase/see it that way. But the question is: If we
do not have what the remote needs would the user expect us to tell him
that fact and stop or does he usually not care?

> > So how about:
> >
> >
> > if (!submodule_has_hashes(path, hashes))
> > /* NEEDSWORK: The correct answer here is "We do not
> >  * know" instead of "No". We currently proceed pushing
> >  * here as if the submodules commits are available on a
> >  * remote, which is not always correct. */
> > return 0;
> 
> I am not sure.  
> 
> What should happen in this scenario?
> 
>  * We have two remotes, A and B, for our superproject.
> 
>  * We are not interested in one submodule at path X.  Our repository
>is primarily used to work on the superproject and possibly other
>submodules but not the one at path X.
> 
>  * We pulled from A to update ourselves.  They were actively working
>on the submodule we are not interested in, and path X in the
>superproject records a new commit that we do not have.
> 
>  * We are now trying to push to B.

I am not sure if this is a typical scenario? Well, if you are updating
your main branch from someone else and then push it to your own fork
maybe. You could specify --no-recurse-submodules for this case though.
The proper solution for this case would probably be something along the
lines of 'submodule..fetchRecurseSubmodules' but for push so we
can mark certain submodules as uninteresting by default.

I like to be more protective to the user here. Its usually more
annoying for possibly many others when you push out things that have
missing things compared to one person not being able to push because his
submodule is not up-to-date/initialized.

> Should different things happen in these two subcases?
> 
>  - We are not interested in submodule at path X, so we haven't even
>done "submodule init" on it.
> 
>  - We are not interested in submodule at path X, so even though we
>do have a rather stale clone of it, we do not usually bother
>updating what is checked out at path X and commit our changes
>outside that area.
> 
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as it
> makes the only remedy for the situation is for you to make a full
> clone of the submodule you are not interested in and you have never
> touched yourself in either of these two subcases.

I also think in both situations the same thing should happen. A decision
that something different should happen should be made explicitly instead
of implicitly just because some submodule is not initialized. That might
be by accident or because a certain submodule is new so here the choice
should be made deliberately by the user, IMO.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-13 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories.  We would like to consider having them be
> symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix like
operating systems. You have to be careful to not break Windows since
they do not have symlinks.

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:18:28AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > Which seems quite extensively long for a static function so how about
> > we shorten it a bit and add a comment:
> >
> > /* lookup or create commit object list for submodule */
> > get_commit_objects_for_submodule_path(...
> 
> Or you can even lose "get_" and "path", I guess.  You are not even
> "getting" commits but the array that holds them, so the caller can
> use it to "get" one of them or it can even use it to "put" a new
> one, no?  "get-commit-objects" is a misnomer in that sense.  Either
> one of
> 
> get_submodule_commits_array()
> submodule_commits()
> 
> perhaps?  I dunno.

I like the last one. Will use 'submodule_commits()'.

Cheers Heiko


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:56:13PM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char 
> > sha1[20])
> > +static int check_has_hash(const unsigned char sha1[20], void *data)
> >  {
> > -   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +   int *has_hash = (int *) data;
> > +
> > +   if (!lookup_commit_reference(sha1))
> > +   *has_hash = 0;
> > +
> > +   return 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   int has_hash = 1;
> > +
> > +   if (add_submodule_odb(path))
> > +   return 0;
> > +
> > +   sha1_array_for_each_unique(hashes, check_has_hash, _hash);
> > +   return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   if (!submodule_has_hashes(path, hashes))
> > return 0;
> 
> Same comment about naming.  
> 
> What do check-has-hash and submodule-has-hashes exactly mean by
> "hash" in their names?  Because I think what is checked here is
> "does the local submodule repository have _all_ the commits
> referenced from the superproject commit we are pushing?", so I'd
> prefer to see "commit" in their names.
> 
> If we do not even have these commits locally, then there is no point
> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> situation) is correct but it is a but subtle.  It's not "we know
> they already have them", but it is "even if we tried to push, it
> won't do us or the other side any good."  A single-liner in-code
> comment may help.

First the naming part. How about:

submodule_has_commits()

?

Second as mentioned a previous answer[1] to this part: I would actually
like to have a die() here instead of blindly proceeding. Since the user
either specified --recurse-submodules=... at the commandline or it was
implicitly enabled because we have submodules in the tree we should be
careful and not push revisions referencing submodules that are not
available at a remote. If we can not properly figure it out I would
suggest to stop and tell the user how to solve the situation. E.g.
either she clones the appropriate submodules or specifies
--no-recurse-submodules on the commandline to tell git that she does not
care.

Returning 0 here means: "No push needed" but the correct answer would
be: "We do not know". Question is what we should do here which I am
planning to address in a separate patch series since that will be
changing behavior.

So how about:


if (!submodule_has_hashes(path, hashes))
/* NEEDSWORK: The correct answer here is "We do not
 * know" instead of "No". We currently proceed pushing
 * here as if the submodules commits are available on a
 * remote, which is not always correct. */
return 0;

What do you think?

Cheers Heiko

[1] http://public-inbox.org/git/20160919195812.gc62...@book.hvoigt.net/


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:43:13PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> >> +static struct sha1_array *get_sha1s_from_list(struct string_list 
> >> *submodules,
> >> +   const char *path)
> >
> > So this will take the stringlist `submodules` and insert the path into it,
> > if it wasn't already in there. In case it is newly inserted, add a 
> > sha1_array
> > as util, so each inserted path has it's own empty array.
> >
> > So it is both init of the data structures as well as retrieving them. I was
> > initially confused by the name as I assumed it would give you sha1s out
> > of a string list (e.g. transform strings to internal sha1 things).
> > Maybe it's just
> > me having a hard time to understand that, but I feel like the name could be
> > improved.
> >
> > lookup_sha1_list_by_path,
> > insert_path_and_return_sha1_list ?
> 
> I do not think either the name or the "find if exists otherwise
> initialize one" behaviour is particularly confusing, but I do not
> think "maintain a set of sha1_arrays keyed with a string" is a so
> widely reusable general concept/construct.  As can be seen easily in
> the names of parameters, this function is about maintaining a set of
> sha1_arrays keyed by paths to submodules, and I also assume that the
> array indexed by path is not meant to be a general purpose "we can
> use it to store any 40-hex thing" but to store something specific.
> 
> What is that specific thing?  The names of commit objects in the
> submodule repository?
> 
> I'd prefer to see that exact thing used to construct the function
> name for a helper function with specific usage in mind, i.e.
> get_commit_object_names_for_submodule_path() or something along that
> line.

I did not name this function too precisely to keep it's name short since
everything specific was quite long, like the suggestion from Junio.

Since this is a static function local to the submodule file I was
assuming anyone interested would just look up the usage and immediately
see the purpose. If I look into submodule-cache.c where I have a similar
functionality we used 'lookup_or_create' for this create on demand
functionality. So a function name would be:

lookup_or_create_commit_objects_for_submodule_path(...

Which seems quite extensively long for a static function so how about
we shorten it a bit and add a comment:

/* lookup or create commit object list for submodule */
get_commit_objects_for_submodule_path(...

?

Cheers Heiko


Re: [PATCH v2 2/3] serialize collection of refs that contain submodule changes

2016-10-12 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 11:16:31AM -0700, Stefan Beller wrote:
> > diff --git a/submodule.c b/submodule.c
> > index 59c9d15905..5044afc2f8 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -522,6 +522,13 @@ static int has_remote(const char *refname, const 
> > struct object_id *oid,
> > return 1;
> >  }
> >
> > +static int append_hash_to_argv(const unsigned char sha1[20], void *data)
> > +{
> > +   struct argv_array *argv = (struct argv_array *) data;
> > +   argv_array_push(argv, sha1_to_hex(sha1));
> 
> Nit of the day:
> When using the struct child-process, we have the oldstyle argv NULL
> terminated array as
> well as the new style args argv_array. So in that context we'd prefer
> `args` as a name for
> argv_array as that helps to distinguish from the old array type.
> Here however `argv` seems to be a reasonable name, in fact whenever we
> do not deal with
> child processes, we seem to not like the `args` name:
> 
> $ git grep argv_array |wc -l
> 577
> $ git grep argv_array |grep args |wc -l
> 293
> 
> The rest looks good to me. :)

Thanks. So I do not completely get what you are suggesting: args or kept
it the way it is? Since in the end you are saying it is ok here ;) I
mainly chose this name because I am substituting the argv variable which
is already called 'argv' with this array. That might also be the reason
why in so many locations with struct child_processe's we have the 'argv'
name: Because they initially started with the old-style NULL terminated
array.

I am fine with it either way. Just tell me what you like :)

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 10:59:29AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > +static void free_submodules_sha1s(struct string_list *submodules)
> > +{
> > +   int i;
> > +   for (i = 0; i < submodules->nr; i++) {
> > +   struct string_list_item *item = >items[i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

Will change.

> > @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char 
> > new_sha1[20],
> > die("revision walk setup failed");
> >
> > while ((commit = get_revision()) != NULL)
> > -   find_unpushed_submodule_commits(commit, needs_pushing);
> > +   find_unpushed_submodule_commits(commit, );
> >
> > reset_revision_walk();
> > free(sha1_copy);
> > strbuf_release(_arg);
> >
> > +   for (i = 0; i < submodules.nr; i++) {
> > +   struct string_list_item *item = [i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

As above.

Cheers Heiko


Re: [PATCH] clean up confusing suggestion for commit references

2016-10-11 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 12:14:14PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > On Mon, Oct 10, 2016 at 11:24:01AM -0700, Junio C Hamano wrote:
> >
> >> I no longer have preference either way myself, even though I was in
> >> favor of no-quotes simply because I had an alias to produce that
> >> format and was used to it.
> >
> > I'll admit that I don't care _that_ much and am happy to leave it up to
> > individual authors, as long as nobody quotes SubmittingPatches at me as
> > some kind of gospel when I use the no-quotes form.
> 
> ;-).  
> 
> I just do not want to hear "gitk (or was it git-gui) produces quoted
> form, why are you recommending no-quoted form in SubmittingPatches?"
> 
> I'd say "use common sense; sometimes it is less confusing to read
> without quotes and it is perfectly OK to do so if that is the case".

I do not care about which format it should be either. I just wanted to
be clear about whatever should be used. Since it seems we will allow
both, I am also fine with leaving the description as it is ;-)

Cheers Heiko


Re: [PATCH 1/2] submodule add: extend force flag to add existing repos

2016-10-11 Thread Heiko Voigt
Hi,

On Fri, Oct 07, 2016 at 10:25:04AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 5:52 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > On Thu, Oct 06, 2016 at 01:11:20PM -0700, Junio C Hamano wrote:
> >> Stefan Beller <sbel...@google.com> writes:
> >>
> >> > Currently the force flag in `git submodule add` takes care of possibly
> >> > ignored files or when a name collision occurs.
> >> >
> >> > However there is another situation where submodule add comes in handy:
> >> > When you already have a gitlink recorded, but no configuration was
> >> > done (i.e. no .gitmodules file nor any entry in .git/config) and you
> >> > want to generate these config entries. For this situation allow
> >> > `git submodule add` to proceed if there is already a submodule at the
> >> > given path in the index.
> >
> > Is it important that the submodule is in the index?
> 
> If it is not in the index, it already works.

Ah ok I was not aware of that, sorry.

> > How about worktree?
> > From the index entry alone we can not deduce the values anyway.
> 
> Right, but as of now this is the only show stopper, i.e.
> * you have an existing repo? -> fine, it works with --force
> * you even ignored that repo -> --force knows how to do it.
> * you already have a gitlink -> Sorry, you're out of luck.
> 
> So that is why I stressed index in this commit message, as it is only about 
> this
> case.

Forget what I wrote. As said above I was not aware that there is only an
error when it is already in the index.

> > [1] 
> > http://public-inbox.org/git/%3c20160916141143.ga47...@book.hvoigt.net%3E/
> 
> Current situation:
> 
> > clone the submodule into a directory
> > git submodule add --force
> > git commit everything
> 
> works fine, but:
> 
> > clone the submodule into a directory
> > git add 
> > git commit  -m "Add submodule"
> > # me: "Oh crap! I did forget the configuration."
> > git submodule add --force  
> > # Git: "It already exists in the index, I am not going to produce the 
> > config for you."
> 
> The last step is changed with this patch, as
> it will just work fine then.

Thanks.

Cheers Heiko


[PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-07 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 63 +
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5044afc2f8..a05c2a34b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char 
sha1[20], void *data)
return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static int check_has_hash(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   int *has_hash = (int *) data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_hash = 0;
+
+   return 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+   int has_hash = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(hashes, check_has_hash, _hash);
+   return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+   if (!submodule_has_hashes(path, hashes))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
+   argv_array_push(, "rev-list");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, 
);
+   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
NULL);
+
prepare_submodule_repo_env(_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command())
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(, cp.out, 41))
needs_pushing = 1;
finish_command();
@@ -604,21 +626,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-   struct string_list *needs_pushing;
-};
-
-static void collect_submodules_from_sha1s(const unsigned char sha1[20],
-   void *data)
-{
-   struct collect_submodule_from_sha1s_data *me =
-   (struct collect_submodule_from_sha1s_data *) data;
-
-   if (submodule_needs_pushing(me->submodule_path, sha1))
-   string_list_insert(me->needs_pushing, me->submodule_path);
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
int i;
@@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
argv_array_clear();
 
for (i = 0; i < submodules.nr; i++) {
-   struct string_list_item *item = [i];
-   struct collect_submodule_from_sha1s_data data;
-   data.submodule_path = item->string;
-   data.needs_pushing = needs_pushing;
-   sha1_array_for_each_unique((struct sha1_array *) item->util,
-   collect_submodules_from_sha1s,
-   );
+   struct string_list_item *submodule = [i];
+   struct sha1_array *hashes = (struct sha1_array *) 
submodule->util;
+
+   if (submodule_needs_pushing(submodule->string, hashes))
+   string_list_insert(needs_pushing, submodule->string);
}
free_submodules_sha1s();
 
-- 
2.10.1.637.g09b28c5



[PATCH v2 2/3] serialize collection of refs that contain submodule changes

2016-10-07 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 36 +---
 submodule.h |  5 +++--
 transport.c | 29 +
 3 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 59c9d15905..5044afc2f8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -522,6 +522,13 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
+static int append_hash_to_argv(const unsigned char sha1[20], void *data)
+{
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+   return 0;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
@@ -623,24 +630,24 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+int find_unpushed_submodules(struct sha1_array *hashes,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1, i;
-   char *sha1_copy;
+   int i;
struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(_arg, "--remotes=%s", remotes_name);
init_revisions(, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, , NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(, "find_unpushed_submodules");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, );
+   argv_array_push(, "--not");
+   argv_array_pushf(, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
die("revision walk setup failed");
 
@@ -648,8 +655,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(_arg);
+   argv_array_clear();
 
for (i = 0; i < submodules.nr; i++) {
struct string_list_item *item = [i];
@@ -687,12 +693,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, _pushing))
+   if (!find_unpushed_submodules(hashes, remotes_name, _pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a948..065b2f0a2a 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/transport.c b/transport.c
index 94d6dc3725..05f2ce83f1 100644
--- a/transport.c
+++ b/transport.c
@@ -903,23 +903,36 @@ int transport_push(struct transport *transport,
 
if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
!is_bare_repository()) {
struct ref *ref = remote_refs;
+   struct sha1_array hashes = SHA1_ARRAY_INIT;
+
for (; ref; ref = ref->next)
- 

[PATCH v2 1/3] serialize collection of changed submodules

2016-10-07 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
 submodule.c | 63 -
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2de06a3351..59c9d15905 100644
--- a/submodule.c
+++ b/submodule.c
@@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *hashes;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   hashes = get_sha1s_from_list(submodules, p->two->path);
+   sha1_array_append(hashes, p->two->oid.hash);
}
 }
 
@@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static void collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me =
+   (struct collect_submodule_from_sha1s_data *) data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   int i;
+   for (i = 0; i < submodules->nr; i++) {
+   struct string_list_item *item = >items[i];
+   struct sha1_array *hashes = (struct sha1_array *) item->util;
+   sha1_array_clear(hashes);
+   }
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
+   int argc = ARRAY_SIZE(argv) - 1, i;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision()) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(_arg);
 
+   for (i = 0; i < submodules.nr; i++) {
+   struct string_list_item *item = [i];
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = item->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) item->util,
+   collect_submodules_from_sha1s,
+   );
+   }
+   free_submodules_sha1s();
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.637.g09b28c5



[PATCH v2 0/3] Speedup finding of unpushed submodules

2016-10-07 Thread Heiko Voigt
You can find the first iteration of this series as part of this thread:

http://public-inbox.org/git/%3C20160914173124.GA7613@sandbox%3E/

All mentioned issues should be fixed. I dropped the last patch which was
the cause of the broken tests.

This should optimize every part of this test to a nice speed if you are
pushing to a remote. The only case that is still broken/slow as hell is
when calling push with a direct url.

I am thinking whether we should maybe error out with a "not implemented"
message or something and mention that --recurse-submoules does not work
with direct urls? But we might want to have another look at performance
with this patch included. Maybe it is actually useable with the last
patch included which was not yet on pu.

Cheers Heiko

Heiko Voigt (3):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call

 submodule.c | 116 ++--
 submodule.h |   5 +--
 transport.c |  29 ++-
 3 files changed, 114 insertions(+), 36 deletions(-)

-- 
2.10.1.637.g09b28c5



Re: [PATCH 1/2] submodule add: extend force flag to add existing repos

2016-10-07 Thread Heiko Voigt
On Thu, Oct 06, 2016 at 01:11:20PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > Currently the force flag in `git submodule add` takes care of possibly
> > ignored files or when a name collision occurs.
> >
> > However there is another situation where submodule add comes in handy:
> > When you already have a gitlink recorded, but no configuration was
> > done (i.e. no .gitmodules file nor any entry in .git/config) and you
> > want to generate these config entries. For this situation allow
> > `git submodule add` to proceed if there is already a submodule at the
> > given path in the index.

Is it important that the submodule is in the index? How about worktree?
>From the index entry alone we can not deduce the values anyway. So I
would say the submodule has to be in the worktree, no matter what is in
the index. If its not in the index we can also add it.

BTW, that is the way I imagined submodules would work in the first
place: just clone and add them, like I described here[1].

> > Signed-off-by: Stefan Beller 
> > ---
> 
> Yup, the goal makes perfect sense.  
> 
> I vaguely recall discussing this exact issue of "git submodule add"
> that refuses to add a path that already is a gitlink (via "git add"
> that has previously been run) elsewhere on this list some time ago.

Yes there was a discussion, see the link.

Cheers Heiko

[1] http://public-inbox.org/git/%3c20160916141143.ga47...@book.hvoigt.net%3E/


Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-07 Thread Heiko Voigt
On Thu, Oct 06, 2016 at 10:20:16AM -0700, Stefan Beller wrote:
> On Thu, Oct 6, 2016 at 2:23 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > On Wed, Oct 05, 2016 at 03:53:25PM +0200, Heiko Voigt wrote:
> >> On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> >> > Jeff,
> >> > thanks for the suggestions, both git_path(..) as well as checking the 
> >> > config,
> >> > this seems quite readable to me:
> >>
> >> When reading the discussion I thought the same: What about the
> >> "old-style" repositories. I like this one. Checking both locations
> >> is nice.
> >
> > BTW, since it seems we all agree on the direction. Should we add some
> > tests?
> >
> 
> Good call. What do we want to test for?
> * Correctness in case of submodules? (just push and get rejected)
>   I think that is easy to do.
> * Performance with [no, lots of] submodules? That seems harder to me.
> 
> I'll add a test for the correctness part and resend.

Well I though about the following tests:

 * Without submodules: Make sure submodule processing is disabled by
   default
 * With new-style submodules: Make sure submodules are processed by
   default
 * With old-style submodules: Make sure submodules are processed by
   default

But I have to admit that I did not think about the "how can we do that".
But performance seems to be the only thing that is exposing the
processing when we have no submodules, so it seems we can only easily do
the tests with submodules.

Cheers Heiko


[PATCH] clean up confusing suggestion for commit references

2016-10-07 Thread Heiko Voigt
The description for referencing commits looks as if it is contradicting
the example, since it is itself enclosed in double quotes. Lets use
single quotes around the description and include the double quotes in
the description so it matches the example.
---
Sorry for opening this up again but I just looked up the format and was
like: "Umm, which one is now the correct one..."

For this makes more sense. What do others think?

Cheers Heiko

 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 08352de..692f4ce 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -122,7 +122,7 @@ without external resources. Instead of giving a URL to a 
mailing list
 archive, summarize the relevant points of the discussion.
 
 If you want to reference a previous commit in the history of a stable
-branch, use the format "abbreviated sha1 (subject, date)",
+branch, use the format 'abbreviated sha1 ("subject", date)',
 with the subject enclosed in a pair of double-quotes, like this:
 
 Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
-- 
2.10.0.645.g54f1e86



Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-06 Thread Heiko Voigt
On Wed, Oct 05, 2016 at 03:53:25PM +0200, Heiko Voigt wrote:
> On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> > Jeff,
> > thanks for the suggestions, both git_path(..) as well as checking the 
> > config,
> > this seems quite readable to me:
> 
> When reading the discussion I thought the same: What about the
> "old-style" repositories. I like this one. Checking both locations
> is nice.

BTW, since it seems we all agree on the direction. Should we add some
tests?

Cheers Heiko


Re: Bug Report: "git submodule deinit" fails right after a clone

2016-10-06 Thread Heiko Voigt
Hi,

please also keep the mailinglist in the CC so everyone can read this.

On Thu, Oct 06, 2016 at 09:11:05AM +0200, Thomas Bétous wrote:
> On Wed, Oct 5, 2016 at 3:36 PM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> 
> >
> > My initial reaction is that this might be a problem with line endings. Did
> > you
> > check whether you get any diff when you do a 'git diff' after the clone?
> >
> > Maybe the variable 'core.autocrlf' is set to 'input' ? Have a look at 'git
> > help
> > config'
> 
> 
> When I do a 'git diff' right after the clone, nothing appears.
> Moreover my global setting for core.autocrlf is true. (This was configured
> on purpose as I work on Windows whereas the repositories are hosted on an
> UNIX server.)

So I guess the same applies to 'git status'?

> Nevertheless when I change core.autocrlf to 'input', the error disappears
> and I got the expected behavior for git submodule deinit...
> So I guess it is just a configuration problem but I do not understand why
> core.autocrlf should be set to 'input' to remove this error. Do you have
> any idea?

This is indeed strange. That's why I asked whether 'git diff' shows
anything. I was suspecting that the .gitmodules is somehow checked out
in UNIX format. And the fact that setting core.autocrlf to
'input' seems to fix it supports that.

I currently do not have access to a windows machine as the moment to
test this. Copying the windows mailing list maybe someone over there
can reproduce and help with the issue[1].

Cheers Heiko

[1] 
http://public-inbox.org/git/%3ccapoqyv+xsrlk7y1hjyhzfy8ofkxvrwpczbdqhdgrhthqdzy...@mail.gmail.com%3E/


Re: Reference a submodule branch instead of a commit

2016-10-05 Thread Heiko Voigt
On Wed, Oct 05, 2016 at 09:13:53AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> >> It IS a hack, but having this information in .git would
> >> mean that it can be forced to be in machine readable form, unlike a
> >> mention in README.  I do not know if the .gitmodules/.gitignore
> >> combination is a sensible thing to use, but it does smell like a
> >> potentially useful hack.
> >
> > IIRC the tree entries are the reference for submodules in the code. We
> > are iterating over the tree entries in many places so that change does
> > not seem so easy to me.
> >
> > But you are right maybe we should stop arguing against this workflow and
> > just let people use it until they find out whats wrong with it ;)
> 
> I didn't say that, though.  I am fairly firm on _not_ changing what
> the superproject records in its tree for the submodule, i.e. it must
> record the exact commit, not "a branch name", for reproducibility. 

I was not talking about changing what the superproject records in its
tree. I was just talking about changing where we look for submodules
(e.g. for updating and such). I.e. in .git* instead of just the tree as
it is at the moment. Thats what I understood from the discussion above.
Sorry that might have been ambiguous.

I agree that there should always be a commit as a reference for a
submodule. But as far as I understand for some projects its to much
overhead to record every change of a submodule but still they want to
use the latest code during development. Those projects might only want
to record the actual commit when they release something. At least thats
what I imagine.

> I am OK if people ignored the unmatch between the recorded commit
> from a submodule and what they had in the submodule directory while
> they developed and tested the superproject commit.  After all, it is
> not an error to make a commit while having a local uncommitted
> changes to tracked files, and it is equally valid to have a commit
> checked out in a submodule directory that is different from what
> goes in the superproject commit.  But we do show "modified but not
> committed" in the status output.  In that light, submodule.*.ignore
> may have been a mistake.

The original intend for submodule.*.ignore was to help people not
showing submodules as dirty when they had untracked files in them. That
was after status learned to look into submodules. 'untracked' to avoid the
performance overhead and 'dirty' for the people that accidentally worked
with dirty submodules. I agree 'all' might have been to much.

For the above workflow what user might actually want is something that
ignores all changes as long as they are part of the remote branch. But I
am just guessing here. My gut feeling is still that most people that
request this feature come from svn. Thats why I asked whether the
options I described provide the behavior that Jeremy wants.

Cheers Heiko


Re: Reference a submodule branch instead of a commit

2016-10-05 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 12:01:13PM -0700, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Stefan Beller  writes:
> >
> >> I wonder if we could make that convenient for users by not tracking
> >> the submodule,
> >> i.e.
> >> * we have the information in the .gitmodules file
> >> * the path itself is in the .gitignore
> >> * no tree entry
> >>
> >> Then you can update to the remote latest branch, without Git reporting
> >> a dirty submodule locally, in fact it reports nothing for the submodule.
> >>
> >> It sounds like a hack, but maybe it's worth looking into that when
> >> people want to see that workflow.
> >
> > It IS a hack.  
> >
> > But if you do not touch .git file and instead say "clone
> > this other project at that path yourself" in README, that would
> > probably be sufficient.
> 
> eh,... hit send too early.
> 
> It IS a hack, but having this information in .git would
> mean that it can be forced to be in machine readable form, unlike a
> mention in README.  I do not know if the .gitmodules/.gitignore
> combination is a sensible thing to use, but it does smell like a
> potentially useful hack.

IIRC the tree entries are the reference for submodules in the code. We
are iterating over the tree entries in many places so that change does
not seem so easy to me.

But you are right maybe we should stop arguing against this workflow and
just let people use it until they find out whats wrong with it ;)

I have another tip for Jeremy:

git config submodule..ignore all

and you will not see any changes to the submodule. Put that into your
.gitmodules and you do not see any changes to the submodules anymore.

So now the only thing missing for complete convenience is a config
option for the --remote option in 'git submodule update'.

Jeremy, does the ignore option combined with --remote what you want?

Cheers Heiko


Re: Bug Report: "git submodule deinit" fails right after a clone

2016-10-05 Thread Heiko Voigt
Hi,

please do not top-post the conversation will otherwise get hard to
follow. Thank you.

On Tue, Oct 04, 2016 at 05:46:45PM +0200, Thomas Bétous wrote:
> Thank you for your answer and sorry for the delay (I was on vacation...).
> 
> I am using git 2.9.0.windows.1 (run on Windows 7 via git bash).

My initial reaction is that this might be a problem with line endings. Did you
check whether you get any diff when you do a 'git diff' after the clone?

Maybe the variable 'core.autocrlf' is set to 'input' ? Have a look at 'git help
config'

> I tested it on this repo:
> https://github.com/githubtraining/example-dependency.git
> The same problem occurs.
> Here a small script to reproduce the error on my PC:
> #!/bin/bash
> git clone https://github.com/githubtraining/example-dependency.git
> cd example-dependency
> git submodule deinit js
> 
> It ends with this error:
> fatal: Please stage your changes to .gitmodules or stash them to proceed
> Submodule work tree 'js' contains local modifications; use '-f' to discard 
> them

Here I get

$ git submodule deinit js
Cleared directory 'js'

So all seems fine.

> Is the script working on your PC?

Yes. I am on Mac OS X though.

Cheers Heiko


Re: [PATCHv3 1/2] push: change submodule default to check when submodules exist

2016-10-05 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 02:03:58PM -0700, Stefan Beller wrote:
> Jeff,
> thanks for the suggestions, both git_path(..) as well as checking the config,
> this seems quite readable to me:

When reading the discussion I thought the same: What about the
"old-style" repositories. I like this one. Checking both locations
is nice.

Cheers Heiko


Re: Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-04 Thread Heiko Voigt
On Tue, Oct 04, 2016 at 07:44:28AM -0400, Jeff King wrote:
> > My idea of a solution goes like this:
> >   * collect all SHA1's of the remotes refs
> >   * check if we have them locally
> >   * if not we abort and tell the user to fetch them somehow into local
> > refs or disable the check
> >   * when we have them locally we proceed passing those SHA1's as bases
> > instead of --remotes=
> 
> As I argued in [1], I think it's not just "this must be cheaper" but
> "this must not be enabled if submodules are not in use at all".  Most
> repositories don't have submodules enabled at all, so anything that
> cause any extra traversal, even of a portion of the history, is going to
> be a net negative for a lot of people.
> 
> I think the only sane default is going to be some kind of heuristic that
> says "submodules are probably in use". Something like "is there a
> .gitmodules file" is not perfect (you can have gitlink entries without
> it), but it's a really cheap constant-time check.

I agree. We are adding convenience for submodules, so we can also say a
checked out ".gitmodules" file is a must to have convenience.

I am not sure if I agree on another layer of options for this as
suggested in your post. More options mean more implementation
complexity and more confusion on the users side.

How about we choose our defaults based on the existence of a checked out
.gitmodules file? So the default would only be --recurse-submodules=check
if there is a .gitmodules file in the worktree. All other users need to
either pass or explicitly configure it.

Cheers Heiko

> [1] Quoted in
> http://public-inbox.org/git/xmqqh9aaot49@gitster.mtv.corp.google.com/


Re: [RFC PATCH] clone: add clone.recursesubmodules config option

2016-10-04 Thread Heiko Voigt
On Mon, Oct 03, 2016 at 10:18:32AM -0700, Stefan Beller wrote:
> On Mon, Oct 3, 2016 at 8:36 AM, Jeremy Morton  wrote:
> > Did this ever get anywhere?  Can we recursively update submodules with "git
> > pull" in the supermodule now?
> 
> I think the idea is sound.

I am confused there is nothing handling *pull* here? This patch was
about clone. Handling 'pull' is a much bigger topic[1].

Cheers Heiko

[1] 
https://github.com/jlehmann/git-submod-enhancements/wiki/Recursive-submodule-checkout


Re: Reference a submodule branch instead of a commit

2016-10-04 Thread Heiko Voigt
On Mon, Oct 03, 2016 at 12:00:45PM -0700, Junio C Hamano wrote:
> Jeremy Morton  writes:
> 
> > At the moment, supermodules must reference a given commit in each of
> > its submodules.  If one is in control of a submodule and it changes on
> > a regular basis, this can cause a lot of overhead with "submodule
> > updated" commits in the supermodule.  It would be useful of git allows
> > the option of referencing a submodule's branch instead of a given
> > submodule commit.  How about adding this functionality?
> 
> When somebody downstream fetches from your superproject and grabs
> the set of submodules, how would s/he know what _exact_ state you
> meant to record?  When s/he says "I have your superproject commit X,
> which binds submodule's branch Y at path sub/, and it simply does
> not work.  Your project is broken", how do you go about reproducing
> the exact state s/he had trouble with to help her/him?
> 
> The only thing s/he knows is that the commit used from the submodule
> must be one of the commits that was on branch Y at some point in
> time, hopefully close to the timestamp recorded in the commit in the
> superproject.  And your record in the history of the superproject
> does not tell you more than that, so you wouldn't have any idea
> better than what s/he already has to help.
> 
> Hence, such a "functionality" will never happen, at least in the
> exact form you are describing.
> 
> It is conceivable to add some feature that allows you to squelch the
> report that the submodule recorded in your superproject is not up to
> date from "git status" etc. to help those who thinks it is OK to not
> bind the latest submodule commit to the superproject all the time,
> though.

We already have options to support these kinds of workflows. Look at the
option '--remote' for 'git submodule update'.

You then only have to commit the submodule if you do not want to see it
as dirty locally, but you will always get the tip of a remote tracking
branch when updating.

Cheers Heiko


Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-04 Thread Heiko Voigt
Hi,

On Mon, Oct 03, 2016 at 02:11:36PM -0700, Linus Torvalds wrote:
> This seems to be because I'm now on 'pu' as of a day or two ago in
> order to test the abbrev logic, but lookie here:
> 
> time git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
> .. shows all the branches and tags ..
> real 0m0.655s
> user 0m0.011s
> sys 0m0.004s
> 
> so the remote is fast to connect to, and with network connection
> overhead and everything, it's just over half a second. But then:
> 
> time git push ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux

The reason behind this is when pushing to an address we do not easily
have the remote refs to compare available. When pushing an existing ref
it would be easy and could get a shortcut but it gets more complicated
for new refs. Currently we fall back to walking the whole history since
that is "the most correct way" we have. But obviously it is not a
practical solution in any way.

I mentioned this fact when discussing the current state and my patches
to make this check less painful. So we still need to think about a
solution for this check when passing an address.

IMO: It's definitely not ready to be switched on as default, unless we
find something a lot cheaper for the above case.

My idea of a solution goes like this:
  * collect all SHA1's of the remotes refs
  * check if we have them locally
  * if not we abort and tell the user to fetch them somehow into local
refs or disable the check
  * when we have them locally we proceed passing those SHA1's as bases
instead of --remotes=

Cheers Heiko


Re: Homebrew and Git

2016-09-20 Thread Heiko Voigt
Hi,

On Sun, Sep 18, 2016 at 05:50:28PM +0200, Jonas Thiel wrote:
> A while ago I have described my problem with Homebrew at the following
> GitHub channel
> (https://github.com/Homebrew/homebrew-core/issues/2970). In the
> meanwhile, I believe that I my problem with Homebrew is based on an
> issues with my Git. I have found the attached Git Crash reports on my
> Mac and because I am not familiar with reading/analysing Crash
> Reports, it would be great if someone could give me some feedback on
> it.
>  
> If you have any question, please do not hesitate to contact me.

>From your crash reports I see that git is apparently crashing in a
strchr() call from within ident_default_email() which is a function that
tries to assemble a name and email to put into your commits.

Can you post us the output of

hostname -f

and

whoami

?

Since it seems you are using an Apple git can you also give us the
output of

git version

Since it seems that Apple is compiling its own git (and not publishing
the changes they made conveniently via git). Have you tried
installing a vanilla git via homebrew and seeing whether that also
produces the issue?

In your bugreport you are talking about modifications you do to your
system after which the issue occurred. I would suggest to exactly find
out which step lead to git crashing (if it actually is the issue). First
to identify an issue we need something that is reproduceable.

Cheers Heiko


Re: Re: Homebrew and Git

2016-09-20 Thread Heiko Voigt
On Tue, Sep 20, 2016 at 01:02:28PM +0200, Heiko Voigt wrote:
> Hi,
> 
> On Sun, Sep 18, 2016 at 05:50:28PM +0200, Jonas Thiel wrote:
> > A while ago I have described my problem with Homebrew at the following
> > GitHub channel
> > (https://github.com/Homebrew/homebrew-core/issues/2970). In the
> > meanwhile, I believe that I my problem with Homebrew is based on an
> > issues with my Git. I have found the attached Git Crash reports on my
> > Mac and because I am not familiar with reading/analysing Crash
> > Reports, it would be great if someone could give me some feedback on
> > it.
> >  
> > If you have any question, please do not hesitate to contact me.
> 
> From your crash reports I see that git is apparently crashing in a
> strchr() call from within ident_default_email() which is a function that
> tries to assemble a name and email to put into your commits.

BTW, here is the callstack inlined from the crashreport:

bsystem_platform.dylib  0x7fff840db41c 
_platform_strchr$VARIANT$Haswell + 28
1   git 0x00010ba1d3f4 ident_default_email 
+ 801
2   git 0x00010ba1d68f fmt_ident + 66
3   git 0x00010ba4b495 files_log_ref_write 
+ 175
4   git 0x00010ba4b0a6 commit_ref_update + 
106
5   git 0x00010ba4c3a8 
ref_transaction_commit + 468
6   git 0x00010b994dd8 s_update_ref + 271
7   git 0x00010b994556 fetch_refs + 1969
8   git 0x00010b9935f2 fetch_one + 1913
9   git 0x00010b992bc4 cmd_fetch + 549
10  git 0x00010b9666c4 handle_builtin + 478
11  git 0x00010b96602f main + 376
12  libdyld.dylib   0x7fff834ef5ad start + 1

Maybe someone else has an idea what might be causing this...

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-19 Thread Heiko Voigt
On Fri, Sep 16, 2016 at 11:13:09AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> > The most exact solution would be to use all actual remote refs available
> > (not sure if we have them at this point in the process?) another
> > solution would be to still append the --remotes= option as a
> > fallback to reduce the revisions.
> 
> I'd say --remotes= is the least problematic thing to do.

Ok then lets drop my last patch and keep it the way it was. Because if
the remote sha1 differs we probably do not have it locally anyway. The
only case this does not catch is when the user specifies a remote URL.
But that just means we will iterate over all revisions instead of a
reduced set, which makes the check slower but still correct. As one can
see from my measurements that should not be that bad anymore.

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-19 Thread Heiko Voigt
On Fri, Sep 16, 2016 at 10:59:37AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> > +static void check_has_hash(const unsigned char sha1[20], void *data)
> > +{
> > +   int *has_hash = (int *) data;
> > +
> > +   if (!lookup_commit_reference(sha1))
> > +   *has_hash = 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   int has_hash = 1;
> > +
> > +   if (add_submodule_odb(path))
> > +   return 0;
> > +
> > +   sha1_array_for_each_unique(hashes, check_has_hash, _hash);
> > +   return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array 
> > *hashes)
> > +{
> > +   if (!submodule_has_hashes(path, hashes))
> > return 0;
> 
> I think you meant well, but this optimization is wrong.  A mere
> presence of an object does not mean that the current tip can reach
> that object.  Imagine you pushed commit A earlier to them at the
> tip, then pushed commit A~ to them at the tip, which is the current
> state of the remote of the submodule, and since them they may have
> GC'ed.  They no longer have the commit A.
> 
> For that matter, because you are doing this check by pretending as
> if all the submodule objects are in the object store of the current
> superproject you are working in, and saying "it exists there in the
> submodule repository" when the only thing you know is it exists in
> an object store of either the submodule repository, the superproject
> repository, or any of the other submodule repositories, you really
> cannot tell much from a mere presence of an object.  Not just the
> remote of the submodule repository you are interested in, but the
> submodule repository you are interested in itself, may not have that
> object.
> 
> Drop the previous two helper functions and this short-cut.

This is nothing I added. This came from the original code which I simply
extended to check for all sha1's. The diff is a little bit misleading. This
would be the local diff:

-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   if (!submodule_has_hashes(path, hashes))
return 0;

I think that this is more a shortcut for: If we can not find the sha1, we do
not need to bother spawning a process for the real check. We default to the
answer no in that case.

It looks like the reason for the sha1 check is to avoid error output from the
spawned 'rev-list' process. The error output might be confusing to the user in
case the sha1 does not exist in the submodule. We should probably die here
since we were not able to check a submodule that has a diff in the revisions we
would push.

After thinking about this further: I think it is actually bad to proceed here,
as the current revisions (just in the superproject) would still be pushed and
the user possibly gets an inconsistent state on the remote side which he tried
to avoid with check/on-demand-push enabled.

So in short this deserves some further love. Will have a look into adding
another patch for this if we agree on my suggestion to die above.

Cheers Heiko


Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-19 Thread Heiko Voigt
Hi,

On Fri, Sep 16, 2016 at 10:47:46AM -0700, Junio C Hamano wrote:
> One thing that makes me worried is how the ref cache layer interacts
> with this.  I see you first call push_unpushed_submodules() when
> ON_DEMAND is set, which would result in pushes in submodule
> repositories, updating their remote tracking branches.  At that
> point, before you make another call to find_unpushed_submodules(),
> is our cached ref layer knows that the remote tracking branches
> are now up to date (otherwise, we would incorrectly judge that these
> submodules need pushing based on stale information)?

I am not sure if I understand you correctly here. With the "ref cache layer"
you are referring to add_submodule_odb() which is called indirectly from
submodule_needs_pushing()? Those revs are only used to check whether the hash
we need on the remote side exists in the local submodule. That should not
change due to a push. The actual check whether the commit(s) exist on the
remote side is done using a 'rev-list' in a subprocess later.


> > diff --git a/transport.c b/transport.c
> > index 94d6dc3..76e1daf 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport,
> >  
> > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
> > !is_bare_repository()) {
> > struct ref *ref = remote_refs;
> > +   struct sha1_array hashes = SHA1_ARRAY_INIT;
> > +
> > for (; ref; ref = ref->next)
> > -   if (!is_null_oid(>new_oid) &&
> > -   !push_unpushed_submodules(ref->new_oid.hash,
> > -   transport->remote->name))
> > -   die ("Failed to push all needed 
> > submodules!");
> > +   if (!is_null_oid(>new_oid))
> > +   sha1_array_append(, 
> > ref->new_oid.hash);
> > +
> > +   if (!push_unpushed_submodules(, 
> > transport->remote->name))
> > +   die ("Failed to push all needed submodules!");
> 
> Do we leak the contents of hashes here?

Good catch, sorry about that. Will clear the hashes array.


> > }
> >  
> > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
> >   TRANSPORT_RECURSE_SUBMODULES_CHECK)) && 
> > !is_bare_repository()) {
> > struct ref *ref = remote_refs;
> > struct string_list needs_pushing = STRING_LIST_INIT_DUP;
> > +   struct sha1_array hashes = SHA1_ARRAY_INIT;
> >  
> > for (; ref; ref = ref->next)
> > -   if (!is_null_oid(>new_oid) &&
> > -   find_unpushed_submodules(ref->new_oid.hash,
> > -   transport->remote->name, 
> > _pushing))
> > -   
> > die_with_unpushed_submodules(_pushing);
> > +   if (!is_null_oid(>new_oid))
> > +   sha1_array_append(, 
> > ref->new_oid.hash);
> > +
> > +   if (find_unpushed_submodules(, 
> > transport->remote->name,
> > +   _pushing))
> > +   die_with_unpushed_submodules(_pushing);
> 
> Do we leak the contents of hashes here?  I do not think we need to
> worry about needs_pushing leaking, as we will always die if it is
> not empty, but it might be a better code hygiene to clear it as
> well.

As above, will fix.

Cheers Heiko


Re: [PATCH 1/2] serialize collection of changed submodules

2016-09-19 Thread Heiko Voigt
On Fri, Sep 16, 2016 at 10:27:04AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > +static struct sha1_array *get_sha1s_from_list(struct string_list 
> > *submodules,
> > +   const char *path)
> > +{
> > +   struct string_list_item *item;
> > +   struct sha1_array *hashes;
> > +
> > +   item = string_list_insert(submodules, path);
> > +   if (item->util)
> > +   return (struct sha1_array *) item->util;
> > +
> > +   hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array));
> > +   /* NEEDSWORK: should we add an initializer function for
> > +* sha1_array ? */
> > +   memset(hashes, 0, sizeof(struct sha1_array));
> > +   item->util = hashes;
> 
> 
>   /* NEEDSWORK: should we have SHA1_ARRAY_INIT etc.? */
>   item->util = xcalloc(1, sizeof(struct sha1_array));

Ok will do.


Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects

2016-09-16 Thread Heiko Voigt
On Thu, Sep 15, 2016 at 11:27:54AM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> > So how about this fictional work flow:
> >
> >  $ git init top
> >  $ cd top
> >  $ git commit --allow-empty -m 'initial in top'
> >  $ git init sub
> >  $ git -C sub commit --allow-empty -m 'initial in sub'
> >  $ git add sub
> > You added a gitlink, but no corresponding entry in
> > .gitmodules is found. This is fine for gits core functionality, but
> > the submodule command gets confused by this unless you add 'sub'
> > to your .gitmodules via `git submodule add --already-in-tree \
> > --reuse-submodules-origin-as-URL sub`. Alternatively you can make 
> > this
> > message disappear by configuring advice.gitlinkPitfalls.
> 
> I am not sure if I agree with that direction.
> 
> If the trend in Git community collectively these days is to make
> usage of submodules easier and smoother, I'd imagine that you would
> want to teach "git add" that was given a submodule to "git submodule
> add" instead by default, with an option "git add --no-gitmodules
> sub" to disable it, or something like that.
> 
> >  $ git submodule add --fixup-modules-file ./sub sub
> >  Adding .gitmodule entry only for `sub` to use `git -C remote
> > show origin` as URL.
> 
> I agree that a feature like this is needed regardless of what
> happens at "git add" time.

How about just

   git submodule add 

? I remember back in the days when I started with submodules thats the
way I imagined submodules would work:

1. clone the submodule into a directory
2. git submodule add it
3. git commit everything

Because that how you basically work with files.  So instead of adding
another option I would rather like to autodetect that:

 * its a relative path inside this repo that is passed to
   'git submodule add'
 * there is no .gitmodules entry
 * and no .git/config
==> create those from a remote in the submodule

Corner cases:

 * If there is more than one remote we could tell the user to use an
   option to specify which one to use.
 * Barf in case there is no remote (not adding the submodule except -f
   is used).
 * If the gitlink is already there but no .gitmodules entry, 'git
   submodule add' will just add the entry as if it was initially added.

Instead of giving an error message that the submodule is already added
we could actually be nicer to the user and try to fix things for him
instead.

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Heiko Voigt
On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote:
> > By the way, with the two new patches, 'pu' seems to start failing
> > some tests, e.g. 5533 5404 5405.
> 
> Ah ok I did only test on master, will look into those.

Ok I had a look into these and the reason t5533 fails is because on pu
--recurse-submodules is enabled by default and I missed the case when
overwriting a ref. In that case we get the sha1 from the remote side as
old. So we could catch that and fall back to all revisions there, but...

... tl;dr: The solution to use the old revisions from the remote side
was too simple and does not make matters better but actually worse for
some typical usecases. Its only in the last patch.

... that lead me to further thinking about the previous solution (using
the locally cached remote refs) which might actually be a good default
for the non-fastforward cases like creating new ref or overwriting a
ref.

My current patch would handle the --mirror case nicer, since it gets a
lot of old revs to reduce the revisions to check. For the typical one
branch push it would most likely be worse. Except when the user is
updating (fast-forwarding) the remote ref we would scan all revs of a
ref until the root because we do not get enough valid revs that already
exist on the remote.

The most exact solution would be to use all actual remote refs available
(not sure if we have them at this point in the process?) another
solution would be to still append the --remotes= option as a
fallback to reduce the revisions.

What do others think? Will leave this for a little bit further thinking.
Its just the last patch ("use actual start hashes for submodule push
check instead of local refs") that needs to go back to the drawing
board.

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Heiko Voigt
On Thu, Sep 15, 2016 at 02:08:58PM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
> > struct child_process cp = CHILD_PROCESS_INIT;
> > -   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
> > "-n", "1" , NULL};
> > +
> > +   argv_array_push(, "rev-list");
> > +   sha1_array_for_each_unique(hashes, append_hash_to_argv, 
> > );
> > +   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
> > NULL);
> > +
> > struct strbuf buf = STRBUF_INIT;
> > int needs_pushing = 0;
> 
> These two become decl-after-stmt; move your new lines a bit lower,
> perhaps?

Thanks, missed those. Will do.

> > -   argv[1] = sha1_to_hex(sha1);
> > -   cp.argv = argv;
> > prepare_submodule_repo_env(_array);
> 
> By the way, with the two new patches, 'pu' seems to start failing
> some tests, e.g. 5533 5404 5405.

Ah ok I did only test on master, will look into those.

Cheers Heiko


Re: [RFC] extending pathspec support to submodules

2016-09-16 Thread Heiko Voigt
Hi,

On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote:
> On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
> > Brandon Williams  writes:
> >
> >> You're right that seems like the best course of action and it already falls
> >> inline with what I did with a first patch to ls-files to support 
> >> submodules.
> >> In that patch I did exactly as you suggest and pass in the prefix to the
> >> submodule and make the child responsible for prepending the prefix to all 
> >> of
> >> its output.  This way we can simply pass through the whole pathspec (as 
> >> apposed
> >> to my original idea of stripping the prefix off the pathspec prior to 
> >> passing
> >> it to the child...which can get complicated with wild characters) to the
> >> childprocess and when checking if a file matches the pathspec we can check 
> >> if
> >> the prefix + file path matches.
> >
> > That's brilliant.  A few observations.
> >
> >  * With that change to tell the command that is spawned in a
> >submodule directory where the submodule repository is in the
> >context of the top-level superproject _and_ require it to take a
> >pathspec as relative to the top-level superproject, you no longer
> >worry about having to find where to cut the pathspec given at the
> >top-level to adjust it for the submodule's context.  That may
> >simplify things.
> 
> I wonder how this plays together with the prefix in the superproject, e.g.
> 
> cd super/unrelated-path
> # when invoking a git command the internal prefix is "unrelated-path/"
> git ls-files ../submodule-*
> # a submodule in submodule-A would be run in  submodule-A
> # with a superproject prefix of super/ ? but additionally we nned
> to know we're
> # not at the root of the superproject.

Do we need to know that? The internal prefix is internal to each
repository and can be treated as such. I would expect that the prefix is
only prefixed when needed. E.g. when we display output to the user,
match files, ...

How about "../submodule-A" as the submodule prefix in the situation you
describe? The wildcard would be resolved by the superproject since the
directory is still in its domain.

I would think of the submodule prefix as the path relative to the
command that started everything. E.g. if we have a tree like this:

*--subA
   /
super *--subB--subsubB
   \
*--dirC

where subA, subB and subsubB are submodules and dirC is just a
directory inside super.

We would get the following prefixes when issuing a command in dirC that
has a pathspec for subsubB:

  subB: ../subB
  subsubB: ../subB/subsubB

An interesting case is when we issue a command in subA:

  super:   ..
  subB:../subB
  subsubB: ../subB/subsubB

A rule for the prefix option could be: Always specified when crossing a
repository boundary with the pathspec (including upwards).

I have not completely thought this through though so just take this as
some food for thought. Since I am not sure what Junio's rationale behind
making the prefix relative to the toplevel superproject was, but I guess
finding it could be a challenge in some situations. I.e. is the
repository in home directory tracking all the dot-files really the
superproject or was it that other one I found before?

> >So we may have to rethink what this option name should be.  "You
> >are running in a repository that is used as a submodule in a
> >larger context, which has the submodule at this path" is what the
> >option tells the command; if any existing command already has
> >such an option, we should use it.  If we are inventing one,
> >perhaps "--submodule-path" (I didn't check if there are existing
> >options that sound similar to it and mean completely different
> >things, in which case that name is not usable)?
> 
> Would it make sense to add the '--submodule-path' to a more generic
> part of the code? It's not just ls-files/grep that have to solve exactly this
> problem. Up to now we just did not go for those commands, though.

Yes I think so, since it should also handle starting from a submodule
with a pathspec to the superproject or other submodule. In case we
go with my above suggestion I would suggest a more generic name since
the option could also be passed to processes handling the superproject.
E.g. something like --module-prefix or --repository-prefix comes to my
mind, not checked though.

Cheers Heiko


[PATCH 4/2] use actual start hashes for submodule push check instead of local refs

2016-09-15 Thread Heiko Voigt
Push knows the actual revision range it is actually pushing to a remote.
Let's use the start revisions to reduce the amount of checked revisions
instead of the locally cached remote refs which might be out of date.

This actually changes behavior as it now can also properly handle pushes
with URLs. That is also the reason why we change some tests. When
passing the remote as URL the push is made unconditionally in on-demand.
This is wrong since on-demand should only push the submodule if its SHA1
reference got changed in the superproject history that is getting pushed.
The reason behind that is that the exclusion list for revisions was
reduced by using the parameters "--not --remotes=" to
reduce the amount of revisions for the revision walk. In case of an URL
this does not match anything and thus we would always do a full revision
walk until the root commit.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
And here is another one which makes the check more correct. For push
--recurse-submodules=check|on-demand with URLs this is also a
performance improvement since at the moment we iterate over all
revisions unconditionally as explained above.

This patch changes the behavior (now its how its supposed to be) so
there may be fallout from users complaining that their submodules do not
get pushed with on-demand anymore. This is only for users using URL for
pushing though.

Cheers Heiko

BTW: Peff, thanks for the good diagnosis of all these problems.

 submodule.c| 12 ++--
 submodule.h|  6 +++---
 t/t5531-deep-submodule-push.sh | 24 ++--
 transport.c| 40 ++--
 4 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 28bb74e..1f82974 100644
--- a/submodule.c
+++ b/submodule.c
@@ -639,8 +639,8 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(struct sha1_array *hashes,
-   const char *remotes_name, struct string_list *needs_pushing)
+int find_unpushed_submodules(struct sha1_array *old_hashes,
+   struct sha1_array *new_hashes, struct string_list 
*needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
@@ -652,9 +652,9 @@ int find_unpushed_submodules(struct sha1_array *hashes,
 
/* argv.argv[0] will be ignored by setup_revisions */
argv_array_push(, "find_unpushed_submodules");
-   sha1_array_for_each_unique(hashes, append_hash_to_argv, );
+   sha1_array_for_each_unique(new_hashes, append_hash_to_argv, );
argv_array_push(, "--not");
-   argv_array_pushf(, "--remotes=%s", remotes_name);
+   sha1_array_for_each_unique(old_hashes, append_hash_to_argv, );
 
setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
@@ -700,12 +700,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *old_hashes, struct sha1_array 
*new_hashes)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(hashes, remotes_name, _pushing))
+   if (!find_unpushed_submodules(old_hashes, new_hashes, _pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index 065b2f0..55c6c91 100644
--- a/submodule.h
+++ b/submodule.h
@@ -63,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name,
-   struct string_list *needs_pushing);
-int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name);
+int find_unpushed_submodules(struct sha1_array *old_hashes,
+   struct sha1_array *new_hashes, struct string_list 
*needs_pushing);
+int push_unpushed_submodules(struct sha1_array *old_hashes, struct sha1_array 
*new_hashes);
 void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
 int parallel_submodules(void);
 
diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
index 198ce84..a5b8ef6 100755
--- a/t/t5531-deep-submodule-push.sh
+++ b/t/t5531-deep-submodule-push.sh
@@ -154,6 +154,8 @@ test_expect_success 'push recurse-submodules on command 
line overrides config' '
git fetch ../pub.git &&
git diff --quiet FETCH_HEAD master &&
(cd gar/bage && git diff --quiet origi

[PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-15 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Lets do it so we can speedup the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
On Wed, Sep 14, 2016 at 03:30:53PM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvo...@hvoigt.net> writes:
> 
> > Sorry about the late reply. I was not able to process emails until now.
> > Here are two patches that should help to improve the situation and batch
> > up some processing. This one is for repositories with submodules, so
> > that they do not iterate over the same submodule twice with the same
> > hash.
> >
> > The second one will be the one people without submodules are interested
> > in.
> 
> Thanks.  Will take a look at later as I'm already deep in today's
> integration cycle.  Very much appreciated.

No problem. While I am at it: Here are actually another two patches that
should make life of submodule users easier (push times of big pushes).

In Numbers with the qt5[1] superproject and all submodules initialized.
The same --mirror test as before with the git repository:

# Without patch:

book:qt5 hvoigt (5.6)$
rm -rf ~/Downloads/git-test && mkdir ~/Downloads/git-test &&
   (cd ~/Downloads/git-test && git init) &&
   time git push --mirror --recurse-submodules=check ~/Downloads/git-test

real4m0.881s
user3m30.139s
sys 0m22.329s

Without --recurse-submodules=check

real0m0.251s
user0m0.218s
sys 0m0.082s


# With patch:

real0m1.167s
user0m0.846s
sys 0m0.262s

real0m1.110s
user0m0.815s
sys 0m0.247s

real0m1.111s
user0m0.818s
sys 0m0.251s

Without --recurse-submodules=check

real0m0.294s
user0m0.221s
sys 0m0.104s

real0m0.248s
user0m0.216s
sys 0m0.080s

real0m0.247s
user0m0.212s
sys 0m0.082s

[1] git://code.qt.io/qt/qt5.git

 submodule.c | 75 -
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/submodule.c b/submodule.c
index a15e346..28bb74e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -522,27 +522,54 @@ static int has_remote(const char *refname, const struct 
object_id *oid,
return 1;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char 
sha1[20])
+static void append_hash_to_argv(const unsigned char sha1[20], void *data)
 {
-   if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+}
+
+static void check_has_hash(const unsigned char sha1[20], void *data)
+{
+   int *has_hash = (int *) data;
+
+   if (!lookup_commit_reference(sha1))
+   *has_hash = 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+   int has_hash = 1;
+
+   if (add_submodule_odb(path))
+   return 0;
+
+   sha1_array_for_each_unique(hashes, check_has_hash, _hash);
+   return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+   if (!submodule_has_hashes(path, hashes))
return 0;
 
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
-   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
"-n", "1" , NULL};
+
+   argv_array_push(, "rev-list");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, 
);
+   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
NULL);
+
struct strbuf buf = STRBUF_INIT;
int needs_pushing = 0;
 
-   argv[1] = sha1_to_hex(sha1);
-   cp.argv = argv;
prepare_submodule_repo_env(_array);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
cp.dir = path;
if (start_command())
-   die("Could not run 'git rev-list %s --not --remotes -n 
1' command in submodule %s",
-   sha1_to_hex(sha1), path);
+   die("Could not run 'git rev-list  --not 
--remotes -n 1' command in submodule %s",
+   path);
if (strbuf_read(, cp.out, 41))
needs_pushing = 1;
finish_command();
@@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
-struct collect_submodule_from_sha1s_data {
-   char *submodule_path;
-

  1   2   3   4   >