Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-15 Thread Jeff King
On Wed, Jun 14, 2017 at 10:53:12AM -0700, Stefan Beller wrote:

> >> In my ideal dream world of submodules we would have the following:
> >>
> >>   $ cat .gitmodules
> >>   [submodule "sub42"]
> >> path = foo
> >>   # path only in tree!
> >
> > TBH, I am not sure why we need "path"; couldn't we just use the
> > subsection name as an implicit path?
> 
> That is what was done back in the time. But then people wanted to rename
> submodules (i.e. move them around in the worktree), so the path is not
> constant, so either we'd have to move around the git dir whenever the
> submodule is renamed (bad idea IMO), or instead introduce a mapping
> between (constant name <-> variable path). So that was done.

Ah, right. That makes sense. I forgot that in addition to the in-tree
path, we have to store the submodule repository itself as some name. The
extra level of indirection there isn't strictly necessary, but it lets
the "name" act as a unique id.

> Historically (IIUC) we had submodule.path.url which then was changed
> to submodule.name.url + name->path resolution. And as a hack(?) or
> easy way out of a problem then, the name is often the same as the path
> hence confusing people, when they see:
> 
> [submodule "foo"]
> path = foo
> url = dadada/foo
> 
> What foo means what now? ;)

Right, I am such a person that has been confused. ;)

Thanks for explaining.

> Talking about another tangent:
> 
>   For files there is a rename detection available. For submodules
>   It is hard to imagine that there ever will be such a rename detection
>   as files have because of the explciit name<->path mapping.
> 
>   We *know* when a submodule was moved. So why even try
>   to do rename detection? As we record only sha1s for a submodule
>   you could swap two submodule object names by accident.
>   Consider a superproject that contains different kernels, such as
>   a kernel for your phone/embedded device and then a kernel for
>   your workstation or other device. And these two kernels are different
>   for technical reasons but share the same history.

Do you mean during the rename detection phase of a diff, check to see if
.gitmodules registered a change in path for a particular module (by
finding its entry in the diff and looking at both sides), and if so then
mark that as a rename for the submodule paths?

>From a cursory glance, that sounds like an interesting approach.

-Peff


Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-14 Thread Stefan Beller
On Tue, Jun 13, 2017 at 11:36 PM, Jeff King  wrote:
> On Tue, Jun 13, 2017 at 10:07:43AM -0700, Stefan Beller wrote:
>
>> > I also waffled on whether we should ask the submodule code
>> > whether it knows about a particular path. Technically:
>> >
>> >   git config submodule.foo.path foo
>> >   git config submodule.foo.url git://...
>> >   git add foo
>> >
>> > is legal, but would still warn with this patch. I don't know
>> > how much we should care (it would also be easy to do on
>> > top).
>>
>> And here I was thinking this is not legal, because you may override
>> anything *except* submodule.*.path in the config. That is because
>> all the other settings (such as url, active flag, branch,
>> shallow recommendation) are dependent on the use case, the user,
>> changes to the environment (url) or such. The name<->path mapping
>> however is only to be changed via changes to the tracked content.
>> That is why it would make sense to disallow overriding the path
>> outside the tracked content.
>
> It was probably a mistake to use normal config as the example. Junio
> mentioned it as a case that could work if you communicate the submodule
> URL to somebody else out-of-band. My understanding was that you could
> set whatever you like in the regular config, but I think that is just
> showing my ignorance of submodules.
>
> Pretend like I said "-f .gitmodules" in each line above. ;)
>
>> In my ideal dream world of submodules we would have the following:
>>
>>   $ cat .gitmodules
>>   [submodule "sub42"]
>> path = foo
>>   # path only in tree!
>
> TBH, I am not sure why we need "path"; couldn't we just use the
> subsection name as an implicit path?

That is what was done back in the time. But then people wanted to rename
submodules (i.e. move them around in the worktree), so the path is not
constant, so either we'd have to move around the git dir whenever the
submodule is renamed (bad idea IMO), or instead introduce a mapping
between (constant name <-> variable path). So that was done.

Historically (IIUC) we had submodule.path.url which then was changed
to submodule.name.url + name->path resolution. And as a hack(?) or
easy way out of a problem then, the name is often the same as the path
hence confusing people, when they see:

[submodule "foo"]
path = foo
url = dadada/foo

What foo means what now? ;)
As a tangent: I want to make the default name different to the path.

So yeah, we want to keep the name and not mingle with implicit path.

I think we may even have bugs in our code base where the
name/path confusion shows.

Talking about another tangent:

  For files there is a rename detection available. For submodules
  It is hard to imagine that there ever will be such a rename detection
  as files have because of the explciit name<->path mapping.

  We *know* when a submodule was moved. So why even try
  to do rename detection? As we record only sha1s for a submodule
  you could swap two submodule object names by accident.
  Consider a superproject that contains different kernels, such as
  a kernel for your phone/embedded device and then a kernel for
  your workstation or other device. And these two kernels are different
  for technical reasons but share the same history.

  Now the inattentive user may make a mistake and git-add the
  "wrong" kernel submodule.  The smart Git would tell that it is a
  rename/move just as we have with files.

>
>> > +   OPT_HIDDEN_BOOL(0, "warn-embedded-repo", _on_embedded_repo,
>> > +   N_("warn when adding an embedded repository")),
>>
>> We do not have a lot of OPT_HIDDEN_BOOLs throughout the code base.
>> We should use them more often.
>>
>> It makes sense though in this case.
>
> Actually, my main reason is that it's nonsense to show
> "--warn-embedded-repo" in the help, when it's already the default. I
> would like to have written:
>
>   OPT_NEGBOOL(0, "no-warn-embedded-repo", _on_embedded_repo,
> N_("disable warning when adding an embedded repository"))
>
> but we don't have such a thing (and the last discussion on it a few
> months ago left a lot of open questions). So given that this really
> isn't something I'd expect users to want, I figured hiding it was a good
> idea. I mentioned it in the manpage for script writers, but it's really
> not worth cluttering "git add -h".

ok :) If you really wanted, you could go with a raw OPTION though. ;)
This is fine with me though.

>
>> > +static const char embedded_advice[] = N_(
>> > +"You've added another git repository inside your current repository.\n"
>> > +"Clones of the outer repository will not also contain the contents of\n"
>> > +"the embedded repository. If you meant to add a submodule, use:\n"
>>
>> The "will not also" sounds a bit off to me. Maybe:
>>   ...
>>   Clones of the outer repository will not contain the contents
>>   of the embedded repository and has no way of knowing how
>>   to obtain the inner repo. If you meant to add a 

Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-14 Thread Jeff King
On Tue, Jun 13, 2017 at 10:16:15AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I also waffled on whether we should ask the submodule code
> > whether it knows about a particular path. Technically:
> >
> >   git config submodule.foo.path foo
> >   git config submodule.foo.url git://...
> >   git add foo
> >
> > is legal, but would still warn with this patch. 
> 
> Did you mean "git config -f .gitmodules" for the first two?  If so,
> I think I agree that (1) it should be legal and (2) we should be
> able to add the check on top of this patch.
> 
> Or do you really mean the case in which these are in .git/config?

I did mean .git/config, but I think it was because I was mostly confused
about what was possible. In either case I think what we'd want is to ask
"could this be used as an active submodule". And leave it to the
submodule code to decide whatever mechanisms are legal.

-Peff


Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-14 Thread Jeff King
On Tue, Jun 13, 2017 at 10:07:43AM -0700, Stefan Beller wrote:

> > I also waffled on whether we should ask the submodule code
> > whether it knows about a particular path. Technically:
> >
> >   git config submodule.foo.path foo
> >   git config submodule.foo.url git://...
> >   git add foo
> >
> > is legal, but would still warn with this patch. I don't know
> > how much we should care (it would also be easy to do on
> > top).
> 
> And here I was thinking this is not legal, because you may override
> anything *except* submodule.*.path in the config. That is because
> all the other settings (such as url, active flag, branch,
> shallow recommendation) are dependent on the use case, the user,
> changes to the environment (url) or such. The name<->path mapping
> however is only to be changed via changes to the tracked content.
> That is why it would make sense to disallow overriding the path
> outside the tracked content.

It was probably a mistake to use normal config as the example. Junio
mentioned it as a case that could work if you communicate the submodule
URL to somebody else out-of-band. My understanding was that you could
set whatever you like in the regular config, but I think that is just
showing my ignorance of submodules.

Pretend like I said "-f .gitmodules" in each line above. ;)

> In my ideal dream world of submodules we would have the following:
> 
>   $ cat .gitmodules
>   [submodule "sub42"]
> path = foo
>   # path only in tree!

TBH, I am not sure why we need "path"; couldn't we just use the
subsection name as an implicit path?

> > +   OPT_HIDDEN_BOOL(0, "warn-embedded-repo", _on_embedded_repo,
> > +   N_("warn when adding an embedded repository")),
> 
> We do not have a lot of OPT_HIDDEN_BOOLs throughout the code base.
> We should use them more often.
> 
> It makes sense though in this case.

Actually, my main reason is that it's nonsense to show
"--warn-embedded-repo" in the help, when it's already the default. I
would like to have written:

  OPT_NEGBOOL(0, "no-warn-embedded-repo", _on_embedded_repo,
N_("disable warning when adding an embedded repository"))

but we don't have such a thing (and the last discussion on it a few
months ago left a lot of open questions). So given that this really
isn't something I'd expect users to want, I figured hiding it was a good
idea. I mentioned it in the manpage for script writers, but it's really
not worth cluttering "git add -h".

> > +static const char embedded_advice[] = N_(
> > +"You've added another git repository inside your current repository.\n"
> > +"Clones of the outer repository will not also contain the contents of\n"
> > +"the embedded repository. If you meant to add a submodule, use:\n"
> 
> The "will not also" sounds a bit off to me. Maybe:
>   ...
>   Clones of the outer repository will not contain the contents
>   of the embedded repository and has no way of knowing how
>   to obtain the inner repo. If you meant to add a submodule ...

Yeah, I think we could just strike the "also" (I played around with the
wording here quite a bit and I think it was left from an earlier attempt
where it made more sense).

Your "no way of knowing" is probably a good thing to mention.

> > +"See \"git help submodule\" for more information."
> 
> Once the overhaul of the submodule documentation
> comes along[1], we rather want to point at
> "man 7 git-submodules", which explains the concepts and
> then tell you about commands how to use it. For now the
> git-submodule man page is ok.
> 
> [1] https://public-inbox.org/git/20170607185354.10050-1-sbel...@google.com/

Yeah, I poked around looking for a definitive "here's how submodules
work" intro. I'm happy one is in the works, and I agree this should
point there once it exists.

> > +++ b/t/t7414-submodule-mistakes.sh
> > @@ -0,0 +1,40 @@
> > +#!/bin/sh
> > +
> > +test_description='handling of common mistakes people may make with 
> > submodules'
> 
> That is one way to say it. Do we have other tests for
> "you think it is a bug, but it is features" ? ;)
> I like it though. :)

Heh. I didn't know how else to lump it together. Just "test git add on a
repository" felt like too little for its own script. I almost added it
to t7400, but I think that script is plenty long enough as it is (it's
also one of the longest-running scripts, I think).

> > +test_expect_success 'create embedded repository' '
> > +   git init embed &&
> > +   (
> > +   cd embed &&
> > +   test_commit one
> > +   )
> 
> shorter via:
> 
>   test_create_repo embed &&
>   test_commit -C embed one
> 
> (and saves a shell IIRC)

Right, I forgot we added -C there. Will change.

> Thanks for these tests.
> 
> This patch looks good to me, apart from the perceived wording nits.

Thanks. I'll re-roll with a few tweaks based on your feedback.

-Peff


Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-13 Thread Junio C Hamano
Jeff King  writes:

> I also waffled on whether we should ask the submodule code
> whether it knows about a particular path. Technically:
>
>   git config submodule.foo.path foo
>   git config submodule.foo.url git://...
>   git add foo
>
> is legal, but would still warn with this patch. 

Did you mean "git config -f .gitmodules" for the first two?  If so,
I think I agree that (1) it should be legal and (2) we should be
able to add the check on top of this patch.

Or do you really mean the case in which these are in .git/config?


Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-13 Thread Brandon Williams
On 06/13, Stefan Beller wrote:
> On Tue, Jun 13, 2017 at 2:24 AM, Jeff King  wrote:
> 
> > There is a config knob that can disable the (long) hint. But
> > I intentionally omitted a config knob to disable the warning
> > entirely. Whether the warning is sensible or not is
> > generally about context, not about the user's preferences.
> > If there's a tool or workflow that adds gitlinks without
> > matching .gitmodules, it should probably be taught about the
> > new command-line option, rather than blanket-disabling the
> > warning.
> >
> > Signed-off-by: Jeff King 
> > ---
> > The check for "is this a gitlink" is done by looking for a
> > trailing "/" in the added path. This feels kind of hacky,
> > but actually seems to work well in practice.
> 
> This whole "slash at the end" thing comes from extensive use
> of shell completion adding the slash at the end of a directory
> IMHO. (cf. PATHSPEC_STRIP_SUBMODULE_SLASH_* is
> the same underlying hack.)

I got rid of PATHSPEC_STRIP_SUBMODULE_SLASH_* recently, just an fyi.

-- 
Brandon Williams


Re: [PATCH 1/2] add: warn when adding an embedded repository

2017-06-13 Thread Stefan Beller
On Tue, Jun 13, 2017 at 2:24 AM, Jeff King  wrote:
> It's an easy mistake to add a repository inside another
> repository, like:
>
>   git clone $url
>   git add .
>
> The resulting entry is a gitlink, but there's no matching
> .gitmodules entry. Trying to use "submodule init" (or clone
> with --recursive) doesn't do anything useful. Prior to
> v2.13, such an entry caused git-submodule to barf entirely.
> In v2.13, the entry is considered "inactive" and quietly
> ignored. Either way, no clone of your repository can do
> anything useful with the gitlink without the user manually
> adding the submodule config.
>
> In most cases, the user probably meant to either add a real
> submodule, or they forgot to put the embedded repository in
> their .gitignore file.
>
> Let's issue a warning when we see this case. There are a few
> things to note:
>
>   - the warning will go in the git-add porcelain; anybody
> wanting to do low-level manipulation of the index is
> welcome to create whatever funny states they want.
>
>   - we detect the case by looking for a newly added gitlink;
> updates via "git add submodule" are perfectly reasonable,
> and this avoids us having to investigate .gitmodules
> entirely
>
>   - there's a command-line option to suppress the warning.
> This is needed for git-submodule itself (which adds the
> entry before adding any submodule config), but also
> provides a mechanism for other scripts doing
> submodule-like things.
>
> We could make this a hard error instead of a warning.
> However, we do add lots of sub-repos in our test suite. It's
> not _wrong_ to do so. It just creates a state where users
> may be surprised. Pointing them in the right direction with
> a gentle hint is probably the best option.

Sounds good up to here (and right).

> There is a config knob that can disable the (long) hint. But
> I intentionally omitted a config knob to disable the warning
> entirely. Whether the warning is sensible or not is
> generally about context, not about the user's preferences.
> If there's a tool or workflow that adds gitlinks without
> matching .gitmodules, it should probably be taught about the
> new command-line option, rather than blanket-disabling the
> warning.
>
> Signed-off-by: Jeff King 
> ---
> The check for "is this a gitlink" is done by looking for a
> trailing "/" in the added path. This feels kind of hacky,
> but actually seems to work well in practice.

This whole "slash at the end" thing comes from extensive use
of shell completion adding the slash at the end of a directory
IMHO. (cf. PATHSPEC_STRIP_SUBMODULE_SLASH_* is
the same underlying hack.)

> We've already
> expanded the pathspecs to real filenames via dir.c, and that
> omits trees. So anything with a trailing slash must be a
> gitlink.

Oh!

>
> And I really didn't want to incur any extra cost in the
> common case here (e.g., checking for "path/.git"). We could
> do it at zero-cost by pushing the check much further down
> (i.e., when we'd realize anyway that it's a gitlink), but I
> didn't want to pollute read-cache.c with what is essentially
> a porcelain warning. The actual check done there seems to be
> checking S_ISDIR, but I didn't even want to incur an extra
> stat per-file.

makes sense.

>
> I also waffled on whether we should ask the submodule code
> whether it knows about a particular path. Technically:
>
>   git config submodule.foo.path foo
>   git config submodule.foo.url git://...
>   git add foo
>
> is legal, but would still warn with this patch. I don't know
> how much we should care (it would also be easy to do on
> top).

And here I was thinking this is not legal, because you may override
anything *except* submodule.*.path in the config. That is because
all the other settings (such as url, active flag, branch,
shallow recommendation) are dependent on the use case, the user,
changes to the environment (url) or such. The name<->path mapping
however is only to be changed via changes to the tracked content.
That is why it would make sense to disallow overriding the path
outside the tracked content.

In my ideal dream world of submodules we would have the following:

  $ cat .gitmodules
  [submodule "sub42"]
path = foo
  # path only in tree!

  $ cat .git/config
  ...
  [submodule]
active = .
active = :(exclude)Irrelevant/submodules/for/my/usecase/*
  # note how this is user centric

  $ git show refs/meta/magic/for/refs/heads/master:.gitmodules
  [submodule "sub42"]
url = https://example.org/foo
branch = .
  # Note how this is neither centering on the in-tree
  # contents, nor the user. Instead it focuses on the
  # project or group. It is *workflow* centric.
  # Workflows may change over time, e.g. the url could
  # be repointed to k.org or an in-house mirror without tree
  # changes.


But back to reviewing this patch.

>
>  Documentation/config.txt  |  3 +++
>  Documentation/git-add.txt |  7 +++
>  

[PATCH 1/2] add: warn when adding an embedded repository

2017-06-13 Thread Jeff King
It's an easy mistake to add a repository inside another
repository, like:

  git clone $url
  git add .

The resulting entry is a gitlink, but there's no matching
.gitmodules entry. Trying to use "submodule init" (or clone
with --recursive) doesn't do anything useful. Prior to
v2.13, such an entry caused git-submodule to barf entirely.
In v2.13, the entry is considered "inactive" and quietly
ignored. Either way, no clone of your repository can do
anything useful with the gitlink without the user manually
adding the submodule config.

In most cases, the user probably meant to either add a real
submodule, or they forgot to put the embedded repository in
their .gitignore file.

Let's issue a warning when we see this case. There are a few
things to note:

  - the warning will go in the git-add porcelain; anybody
wanting to do low-level manipulation of the index is
welcome to create whatever funny states they want.

  - we detect the case by looking for a newly added gitlink;
updates via "git add submodule" are perfectly reasonable,
and this avoids us having to investigate .gitmodules
entirely

  - there's a command-line option to suppress the warning.
This is needed for git-submodule itself (which adds the
entry before adding any submodule config), but also
provides a mechanism for other scripts doing
submodule-like things.

We could make this a hard error instead of a warning.
However, we do add lots of sub-repos in our test suite. It's
not _wrong_ to do so. It just creates a state where users
may be surprised. Pointing them in the right direction with
a gentle hint is probably the best option.

There is a config knob that can disable the (long) hint. But
I intentionally omitted a config knob to disable the warning
entirely. Whether the warning is sensible or not is
generally about context, not about the user's preferences.
If there's a tool or workflow that adds gitlinks without
matching .gitmodules, it should probably be taught about the
new command-line option, rather than blanket-disabling the
warning.

Signed-off-by: Jeff King 
---
The check for "is this a gitlink" is done by looking for a
trailing "/" in the added path. This feels kind of hacky,
but actually seems to work well in practice. We've already
expanded the pathspecs to real filenames via dir.c, and that
omits trees. So anything with a trailing slash must be a
gitlink.

And I really didn't want to incur any extra cost in the
common case here (e.g., checking for "path/.git"). We could
do it at zero-cost by pushing the check much further down
(i.e., when we'd realize anyway that it's a gitlink), but I
didn't want to pollute read-cache.c with what is essentially
a porcelain warning. The actual check done there seems to be
checking S_ISDIR, but I didn't even want to incur an extra
stat per-file.

I also waffled on whether we should ask the submodule code
whether it knows about a particular path. Technically:

  git config submodule.foo.path foo
  git config submodule.foo.url git://...
  git add foo

is legal, but would still warn with this patch. I don't know
how much we should care (it would also be easy to do on
top).

 Documentation/config.txt  |  3 +++
 Documentation/git-add.txt |  7 +++
 advice.c  |  2 ++
 advice.h  |  1 +
 builtin/add.c | 45 ++-
 git-submodule.sh  |  5 +++--
 t/t7414-submodule-mistakes.sh | 40 ++
 7 files changed, 100 insertions(+), 3 deletions(-)
 create mode 100755 t/t7414-submodule-mistakes.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index dd4beec39..e909239bc 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -348,6 +348,9 @@ advice.*::
rmHints::
In case of failure in the output of linkgit:git-rm[1],
show directions on how to proceed from the current state.
+   addEmbeddedRepo::
+   Advice on what to do when you've accidentally added one
+   git repo inside of another.
 --
 
 core.fileMode::
diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index 7ed63dce0..f4169fb1e 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -165,6 +165,13 @@ for "git add --no-all ...", i.e. ignored removed 
files.
be ignored, no matter if they are already present in the work
tree or not.
 
+--no-warn-embedded-repo::
+   By default, `git add` will warn when adding an embedded
+   repository to the index without using `git submodule add` to
+   create an entry in `.gitmodules`. This option will suppress the
+   warning (e.g., if you are manually performing operations on
+   submodules).
+
 --chmod=(+|-)x::
Override the executable bit of the added files.  The executable
bit is only changed in the index, the files on disk are left
diff --git