Begin forwarded message:

> From: Nick Townsend <nick.towns...@mac.com>
> Subject: Re: [PATCH] submodule recursion in git-archive
> Date: 2 December 2013 16:00:50 GMT-8
> To: Junio C Hamano <gits...@pobox.com>
> Cc: René Scharfe <l....@web.de>, Jens Lehmann <jens.lehm...@web.de>, 
> git@vger.kernel.org, Jeff King <p...@peff.net>
> 
> 
> On 27 Nov 2013, at 11:43, Junio C Hamano <gits...@pobox.com> wrote:
> 
>> Nick Townsend <nick.towns...@mac.com> writes:
>> 
>>> On 26 Nov 2013, at 14:18, Junio C Hamano <gits...@pobox.com> wrote:
>>> 
>>>> Even if the code is run inside a repository with a working tree,
>>>> when producing a tarball out of an ancient commit that had a
>>>> submodule at a different location than it has now, the
>>>> --recurse-submodules option should do the right thing, so asking
>>>> the working tree for the location of that submodule in order to
>>>> find its repository is wrong, I think.  It may happen to find one
>>>> if the archived revision is close enough to what is currently
>>>> checked out, but that is not necessarily the case.
>>>> 
>>>> At the point where the code discovers an S_ISGITLINK entry, it
>>>> should have both the pathname of the submodule relative to the
>>>> top level and the commit object name bound to that submodule
>>>> location.  What it should do when it does not find the repository
>>>> at the given path (maybe because there is no working tree, or the
>>>> submodule directory has moved over time) is roughly:
>>>> 
>>>> - Read .gitmodules at the top level of the tree it is creating
>>>> the tarball from;
>>>> 
>>>> - Find the "submodule.$name.path" entry that records that path to
>>>> the submodule; and then
>>>> 
>>>> - Using that $name, find the stashed-away location of the submodule
>>>> repository in $GIT_DIR/modules/$name.
>>>> 
>>>> or something like that.
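A minimal Python sketch of the three steps above, for illustration only: it is not git's C implementation, and the helper names (read_submodule_sections, submodule_gitdir) are hypothetical. It assumes the caller already has the text of .gitmodules from the tree being archived (for example via `git show <tree-ish>:.gitmodules`), and it uses Python's configparser as a rough stand-in for a real git-config parser, so quoting, escapes and include directives are not handled.

```python
# Hypothetical sketch, not git's implementation: map a submodule path in the
# archived tree to $GIT_DIR/modules/$name via that tree's .gitmodules.
import configparser
import os
import re


def read_submodule_sections(text):
    """Return {name: {key: value}} for the [submodule "name"] sections.

    configparser is only a rough stand-in for git's config parser. Indentation
    is stripped first, because configparser would read tab-indented lines as
    continuations of the previous value. Keys come back lower-cased, matching
    git's case-insensitive variable names.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    sections = {}
    for section in parser.sections():
        match = re.fullmatch(r'submodule\s+"(.+)"', section)
        if match:
            sections[match.group(1)] = dict(parser[section])
    return sections


def submodule_gitdir(git_dir, gitmodules_text, path):
    """Return $GIT_DIR/modules/$name for the submodule recorded at path."""
    wanted = path.rstrip("/")
    for name, keys in read_submodule_sections(gitmodules_text).items():
        # Find the entry whose submodule.$name.path records this path.
        if (keys.get("path") or "").rstrip("/") == wanted:
            # Use $name to locate the stashed-away repository.
            return os.path.join(git_dir, "modules", name)
    return None  # no entry in this tree's .gitmodules records the path


archived = '[submodule "kernel"]\n\tpath = linux\n'  # hypothetical contents
print(submodule_gitdir(".git", archived, "linux/"))  # .git/modules/kernel on POSIX
```

Because only the archived tree's .gitmodules is consulted, the result does not change if the checked-out working tree has since moved the submodule to another path.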
>>>> 
>>>> On a related tangent: when run in a repository that people often
>>>> use as their remote, the repository discovery may have to
>>>> interact with relative URLs.  People often ship .gitmodules with
>>>> 
>>>>    [submodule "bar"]
>>>>            url = ../bar.git
>>>>            path = barDir
>>>> 
>>>> for a top-level project "foo" that can be cloned thusly:
>>>> 
>>>>    git clone git://site.xz/foo.git
>>>> 
>>>> and host bar.git to be clonable with
>>>> 
>>>>    git clone git://site.xz/bar.git barDir/
>>>> 
>>>> inside the working tree of the foo project.  In such a case, when
>>>> "archive --recurse-submodules" is run in the repository that serves
>>>> foo.git, I would expect it to find the repository for the "bar"
>>>> submodule at "../bar.git".
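The relative-URL case can be sketched the same way. This hypothetical Python helper assumes the convention described in the git-submodule documentation, that a URL beginning with ./ or ../ is relative to the superproject's repository URL; it is a simplified approximation, not git's actual resolution code.

```python
# Hypothetical, simplified sketch of resolving a relative submodule URL
# against the superproject's URL; not git's actual resolution code.


def resolve_relative_url(superproject_url, relative_url):
    """Resolve "./..." or "../..." against superproject_url.

    Each leading "../" strips one trailing component from the superproject
    URL and "./" strips none. Special cases that git handles (scp-like
    host:path URLs, running out of components, and so on) are ignored here.
    """
    if not relative_url.startswith(("./", "../")):
        return relative_url  # absolute URL: used as-is
    base = superproject_url.rstrip("/")
    rest = relative_url
    while True:
        if rest.startswith("./"):
            rest = rest[2:]
        elif rest.startswith("../"):
            rest = rest[3:]
            base = base.rsplit("/", 1)[0]  # drop the last path component
        else:
            break
    return base + "/" + rest


print(resolve_relative_url("git://site.xz/foo.git", "../bar.git"))
# git://site.xz/bar.git
```

With a local bare repository at a hypothetical /srv/git/foo.git as the base, the same rule gives /srv/git/bar.git, i.e. the "../bar.git" location where archive would have to look for the "bar" repository.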
>>>> 
>>>> So this part needs a bit more thought, I am afraid.
>>> 
>>> I see that there is a lot of potential complexity around setting up a 
>>> submodule:
>> 
>> No question about it.
>> 
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>> 
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived.  There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtiness should not
>> matter, I think.  If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>> 
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>> 
>> I think this is irrelevant for exactly the same reason as above.
>> 
>> What makes this trickier, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out.  In that case, we may
>> have to read .gitmodules from two places, i.e.
>> 
>> (1) We are archiving a tree with a submodule at "linux/";
>> 
>> (2) We read .gitmodules from that tree and learn that the submodule
>>    has name "kernel";
>> 
>> (3) There is no ".git/modules/kernel" because the repository uses
>>    the old layout (if the user never was interested in this
>>    submodule, .git/modules/kernel may also be missing, and we
>>    should tell these two cases apart by checking .git/config to
>>    see if a corresponding entry for the "kernel" submodule exists
>>    there);
>> 
>> (4) In a repository that uses the old layout, the submodule's
>>    repository must be embedded somewhere in the current working
>>    tree (being unable to remove such a submodule directory without
>>    losing its repository is why we use the new layout these days).
>>    We can learn where it is by looking at .gitmodules in the
>>    working tree: take the name "kernel" we learned earlier and
>>    map it to the current path ("linux-3.0/" if you have been
>>    following this example so far).
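The fallback in steps (1) to (4) might look roughly like the following hypothetical Python sketch. To keep it testable, filesystem checks are replaced by an existing_paths set of paths relative to the top of the working tree, and the three configuration texts (.gitmodules from the archived tree, .gitmodules from the working tree, and .git/config) are passed in as strings; configparser again stands in for a real git-config parser, and all function names are illustrative.

```python
# Hypothetical sketch of the old-layout fallback, steps (1)-(4). Filesystem
# checks are replaced by a set of existing paths so the logic is testable.
import configparser
import os
import re


def read_submodule_sections(text):
    """Return {name: {key: value}} for [submodule "name"] sections (rough)."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False, allow_no_value=True
    )
    # Strip indentation: configparser treats indented lines as continuations.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    sections = {}
    for section in parser.sections():
        match = re.fullmatch(r'submodule\s+"(.+)"', section)
        if match:
            sections[match.group(1)] = dict(parser[section])
    return sections


def name_for_path(gitmodules_text, path):
    """Steps (1)-(2): the submodule name recorded for path, or None."""
    wanted = path.rstrip("/")
    for name, keys in read_submodule_sections(gitmodules_text).items():
        if (keys.get("path") or "").rstrip("/") == wanted:
            return name
    return None


def locate_submodule_repo(archived_gitmodules, worktree_gitmodules,
                          git_config, existing_paths, path, git_dir=".git"):
    """Return the submodule repository for path, or None if not found."""
    name = name_for_path(archived_gitmodules, path)
    if name is None:
        return None
    new_layout = os.path.join(git_dir, "modules", name)
    if new_layout in existing_paths:
        return new_layout
    # Step (3): no .git/modules/$name. Without a submodule.$name entry in
    # .git/config the user never initialized it, so there is nothing to find.
    if name not in read_submodule_sections(git_config):
        return None
    # Step (4): old layout. Map the name to its current path using the
    # working tree's .gitmodules, which may contain uncommitted edits.
    current = read_submodule_sections(worktree_gitmodules).get(name, {}).get("path")
    if not current:
        return None
    embedded = os.path.join(current.rstrip("/"), ".git")
    return embedded if embedded in existing_paths else None


archived = '[submodule "kernel"]\n\tpath = linux\n'
worktree = '[submodule "kernel"]\n\tpath = linux-3.0\n'
config = '[core]\n\tbare = false\n[submodule "kernel"]\n\turl = git://example.org/linux.git\n'
print(locate_submodule_repo(archived, worktree, config,
                            {os.path.join("linux-3.0", ".git")}, "linux/"))
# linux-3.0/.git on POSIX
```

The working-tree .gitmodules is consulted only in step (4), so an uncommitted edit to it can affect only the old-layout fallback, never the lookup for the new layout.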
>> 
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do.  The user may be in the process of moving the submodule
>> in his working tree with
>> 
>>   $ mv linux-3.0 linux-3.2
>>   $ git config -f .gitmodules submodule.kernel.path linux-3.2
>> 
>> but hasn't committed the change yet.
>> 
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>> 
>> As I already hinted, I agree that the question "how do we find the
>> location of a submodule repository, given a particular tree in the
>> top-level project the submodule belongs to and the path to the
>> submodule in question?" deserves a separate thread to discuss with
>> area experts.
> 
> As per my email to Heiko on this thread, I’m happy to start such
> a discussion; I’ll use your notes as a starting point. I’m much more
> comfortable using a wiki for this. Is that common here, or should I
> start a new mail thread with RFC in the title or similar?
> 
> I did complete my work on my version of git-archive (for internal use)
> and added some regression tests for the current behaviour. Also, the
> add_submodule_odb patch should IMHO be incorporated anyway. I’ll
> resubmit those two for consideration in a new thread.
> 
> Kind Regards
> Nick Townsend
> 
