RE: Uninitialized submodules as symlinks

2016-10-17 Thread David Turner
> -----Original Message-----
> From: Duy Nguyen [mailto:pclo...@gmail.com]
> Sent: Monday, October 17, 2016 5:46 AM
> To: David Turner
> Cc: Stefan Beller; git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Sat, Oct 8, 2016 at 2:59 AM, David Turner <david.tur...@twosigma.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stefan Beller [mailto:sbel...@google.com]
> >> Sent: Friday, October 07, 2016 2:56 PM
> >> To: David Turner
> >> Cc: git@vger.kernel.org
> >> Subject: Re: Uninitialized submodules as symlinks
> >>
> >> On Fri, Oct 7, 2016 at 11:17 AM, David Turner
> >> <david.tur...@twosigma.com>
> >> wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.
> >>
> >> Right, there has to be something, to hint at the user that creating a
> >> file with that path is probably not what they want.
> >>
> >> >  We would like to consider having them be symlinks.  Specifically,
> >> > we'd like them to be symlinks into a FUSE filesystem which
> >> > retrieves files on demand.
> >> >
> >> > We've actually already got a FUSE filesystem written, but we use a
> >> > different (semi-manual) means to connect it to the initialized
> >> > submodules.
> >>
> >> So you currently do a
> >>
> >> git submodule init 
> >> custom-submodule make-symlink 
> >>
> >> ?
> >
> > We do something like
> >
> > For each initialized submodule: symlink it into the right place in
> > .../somedir
> > For each uninitialized submodule: symlink from the FUSE into the
> > right place in .../somedir
> >
> > So .../somedir has the structure of the git main repo, but is all
> > symlinks -- some into FUSE, some into the git repo.
> >
> > This means that when we initialize (or deinitialize) a submodule, we
> > need to re-run the linking script.
> 
> Do .git files work? If .git files point to somewhere in fuse, I guess you
> still have file retrieval on demand. It depends on what files to retrieve
> I guess. If you want worktree files, not object database then .git files
> won't work because worktree remains in the same filesystem as the super
> repo.

Yes, we want worktree files (or even worktree files + built artifacts).


RE: Uninitialized submodules as symlinks

2016-10-17 Thread David Turner


> -----Original Message-----
> From: Heiko Voigt [mailto:hvo...@hvoigt.net]
> Sent: Thursday, October 13, 2016 12:10 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories.  We would like to consider having them be
> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
> 
> How about portability? This feature would only work on Unix-like operating
> systems. You have to be careful to not break Windows since they do not
> have symlinks.

Windows doesn't support FUSE either IIRC.  Since this would be an alternate 
mode of operation, Windows would still work fine on the old model.


Re: Uninitialized submodules as symlinks

2016-10-17 Thread Duy Nguyen
On Sat, Oct 8, 2016 at 2:59 AM, David Turner <david.tur...@twosigma.com> wrote:
>
>
>> -----Original Message-----
>> From: Stefan Beller [mailto:sbel...@google.com]
>> Sent: Friday, October 07, 2016 2:56 PM
>> To: David Turner
>> Cc: git@vger.kernel.org
>> Subject: Re: Uninitialized submodules as symlinks
>>
>> On Fri, Oct 7, 2016 at 11:17 AM, David Turner <david.tur...@twosigma.com>
>> wrote:
>> > Presently, uninitialized submodules are materialized in the working tree
>> > as empty directories.
>>
>> Right, there has to be something, to hint at the user that creating a file
>> with that path is probably not what they want.
>>
>> >  We would like to consider having them be symlinks.  Specifically, we'd
>> > like them to be symlinks into a FUSE filesystem which retrieves files on
>> > demand.
>> >
>> > We've actually already got a FUSE filesystem written, but we use a
>> > different (semi-manual) means to connect it to the initialized submodules.
>>
>> So you currently do a
>>
>> git submodule init 
>> custom-submodule make-symlink 
>>
>> ?
>
> We do something like
>
> For each initialized submodule: symlink it into the right place in .../somedir
> For each uninitialized submodule: symlink from the FUSE into the right place 
> in .../somedir
>
> So .../somedir has the structure of the git main repo, but is all symlinks -- 
> some into FUSE, some into the git repo.
>
> This means that when we initialize (or deinitialize) a submodule, we need to 
> re-run the linking script.

Do .git files work? If .git files point to somewhere in fuse, I guess
you still have file retrieval on demand. It depends on what files to
retrieve I guess. If you want worktree files, not object database then
.git files won't work because worktree remains in the same filesystem
as the super repo.
-- 
Duy


Re: Uninitialized submodules as symlinks

2016-10-17 Thread Heiko Voigt
On Fri, Oct 14, 2016 at 09:48:16AM -0700, Junio C Hamano wrote:
> Kevin Daudt writes:
> 
> > On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> >> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.  We would like to consider having them be
> >> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> >> > filesystem which retrieves files on demand.
> >> 
> >> How about portability? This feature would only work on Unix-like
> >> operating systems. You have to be careful to not break Windows since
> >> they do not have symlinks.
> >
> > NTFS does have symlinks, but you need admin rights to create them
> > (unless you change the policy).
> 
> That sounds like saying "It has, but it practically is not usable by
> Git as a mechanism to achieve this goal" to me.

Yes and that is why Git for Windows does not use them and I simplified
to: "Windows does not have symlinks". For a normal user there is no such
thing as symlinks on Windows, unfortunately.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-14 Thread Junio C Hamano
Kevin Daudt writes:

> On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
>> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
>> > Presently, uninitialized submodules are materialized in the working
>> > tree as empty directories.  We would like to consider having them be
>> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
>> > filesystem which retrieves files on demand.
>> 
>> How about portability? This feature would only work on Unix-like
>> operating systems. You have to be careful to not break Windows since
>> they do not have symlinks.
>
> NTFS does have symlinks, but you need admin rights to create them
> (unless you change the policy).

That sounds like saying "It has, but it practically is not usable by
Git as a mechanism to achieve this goal" to me.





Re: Uninitialized submodules as symlinks

2016-10-13 Thread Kevin Daudt
On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories.  We would like to consider having them be
> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
> 
> How about portability? This feature would only work on Unix-like
> operating systems. You have to be careful to not break Windows since
> they do not have symlinks.
> 

NTFS does have symlinks, but you need admin rights to create them
(unless you change the policy).


Re: Uninitialized submodules as symlinks

2016-10-13 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories.  We would like to consider having them be
> symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix-like
operating systems. You have to be careful to not break Windows since
they do not have symlinks.

Cheers Heiko


RE: Uninitialized submodules as symlinks

2016-10-07 Thread David Turner


> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Friday, October 07, 2016 2:56 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Fri, Oct 7, 2016 at 11:17 AM, David Turner <david.tur...@twosigma.com>
> wrote:
> > Presently, uninitialized submodules are materialized in the working tree
> > as empty directories.
> 
> Right, there has to be something, to hint at the user that creating a file
> with that path is probably not what they want.
> 
> >  We would like to consider having them be symlinks.  Specifically, we'd
> > like them to be symlinks into a FUSE filesystem which retrieves files on
> > demand.
> >
> > We've actually already got a FUSE filesystem written, but we use a
> > different (semi-manual) means to connect it to the initialized submodules.
> 
> So you currently do a
> 
> git submodule init 
> custom-submodule make-symlink 
> 
> ?

We do something like

For each initialized submodule: symlink it into the right place in .../somedir
For each uninitialized submodule: symlink from the FUSE into the right place in 
.../somedir

So .../somedir has the structure of the git main repo, but is all symlinks -- 
some into FUSE, some into the git repo.

This means that when we initialize (or deinitialize) a submodule, we need to 
re-run the linking script.  
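
A minimal sketch of what such a linking script might look like, in Python.
The script itself is not shown in this thread, so the function names
(plan_links, apply_links), the mapping from submodule path to initialized
state, and the example paths are all assumptions made for illustration.

```python
import os
from pathlib import Path


def plan_links(submodules, repo_root, fuse_root, link_root):
    """Return {link_path: target} for every submodule path.

    `submodules` maps a submodule path (relative to the superproject) to
    True if it is initialized and False otherwise. Initialized submodules
    point into the git working tree, uninitialized ones into the FUSE mount.
    """
    plan = {}
    for rel_path, initialized in submodules.items():
        source_root = repo_root if initialized else fuse_root
        plan[Path(link_root) / rel_path] = Path(source_root) / rel_path
    return plan


def apply_links(plan):
    """Create or replace each symlink in the plan.

    Only existing symlinks are replaced; a real file or directory at the
    link path makes os.symlink raise FileExistsError.
    """
    for link, target in plan.items():
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # stale link from a previous run
        os.symlink(target, link)
```

Keeping the planning step separate from the filesystem step makes it easy to
see why the script has to be re-run: the plan depends on which submodules are
initialized, so any init or deinit changes it.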

> > We hope to release this FUSE filesystem as free software at some point
> > soon, but we do not yet have a fixed schedule for doing so.  Having to run
> > a command to create the symlink-based "union" filesystem is not optimal
> > (since we have to re-run it every time we initialize or deinitialize a
> > submodule).
> >
> > But if the uninitialized submodules could be symlinks into the FUSE
> > filesystem, we wouldn't have this problem.  This solution isn't
> > necessarily FUSE-specific -- perhaps someone would want copies of the same
> > submodule in multiple repos, and would want to save disk space by having
> > all copies point to the same place.  So the symlinks would be configured
> > by a per-submodule config variable.
> 
> I'd imagine that you want both a per-submodule config variable as well as
> a global variable that is a default for all submodules?
> 
> git config submodule.trySymlinkDefault /mounted/fuse/
> # any (new) submodule tries to be linked to /mounted/fuse/
> git config submodule.<name>.symlinked ~/my/private/symlinked
> # The <name> submodule goes into another path.
> 
> As you propose the FUSE filesystem fetches files on demand, you probably
> want to disable things that scan the whole submodule, e.g. look at
> submodule.<name>.ignore to suppress status looking at all files.

I would actually expect that git would detect that the symlink is unmodified 
from the configured symlink and automatically decide not to look there.
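
To illustrate the kind of check meant here (this is not an existing git
feature), the test below compares the text stored in the symlink with the
configured target, without ever following the link. The function name and
the example paths are hypothetical.

```python
import os


def is_untouched_symlink(path, configured_target):
    """True if `path` is a symlink whose stored target equals the configured one.

    os.path.islink and os.readlink both inspect the link itself, so nothing
    behind the (possibly FUSE-backed) target is read.
    """
    return os.path.islink(path) and os.readlink(path) == configured_target
```

The comparison is a plain string match, so a configured value with a trailing
slash would not match a link created without one; a real implementation would
have to decide how to normalize the two.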
 
> When looking through the options, you could add the value "symlink" to
> submodule.<name>.update, which then respects the
> submodule.trySymlinkDefault if present, such that
> 
> git clone --recurse-submodules ...
> 
> works and sets up the FUSE thing correctly.
> 
> How does the FUSE system handle different versions, i.e.
> `git submodule update` to checkout another version of the submodule?
> (btw, I plan on working on integrating submodules to "git checkout", so
> "submodule update" would not need to be run there, but we'd hook it into
> checkout instead)

The fuse has a (virtual) directory for each SHA of the main repo, with each 
submodule mapped to the then-current version of the submodule's code. Actually, 
it's a bit more complicated because the uninitialized modules point to 
already-built binaries -- that is, the symlink is to something like 
$fuse/$SHA/built/$submodule. 

If you check out a new version of the main module, in our current setup, you 
need to again update all of the submodule symlinks (as described above). 

Under my proposal, I guess this would still need to happen.  A post-checkout 
hook could handle it either way.  Despite this flaw, switching a submodule 
between an initialized and deinitialized state would still be more seamless 
with the symlinks.
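
A sketch of the decision such a post-checkout hook might make. Git passes
the hook three arguments: the previous HEAD, the new HEAD, and a flag that is
1 for a branch checkout and 0 for a file checkout. The $fuse/$SHA/built/$submodule
layout is the one described above; the function names, the FUSE root, and the
way the list of uninitialized submodules is obtained are assumptions.

```python
def fuse_target(fuse_root, superproject_sha, submodule_path):
    """Build the symlink target $fuse/$SHA/built/$submodule."""
    return f"{fuse_root.rstrip('/')}/{superproject_sha}/built/{submodule_path}"


def relink_targets(argv, fuse_root, uninitialized):
    """Return {submodule_path: target} to relink, or {} if nothing to do.

    `argv` is the hook's sys.argv: [hook, prev_head, new_head, branch_flag].
    Relinking is needed only for a branch checkout that actually moved HEAD.
    """
    prev_head, new_head, branch_flag = argv[1:4]
    if branch_flag != "1" or prev_head == new_head:
        return {}
    return {p: fuse_target(fuse_root, new_head, p) for p in uninitialized}
```

A hook script would call relink_targets(sys.argv, ...) and then re-create
each returned symlink.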

> > Naturally, this would require some changes to code that examines the
> > working tree -- git status, git diff, etc.  They would have to report
> > "unchanged" for submodules which were still symlinks to the configured
> > location.  I have not yet looked at the implementation details beyond
> > this.
> >
> > Does this idea make any sense?  If I were to implement it (probably in a
> > few months, but no official timeline yet), would patches be considered?
> 
> I am happy to review patches.

Thanks.