RE: Uninitialized submodules as symlinks
> -----Original Message-----
> From: Duy Nguyen [mailto:pclo...@gmail.com]
> Sent: Monday, October 17, 2016 5:46 AM
> To: David Turner
> Cc: Stefan Beller; git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
>
> On Sat, Oct 8, 2016 at 2:59 AM, David Turner wrote:
> >
> >> -----Original Message-----
> >> From: Stefan Beller [mailto:sbel...@google.com]
> >> Sent: Friday, October 07, 2016 2:56 PM
> >> To: David Turner
> >> Cc: git@vger.kernel.org
> >> Subject: Re: Uninitialized submodules as symlinks
> >>
> >> On Fri, Oct 7, 2016 at 11:17 AM, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.
> >>
> >> Right, there has to be something, to hint at the user that creating a
> >> file with that path is probably not what they want.
> >>
> >> > We would like to consider having them be symlinks. Specifically,
> >> > we'd like them to be symlinks into a FUSE filesystem which
> >> > retrieves files on demand.
> >> >
> >> > We've actually already got a FUSE filesystem written, but we use a
> >> > different (semi-manual) means to connect it to the initialized
> >> > submodules.
> >>
> >> So you currently do a
> >>
> >>     git submodule init
> >>     custom-submodule make-symlink
> >>
> >> ?
> >
> > We do something like
> >
> > For each initialized submodule: symlink it into the right place in
> > .../somedir
> > For each uninitialized submodule: symlink from the FUSE into the right
> > place in .../somedir
> >
> > So .../somedir has the structure of the git main repo, but is all
> > symlinks -- some into FUSE, some into the git repo.
> >
> > This means that when we initialize (or deinitialize) a submodule, we
> > need to re-run the linking script.
>
> Do .git files work? If .git files point to somewhere in fuse, I guess you
> still have file retrieval on demand. It depends on what files to retrieve
> I guess. If you want worktree files, not object database then .git files
> won't work because worktree remains in the same filesystem as the super
> repo.

Yes, we want worktree files (or even worktree files + built artifacts).
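For readers unfamiliar with the ".git files" mentioned above: a gitfile is a plain text file named `.git` that holds a single `gitdir: <path>` line. The sketch below writes one by hand to show why it does not help here. Only the repository data is redirected, while the worktree directory itself stays inside the superproject. The helper name `write_gitfile` and the paths in the usage line are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# Sketch only: hand-write a gitfile that points a submodule's repository
# data somewhere else. The helper name and example paths are hypothetical.
write_gitfile() {
    worktree=$1   # submodule directory inside the superproject
    gitdir=$2     # where the repository data (objects, refs) would live
    mkdir -p "$worktree"
    # A gitfile is a single line of the form "gitdir: <path>".
    printf 'gitdir: %s\n' "$gitdir" > "$worktree/.git"
}

# Usage (hypothetical paths):
#   write_gitfile libs/foo /mnt/fuse/repos/foo.git
```

The checked-out files would still have to be written under `libs/foo` on the local filesystem, which is the limitation Duy points out.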
RE: Uninitialized submodules as symlinks
> -----Original Message-----
> From: Heiko Voigt [mailto:hvo...@hvoigt.net]
> Sent: Thursday, October 13, 2016 12:10 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
>
> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories. We would like to consider having them be
> > symlinks. Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
>
> How about portability? This feature would only work on Unix like operating
> systems. You have to be careful to not break Windows since they do not
> have symlinks.

Windows doesn't support FUSE either IIRC. Since this would be an alternate mode of operation, Windows would still work fine on the old model.
Re: Uninitialized submodules as symlinks
On Sat, Oct 8, 2016 at 2:59 AM, David Turner wrote:
>
>> -----Original Message-----
>> From: Stefan Beller [mailto:sbel...@google.com]
>> Sent: Friday, October 07, 2016 2:56 PM
>> To: David Turner
>> Cc: git@vger.kernel.org
>> Subject: Re: Uninitialized submodules as symlinks
>>
>> On Fri, Oct 7, 2016 at 11:17 AM, David Turner wrote:
>> > Presently, uninitialized submodules are materialized in the working tree
>> > as empty directories.
>>
>> Right, there has to be something, to hint at the user that creating a file
>> with that path is probably not what they want.
>>
>> > We would like to consider having them be symlinks. Specifically, we'd
>> > like them to be symlinks into a FUSE filesystem which retrieves files on
>> > demand.
>> >
>> > We've actually already got a FUSE filesystem written, but we use a
>> > different (semi-manual) means to connect it to the initialized submodules.
>>
>> So you currently do a
>>
>>     git submodule init
>>     custom-submodule make-symlink
>>
>> ?
>
> We do something like
>
> For each initialized submodule: symlink it into the right place in .../somedir
> For each uninitialized submodule: symlink from the FUSE into the right place
> in .../somedir
>
> So .../somedir has the structure of the git main repo, but is all symlinks --
> some into FUSE, some into the git repo.
>
> This means that when we initialize (or deinitialize) a submodule, we need to
> re-run the linking script.

Do .git files work? If .git files point to somewhere in fuse, I guess you
still have file retrieval on demand. It depends on what files to retrieve
I guess. If you want worktree files, not object database then .git files
won't work because worktree remains in the same filesystem as the super
repo.
--
Duy
Re: Uninitialized submodules as symlinks
On Fri, Oct 14, 2016 at 09:48:16AM -0700, Junio C Hamano wrote:
> Kevin Daudt writes:
>
> > On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> >> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories. We would like to consider having them be
> >> > symlinks. Specifically, we'd like them to be symlinks into a FUSE
> >> > filesystem which retrieves files on demand.
> >>
> >> How about portability? This feature would only work on Unix like
> >> operating systems. You have to be careful to not break Windows since
> >> they do not have symlinks.
> >
> > NTFS does have symlinks, but you need admin right to create them though
> > (unless you change the policy).
>
> That sounds like saying "It has, but it practically is not usable by
> Git as a mechanism to achieve this goal" to me.

Yes and that is why Git for Windows does not use them and I simplified
to: "Windows does not have symlinks". For a normal user there is no such
thing as symlinks on Windows, unfortunately.

Cheers Heiko
Re: Uninitialized submodules as symlinks
Kevin Daudt writes:

> On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
>> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
>> > Presently, uninitialized submodules are materialized in the working
>> > tree as empty directories. We would like to consider having them be
>> > symlinks. Specifically, we'd like them to be symlinks into a FUSE
>> > filesystem which retrieves files on demand.
>>
>> How about portability? This feature would only work on Unix like
>> operating systems. You have to be careful to not break Windows since
>> they do not have symlinks.
>
> NTFS does have symlinks, but you need admin right to create them though
> (unless you change the policy).

That sounds like saying "It has, but it practically is not usable by
Git as a mechanism to achieve this goal" to me.
Re: Uninitialized submodules as symlinks
On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories. We would like to consider having them be
> > symlinks. Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
>
> How about portability? This feature would only work on Unix like
> operating systems. You have to be careful to not break Windows since
> they do not have symlinks.
>

NTFS does have symlinks, but you need admin right to create them though
(unless you change the policy).
Re: Uninitialized submodules as symlinks
On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories. We would like to consider having them be
> symlinks. Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix like
operating systems. You have to be careful to not break Windows since
they do not have symlinks.

Cheers Heiko
RE: Uninitialized submodules as symlinks
> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Friday, October 07, 2016 2:56 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
>
> On Fri, Oct 7, 2016 at 11:17 AM, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working tree
> > as empty directories.
>
> Right, there has to be something, to hint at the user that creating a file
> with that path is probably not what they want.
>
> > We would like to consider having them be symlinks. Specifically, we'd
> > like them to be symlinks into a FUSE filesystem which retrieves files on
> > demand.
> >
> > We've actually already got a FUSE filesystem written, but we use a
> > different (semi-manual) means to connect it to the initialized submodules.
>
> So you currently do a
>
>     git submodule init
>     custom-submodule make-symlink
>
> ?

We do something like

For each initialized submodule: symlink it into the right place in .../somedir
For each uninitialized submodule: symlink from the FUSE into the right place in .../somedir

So .../somedir has the structure of the git main repo, but is all symlinks -- some into FUSE, some into the git repo.

This means that when we initialize (or deinitialize) a submodule, we need to re-run the linking script.

> > We hope to release this FUSE filesystem as free software at some point
> > soon, but we do not yet have a fixed schedule for doing so. Having to run
> > a command to create the symlink-based "union" filesystem is not optimal
> > (since we have to re-run it every time we initialize or deinitialize a
> > submodule).
> >
> > But if the uninitialized submodules could be symlinks into the FUSE
> > filesystem, we wouldn't have this problem. This solution isn't
> > necessarily FUSE-specific -- perhaps someone would want copies of the same
> > submodule in multiple repos, and would want to save disk space by having
> > all copies point to the same place. So the symlinks would be configured
> > by a per-submodule config variable.
>
> I'd imagine that you want both a per-submodule config variable as well as
> a global variable that is a default for all submodules?
>
>     git config submodule.trySymlinkDefault /mounted/fuse/
>     # any (new) submodule tries to be linked to /mounted/fuse/
>     git config submodule.<name>.symlinked ~/my/private/symlinked
>     # The submodule goes into another path.
>
> As you propose the FUSE filesystem fetches files on demand, you probably
> want to disable things that scan the whole submodule, e.g. look at
> submodule.<name>.ignore to suppress status looking at all files.

I would actually expect that git would detect that the symlink is unmodified from the configured symlink and automatically decide not to look there.

> When looking through the options, you could add the value "symlink" to
> submodule.<name>.update, which then respects the
> submodule.trySymlinkDefault if present, such that
>
>     git clone --recurse-submodules ...
>
> works and sets up the FUSE thing correctly.
>
> How does the FUSE system handle different versions, i.e.
> `git submodule update` to checkout another version of the submodule?
> (btw, I plan on working on integrating submodules to "git checkout", so
> "submodule update" would not need to be run there, but we'd hook it into
> checkout instead)

The fuse has a (virtual) directory for each SHA of the main repo, with each submodule mapped to the then-current version of the submodule's code. Actually, it's a bit more complicated because the uninitialized modules point to already-built binaries -- that is, the symlink is to something like $fuse/$SHA/built/$submodule.

If you check out a new version of the main module, in our current setup, you need to again update all of the submodule symlinks (as described above). Under my proposal, I guess this would still need to happen. A post-checkout hook could handle it either way. Despite this flaw, switching a submodule between an initialized and deinitialized state would still be more seamless with the symlinks.

> > Naturally, this would require some changes to code that examines the
> > working tree -- git status, git diff, etc. They would have to report
> > "unchanged" for submodules which were still symlinks to the configured
> > location. I have not yet looked at the implementation details beyond
> > this.
> >
> > Does this idea make any sense? If I were to implement it (probably in a
> > few months, but no official timeline yet), would patches be considered?
>
> I am happy to review patches.

Thanks.
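The linking step described in this message could be sketched roughly as below. This is not the script the author uses, which is not shown in the thread. It is a minimal sketch that assumes `git submodule status` output on stdin (where a leading `-` marks an uninitialized submodule), submodule paths without spaces, and the `$fuse/$SHA/built/$submodule` layout mentioned above. The function name `link_submodules` and its argument order are made up for illustration.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): rebuild the symlink tree from
# `git submodule status` lines read on stdin.
# Arguments: repo root, FUSE mount point, superproject commit SHA, output dir.
link_submodules() {
    repo=$1; fuse=$2; sha=$3; out=$4
    # IFS= keeps the leading status character of each line intact.
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        rest=${line#?}          # drop the status character
        rest=${rest#* }         # drop the submodule's own SHA
        path=${rest%%" ("*}     # drop a trailing " (describe)" if present
        case $line in
            -*) target="$fuse/$sha/built/$path" ;;  # uninitialized: link into FUSE
            *)  target="$repo/$path" ;;             # initialized: link into the repo
        esac
        mkdir -p "$out/$(dirname "$path")"
        ln -sfn "$target" "$out/$path"
    done
}

# Usage (paths are placeholders), e.g. from a post-checkout hook:
#   git submodule status |
#       link_submodules "$PWD" /path/to/fuse "$(git rev-parse HEAD)" /path/to/somedir
```

Because the FUSE target embeds the superproject SHA, the links go stale after every checkout, which is why the script has to be re-run (or hooked into post-checkout) as described above.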
Re: Uninitialized submodules as symlinks
On Fri, Oct 7, 2016 at 11:17 AM, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working tree as
> empty directories.

Right, there has to be something, to hint at the user that creating
a file with that path is probably not what they want.

> We would like to consider having them be symlinks. Specifically, we'd like
> them to be symlinks into a FUSE filesystem which retrieves files on demand.
>
> We've actually already got a FUSE filesystem written, but we use a different
> (semi-manual) means to connect it to the initialized submodules.

So you currently do a

    git submodule init
    custom-submodule make-symlink

?

> We hope to release this FUSE filesystem as free software at some point soon,
> but we do not yet have a fixed schedule for doing so. Having to run a
> command to create the symlink-based "union" filesystem is not optimal (since
> we have to re-run it every time we initialize or deinitialize a submodule).
>
> But if the uninitialized submodules could be symlinks into the FUSE
> filesystem, we wouldn't have this problem. This solution isn't necessarily
> FUSE-specific -- perhaps someone would want copies of the same submodule in
> multiple repos, and would want to save disk space by having all copies point
> to the same place. So the symlinks would be configured by a per-submodule
> config variable.

I'd imagine that you want both a per-submodule config variable
as well as a global variable that is a default for all submodules?

    git config submodule.trySymlinkDefault /mounted/fuse/
    # any (new) submodule tries to be linked to /mounted/fuse/
    git config submodule.<name>.symlinked ~/my/private/symlinked
    # The submodule goes into another path.

As you propose the FUSE filesystem fetches files on demand, you
probably want to disable things that scan the whole submodule,
e.g. look at submodule.<name>.ignore to suppress status looking
at all files.

When looking through the options, you could add the value "symlink"
to submodule.<name>.update, which then respects the
submodule.trySymlinkDefault if present, such that

    git clone --recurse-submodules ...

works and sets up the FUSE thing correctly.

How does the FUSE system handle different versions, i.e.
`git submodule update` to checkout another version of the submodule?
(btw, I plan on working on integrating submodules to "git checkout",
so "submodule update" would not need to be run there, but we'd hook it
into checkout instead)

>
> Naturally, this would require some changes to code that examines the working
> tree -- git status, git diff, etc. They would have to report "unchanged" for
> submodules which were still symlinks to the configured location. I have not
> yet looked at the implementation details beyond this.
>
> Does this idea make any sense? If I were to implement it (probably in a few
> months, but no official timeline yet), would patches be considered?

I am happy to review patches.
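Of the settings discussed in this message, only the per-submodule `ignore` key exists in git today. `submodule.trySymlinkDefault`, the `symlinked` key and a "symlink" value for `update` are proposals from this thread. A minimal sketch of setting the existing key so that status and diff stop scanning one submodule's worktree is shown below. The helper name `set_submodule_ignore` and the example submodule name are hypothetical, and the config file is passed explicitly so the sketch can be tried outside a repository.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): mark one submodule as ignored for
# status/diff purposes by setting submodule.<name>.ignore to "all".
set_submodule_ignore() {
    config_file=$1   # e.g. .git/config of the superproject
    name=$2          # submodule name as registered in .gitmodules
    git config --file "$config_file" "submodule.$name.ignore" all
}

# Usage (hypothetical submodule name):
#   set_submodule_ignore .git/config libs/foo
```

With `ignore` set to `all`, changes in that submodule are hidden from status and diff entirely, which is a blunter tool than the "unchanged symlink" detection proposed in the thread.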