Re: symlinked directories in refs are now unreachable
Thanks for committing my one-character patch.

In the commit message you said

> Come to think of
> it, maybe we should disallow symlink inside .git/refs hierarchy;
> we update the files there by creat/rename pair, so having
> symlinks would not work anyway when you do anything that would
> update them.]

I agree, linking at the file level makes no sense --- create/rename
pairs will clobber symlinked files (as they would for hardlinked
files).  If you accept their use at all, symlinked directories are the
only way to go.

The alternatives to symlinked directories are:

(1)  adding a command line option to fsck that supplies a path to an
     external refs directory
(1a) adding a subtool to create a commandline list of sha from a
     supplied path to an external refs directory
(2)  adding an environment variable to do the same
(3)  adding a .git configuration file which contains a path --- this
     is just a userspace symlink
(4)  create a monster (refcounting objects?) (true cross-references?)
     (conservative garbage collectors that scan your entire hard disk
     for potential references)

(I believe the same arguments hold for the pulling code as for fsck)

Case (1) is easy to implement.  (I whipped up a working patch
yesterday)  The hardest part is thinking up a good name for the command
line argument.
Case (1a) is too ugly.
Both of these cases place a considerable burden on the user, and
require some Porcelain work.

Case (2) makes sense but is too intrusive for me to contemplate.  I
defer to a core developer.

Case (3) offers no advantages over a symlinked directory under refs.

Case (4) is probably patented by Microsoft.

I like symlinks for their simplicity, and that they work now.
Otherwise I am really only submitting a feature request.

Matt

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html

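For readers who want to see the create/rename point concretely, here is a minimal Python sketch (POSIX only). It is not git's code: the `update_ref` helper and the `.lock` suffix are made up for illustration. It updates a ref once through a symlinked file and once through a symlinked directory.

```python
import os
import tempfile


def update_ref(path, value):
    """Update a ref via a create/rename pair: write a temp file, rename it over path."""
    tmp = path + ".lock"  # hypothetical temp-file name, for illustration only
    with open(tmp, "w") as f:
        f.write(value + "\n")
    os.rename(tmp, path)  # replaces the directory entry at `path`, link or not


def demo():
    with tempfile.TemporaryDirectory() as top:
        real = os.path.join(top, "real")
        refs = os.path.join(top, "refs")
        os.makedirs(real)
        os.makedirs(refs)
        with open(os.path.join(real, "master"), "w") as f:
            f.write("old-id\n")

        # Case 1: symlinked *file*. The rename replaces the link itself,
        # so the real file behind it never sees the update.
        file_link = os.path.join(refs, "master")
        os.symlink(os.path.join(real, "master"), file_link)
        update_ref(file_link, "new-id")
        file_link_survived = os.path.islink(file_link)
        with open(os.path.join(real, "master")) as f:
            target_after_file_case = f.read().strip()

        # Case 2: symlinked *directory*. Both the temp file and the rename
        # happen inside the real directory, so the link is untouched.
        dir_link = os.path.join(refs, "sub")
        os.symlink(real, dir_link)
        update_ref(os.path.join(dir_link, "master"), "newer-id")
        dir_link_survived = os.path.islink(dir_link)
        with open(os.path.join(real, "master")) as f:
            target_after_dir_case = f.read().strip()

    return (file_link_survived, target_after_file_case,
            dir_link_survived, target_after_dir_case)
```

In the file case the link is gone and the target still holds the old value; in the directory case the link survives and the target is updated.
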
Re: symlinked directories in refs are now unreachable
Matt Draisey <[EMAIL PROTECTED]> writes:

> Having thus been forced to read the mailing list, I see a slight problem
> in .git/objects/info/alternates mechanism.  Using the original
> ALTERNATE_DB_ENVIRONMENT variable you assert to the git programmes that
> you know all the repositories to search for objects.  In
> the .git/objects/info/alternates mechanism you implicitly defer to other
> repositories, which might also implicitly defer to yet another
> repository.  To ensure an object is truly available you need to compute
> a transitive closure on all .git/objects/info/alternates --- you can't
> really rely on .git/objects/info/alternates being transitively closed
> already.

No, "git clone -l -s" not copying the objects/info/alternates of the
repository being cloned was simply a bug; by copying it, the
transitive closure can be set up "initially".

Both the environment variable and objects/info/alternates share the
same problem if the cloned/borrowed-from repository suddenly starts to
borrow from another repository, losing objects it used to hold itself.
You just shouldn't do it.

With objects/info/alternates, you _could_ do the transitive closure at
runtime and do not have to worry about this issue (but you now need to
worry about cycles), which you cannot do with the environment variable
approach.

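A runtime transitive closure with a cycle guard could look roughly like the Python sketch below. It is an illustration, not git's implementation. It assumes each non-empty, non-comment line of `info/alternates` names another object directory, with relative paths taken relative to the directory holding the file; the function names are invented here.

```python
import os
import tempfile


def alternates_closure(objects_dir):
    """Return every object directory reachable from objects_dir via info/alternates."""
    seen = []        # discovery order, starting with objects_dir itself
    visited = set()  # canonical paths already expanded: this is the cycle guard
    stack = [objects_dir]
    while stack:
        current = os.path.realpath(stack.pop())
        if current in visited:
            continue  # already expanded, so a cycle cannot loop forever
        visited.add(current)
        seen.append(current)
        alt_file = os.path.join(current, "info", "alternates")
        if not os.path.isfile(alt_file):
            continue
        with open(alt_file) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # os.path.join keeps absolute entries as they are
                stack.append(os.path.join(current, line))
    return seen


def demo():
    """Three object directories borrowing in a cycle: a -> b -> c -> a."""
    with tempfile.TemporaryDirectory() as top:
        top = os.path.realpath(top)
        dirs = {n: os.path.join(top, n, "objects") for n in ("a", "b", "c")}
        for name, borrows_from in (("a", "b"), ("b", "c"), ("c", "a")):
            os.makedirs(os.path.join(dirs[name], "info"))
            with open(os.path.join(dirs[name], "info", "alternates"), "w") as f:
                f.write(dirs[borrows_from] + "\n")
        closure = alternates_closure(dirs["a"])
        return [os.path.relpath(p, top) for p in closure]
```

Canonicalising with `os.path.realpath` before the membership test is what keeps two spellings of the same directory from being expanded twice.
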
Re: symlinked directories in refs are now unreachable
On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:

> I would like to know
> a use case or two to illustrate why there are symlinks pointing
> at real files outside .git/refs/ hierarchy, and how that
> arrangement is useful.

I've clearly laid out my case very badly.

Here is the patch via sed

    $ sed -i -e '49s/lstat/stat/' refs.c

It changes the do_for_each_ref function to follow symlinks blindly in
both the pulling and fscking code.

The usage I have is simple.  I want to use git to give me a personal
versioned filesystem of all my current working data.  My problems are
twofold.

First, it is not obvious where the divisions between projects should
be --- for the most part, this is not distributable software, but a
haphazard collection of one-time code, and various other material that
still benefits from version control.  An all-inclusive, time-based
commit of the entire directory structure turns out to be quite useful.
Obviously this use will never be anything but a personal store of data.

Secondly, I have ambitions for some of the software I put together.
These invariably are small projects that may or may not ever become
distributable.  Clearly they require their own commit history, yet
given their immature state it seems hard to justify their own objects
directory with its 256 directories for a few kilobytes of code.

Given the existence of the GIT_OBJECT_DIRECTORY environment variable,
it is a natural step to sharing a single object store amongst the small
tightly-focused projects and the all-encompassing but unmanaged
outermost directory which already is tracking the contents of the
contained projects but knows nothing of their commit history.

Implementing a commit tool to do this is actually very easy.  All you
need do is walk up the chain of parent directories from your working
directory, noting the .git directories, until you reach one that has a
.git/objects directory, then set up the environment appropriately.

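The walk described above might be sketched in Python as follows. This is a guess at the shape of such a tool, not the actual commit tool: the function names are invented, and treating the innermost .git as GIT_DIR is an assumption. GIT_OBJECT_DIRECTORY is the variable named in the message.

```python
import os
import tempfile


def find_object_directory(start):
    """Walk up from start; return (git_dirs_seen, objects_dir or None).

    git_dirs_seen lists each .git directory passed on the way up, innermost
    first, ending with the one that actually contains an objects directory.
    """
    git_dirs = []
    current = os.path.abspath(start)
    while True:
        git_dir = os.path.join(current, ".git")
        if os.path.isdir(git_dir):
            git_dirs.append(git_dir)
            objects = os.path.join(git_dir, "objects")
            if os.path.isdir(objects):
                return git_dirs, objects
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return git_dirs, None
        current = parent


def commit_environment(start):
    """Build an environment for running git commands from `start`.

    Assumption: the innermost .git holds the project's own refs (GIT_DIR),
    while objects live in the first ancestor that has .git/objects.
    """
    git_dirs, objects = find_object_directory(start)
    if objects is None:
        raise RuntimeError("no .git/objects found above " + start)
    env = dict(os.environ)
    env["GIT_DIR"] = git_dirs[0]
    env["GIT_OBJECT_DIRECTORY"] = objects
    return env


def demo():
    """Outer directory owns the object store; nested 'proj' has only refs."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".git", "objects"))
        os.makedirs(os.path.join(top, "proj", ".git", "refs", "heads"))
        os.makedirs(os.path.join(top, "proj", "src"))
        env = commit_environment(os.path.join(top, "proj", "src"))
        return (os.path.relpath(env["GIT_DIR"], top),
                os.path.relpath(env["GIT_OBJECT_DIRECTORY"], top))
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking the git commands.
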
The only catch is the necessity of maintaining the common object
directory.  The outermost .git directory needs refs to the contained
subprojects' commit histories that it honours in fsck-cache if it is
not to silently delete them the next time you prune the repository.
Symlinking in the refs/heads directories of all subprojects is the most
straightforward way to achieve this as it automatically does the right
thing with very little maintenance.  It also keeps the real refs/heads
of the subprojects' commit histories properly localized in the
subprojects where they belong.

It seems to me that having created the GIT_OBJECT_DIRECTORY hook, it
only makes sense to follow symlinks in the refs directories.

Matt
---
http://free.draisey.ca

P.S.  This email is a bit long-winded so I didn't CC it to the mailing
list.

Re: symlinked directories in refs are now unreachable
Matt Draisey <[EMAIL PROTECTED]> writes:

> On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:
>> I would like to know
>> a use case or two to illustrate why there are symlinks pointing
>> at real files outside .git/refs/ hierarchy, and how that
>> arrangement is useful.
>...
> This email is a bit long-winded so I didn't CC it to the mailing list.

Thanks for a clear explanation.  Your arrangement indeed is an
intriguing one, in that there are very similar issues in the fsck/prune
area even with arrangements quite different from yours.  I personally
think your reasoning about this issue deserves to be shared with the
list.  I'll CC _this_ message to the list and leave it up to you to
forward your words there as well.

People are known to do something similar to what you are doing without
having any special commit tool.  They just do this:

    $ mkdir A B
    $ cd A && git init-db
    $ cd ../B && git init-db
    $ rm -fr .git/objects && ln -s ../../A/.git/objects .git/objects

The repositories A and B share the same object database, and they have
independent sets of refs.  For the exact same reason as your
arrangement, you cannot "git prune" in either repository, because they
do not know about objects reachable only from the other side.

Further, one repository can borrow objects from another repository via
the .git/objects/info/alternates mechanism.  This is useful when a
repository is a local clone of another.  You would do this:

    $ git clone -l -s linux-2.6/.git/ my-linux
    $ cd my-linux && cat .git/objects/info/alternates
    /path/to/linux-2.6/.git/

The new repository my-linux has the .git/objects with 256 fan-out
subdirectories, but starts out without any object files in it.  It
literally borrows the existing objects from the neighbouring
repository, and its own .git/objects hierarchy is only used to hold
newly created objects in it.  For the same reason as your arrangement,
you should not "git prune" the linux-2.6 repository, either.  However,
my-linux repository can be pruned as long as somebody else does not
"borrow" from it.

So while I find your "do follow symlink" patch an improvement in that
it makes things a little bit safer, I think there should be a more
generalized way to say "this object database holds things that are
referred by these refs/ directories outside.  fsck/prune had better
hold onto objects referenced by them, not just by the refs directory
that happens to be next to the objects directory".  That would be the
inverse of .git/objects/info/alternates.

-jc

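As a rough illustration of that "inverse of alternates" idea, the Python sketch below gathers the starting points a prune would have to keep from several refs directories at once. Everything here is hypothetical: no such mechanism exists in the thread, the function names are invented, the list of outside refs directories is simply passed in, and the object names in the demo are placeholders. Only the roots are collected; walking from them to every reachable object is not shown.

```python
import os
import tempfile


def read_refs(refs_dir):
    """Map ref name -> object name for every file under refs_dir (symlinks followed)."""
    refs = {}
    for dirpath, _dirnames, filenames in os.walk(refs_dir, followlinks=True):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path) as f:
                refs[os.path.relpath(path, refs_dir)] = f.read().strip()
    return refs


def prune_roots(own_refs_dir, outside_refs_dirs):
    """Union of object names a prune of the shared object store must keep."""
    roots = set(read_refs(own_refs_dir).values())
    for refs_dir in outside_refs_dirs:
        roots.update(read_refs(refs_dir).values())
    return roots


def demo():
    """Repositories A and B share objects; pruning in A must honour B's refs."""
    with tempfile.TemporaryDirectory() as top:
        for repo, value in (("A", "aaaa"), ("B", "bbbb")):  # placeholder ids
            heads = os.path.join(top, repo, ".git", "refs", "heads")
            os.makedirs(heads)
            with open(os.path.join(heads, "master"), "w") as f:
                f.write(value + "\n")
        return prune_roots(os.path.join(top, "A", ".git", "refs"),
                           [os.path.join(top, "B", ".git", "refs")])
```

How the object database would record which outside refs directories to consult is exactly the open design question in the message, so the sketch leaves it as a plain argument.
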
Re: symlinked directories in refs are now unreachable
On Mon, 2005-08-15 at 00:41 -0700, Junio C Hamano wrote:
> Matt Draisey <[EMAIL PROTECTED]> writes:
>
> > ... My own programming efforts rarely exceed two or three files
> > per project, and don't justify their own .git/objects repository.
> > Still, a few projects do benefit from having their own commit history,
>
> I am afraid I am not quite getting it.
>
> You are interested in many projects that have outside upstream,
> and you typically modify only small portion of each of them,
> which is quite a typical behaviour for individual developers.
> For some reason you want to keep those repository "clean"
> without your own commit objects or changed objects only
> reachable from your commits.  Is it what is happening here?

No, all the projects are my own.  I am not a developer at all, merely a
hobbyist.  Upstream projects don't fit into this scheme.

> > I've only written a commit tool.  All the other git and cogito tools I
> > invoke from the outermost directory like so
> >
> > $git-cat-file commit per/Minesweeper/master
> >
> > Symlinking still works here as expected.  The per directory is just
> > there so I don't stomp on the outermost namespace, the Minesweeper is a
> > symlink to the nested project's refs directory.
>
> Hmm.  So you have two GIT managed trees, $D/matt and $D/Minesweeper,
> and a symlink between them like this.  Is that what is happening here?
>
>   $D/matt/.git/refs/heads/per/Minesweeper -> $D/Minesweeper/.git/refs/heads
>

No, they are nested

    $D/.git/refs/heads/per/Minesweeper -> $D/Minesweeper/.git/refs/heads

The outermost repository merely aggregates a bunch of small unrelated
projects that are not yet ready for an independent existence.  The idea
is to put everything under revision control in the hope that eventually
something useful falls out.  My commit tool walks up the chain towards
root until it finds the objects directory and does the appropriate
thing.

> Of course 'git-cat-file commit per/Minesweeper/master' would
> work in "$D/matt" directory.  How do the set of paths recorded
> in the index file used in these repositories relate to each
> other?  Is $D/matt/ tracking the same set of files as the other
> repository tracks?  Is it meant to be a superset?  Subset?  More
> or less independent "private additions"?
>
> There must be some advantage to this arrangement than the more
> typical arrangement I've seen people do, which is to have two
> branches in Minesweeper (that is the upstream, right?)
> repository, one "origin" and the other "master".  Upstream
> changes you fetch and pull into "origin" branch while you commit
> your changes to "master" branch.  I just do not yet see what
> that advantage is, and I strongly suspect because I misread your
> description and misunderstood the two repository arrangement you
> have and how they are used.
>
> By the way, did you want to take this discussion private or was
> it by accident you did not CC: the list?
>

No, I didn't want to take it private.  I just don't know how my email
programme works.  I also just discovered that Evolution's
Forward As > Redirect is really a bounce and not a forward at all (it
doesn't change the to: address)

Re: symlinked directories in refs are now unreachable
On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:
> Matt Draisey <[EMAIL PROTECTED]> writes:
>
> > The behaviour of the symlinked in ref directories has changed from
> > earlier versions of git.  They used to be taken into account in
> > git-fsck-cache --unreachable.
> >
> > Can the previous behaviour be reinstated?
>
> I would not have much problem accepting a patch for that; it
> would make things safer when a symlink points to a real file
> that is outside .git/refs/ that holds a pointer to a valid
> object.
>
> Having said that, I would first like to know why you have a
> symlink there, and the real file pointed by it outside .git/refs
> hierarchy.  The core GIT tools do not create such symlinks, so
> either you are creating one by hand, or your Porcelain is
> creating one for you for whatever reason.

It is my own home-grown Porcelain that creates the symlinks.  I've
thrown together a python programme to track a nested collection of
projects.  My own programming efforts rarely exceed two or three files
per project, and don't justify their own .git/objects repository.
Still, a few projects do benefit from having their own commit history,
while the rest are tracked as one big outermost superproject of
unrelated stuff.

> I would like to know
> a use case or two to illustrate why there are symlinks pointing
> at real files outside .git/refs/ hierarchy, and how that
> arrangement is useful.

Whether or not it's useful??  Hmmm.  Debatable.

I've only written a commit tool.  All the other git and cogito tools I
invoke from the outermost directory like so

    $ git-cat-file commit per/Minesweeper/master

Symlinking still works here as expected.  The per directory is just
there so I don't stomp on the outermost namespace, the Minesweeper is a
symlink to the nested project's refs directory.  Symlinking seems the
natural way to do this as they only need updating when I move
subdirectories around.

P.S.

    $ echo new-id > .git/per/Minesweeper/master

is safe here --- this is the actual behaviour I want.

Re: symlinked directories in refs are now unreachable
Matt Draisey <[EMAIL PROTECTED]> writes:

> The behaviour of the symlinked in ref directories has changed from
> earlier versions of git.  They used to be taken into account in
> git-fsck-cache --unreachable.
>
> Can the previous behaviour be reinstated?

I would not have much problem accepting a patch for that; it
would make things safer when a symlink points to a real file
that is outside .git/refs/ that holds a pointer to a valid
object.

Having said that, I would first like to know why you have a
symlink there, and the real file pointed by it outside .git/refs
hierarchy.  The core GIT tools do not create such symlinks, so
either you are creating one by hand, or your Porcelain is
creating one for you for whatever reason.  I would like to know
a use case or two to illustrate why there are symlinks pointing
at real files outside .git/refs/ hierarchy, and how that
arrangement is useful.

Re: symlinked directories in refs are now unreachable
On Mon, 15 Aug 2005, Matt Draisey wrote:
>
> The behaviour of the symlinked in ref directories has changed from
> earlier versions of git.

Hmm..  There used to be a mix of lstat() (in receive-pack) and stat()
(in fsck-cache.c), and it got standardized in one function which used
lstat.

The reason for the lstat is really to try to avoid having especially
the remote protocols follow symlinks, but I guess it's not a very good
reason, so I don't think it would be wrong to just standardize refs.c
to use "stat()" instead.

You might send a patch to Junio..

HOWEVER: symlinks for references really are pretty dangerous.  We do
things like "echo new-id > .git/HEAD" and links (symlinks _or_
hardlinks) thus really aren't safe.  You're much better off copying
those small files.

		Linus

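To make the lstat()/stat() difference concrete, here is a small Python sketch of a recursive ref walker. It is an illustration only, not the code in refs.c; the function name and the demo layout (modelled on the per/Minesweeper arrangement from the thread) are assumptions.

```python
import os
import stat
import tempfile


def for_each_ref(base, follow_symlinks):
    """Collect ref files under base, recursing into directories.

    With follow_symlinks=False entries are classified with lstat(): a
    symlinked directory is neither a directory nor a regular file, so it is
    skipped.  With follow_symlinks=True, stat() sees through the link and
    the walk descends into it.
    """
    probe = os.stat if follow_symlinks else os.lstat
    found = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        try:
            mode = probe(path).st_mode
        except OSError:  # e.g. a dangling symlink under stat()
            continue
        if stat.S_ISDIR(mode):
            found.extend(for_each_ref(path, follow_symlinks))
        elif stat.S_ISREG(mode):
            found.append(path)
    return found


def demo():
    """Outer refs/heads with per/Minesweeper symlinked to a nested project's heads."""
    with tempfile.TemporaryDirectory() as top:
        outer = os.path.join(top, "outer", "refs", "heads")
        inner = os.path.join(top, "Minesweeper", "refs", "heads")
        os.makedirs(os.path.join(outer, "per"))
        os.makedirs(inner)
        for directory in (outer, inner):
            with open(os.path.join(directory, "master"), "w") as f:
                f.write("some-id\n")  # placeholder object name
        os.symlink(inner, os.path.join(outer, "per", "Minesweeper"))

        def rel(paths):
            return [os.path.relpath(p, outer) for p in paths]

        return rel(for_each_ref(outer, False)), rel(for_each_ref(outer, True))
```

With lstat() only the outer `master` is found; with stat() the nested project's `per/Minesweeper/master` is reached as well, which is the behaviour the one-character patch restores.
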