Re: symlinked directories in refs are now unreachable

2005-08-18 Thread Matt Draisey
Thanks for committing my one-character patch.  In the commit message you
said 

> Come to think of
> it, maybe we should disallow symlink inside .git/refs hierarchy;
> we update the files there by creat/rename pair, so having
> symlinks would not work anyway when you do anything that would
> update them.]

I agree, linking at the file level makes no sense ---  create/rename
pairs will clobber symlinked files (as they would for hardlinked files).
If you accept their use at all, symlinked directories are the only way
to go.
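
To make the difference concrete, here is a minimal Python sketch (not git's
code; update_ref and demo are hypothetical names) of a create/rename update
applied first to a symlinked file and then to a file reached through a
symlinked directory.  It assumes a POSIX filesystem, where rename() replaces
the directory entry itself rather than following a symlink.

```python
import os
import tempfile


def update_ref(path, new_id):
    """Update a ref the create/rename way: write a temp file, rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(new_id + "\n")
    # rename() replaces the directory entry named by path; a symlink there is
    # overwritten, not followed.
    os.rename(tmp, path)


def read_id(path):
    with open(path) as f:
        return f.read().strip()


def demo():
    with tempfile.TemporaryDirectory() as top:
        # Case 1: the ref file itself is a symlink.
        real_file = os.path.join(top, "real_master")
        with open(real_file, "w") as f:
            f.write("old-id\n")
        link_file = os.path.join(top, "master")
        os.symlink(real_file, link_file)
        update_ref(link_file, "new-id")

        # Case 2: the ref lives inside a symlinked directory.
        real_dir = os.path.join(top, "subproject_heads")
        os.mkdir(real_dir)
        link_dir = os.path.join(top, "linked_heads")
        os.symlink(real_dir, link_dir)
        update_ref(os.path.join(link_dir, "master"), "new-id")

        return {
            "file_link_survives": os.path.islink(link_file),
            "file_target_content": read_id(real_file),
            "dir_link_survives": os.path.islink(link_dir),
            "dir_target_content": read_id(os.path.join(real_dir, "master")),
        }
```

Under that assumption the file-level link is replaced by a regular file and
its old target is left stale, while the update made through the symlinked
directory lands in the real directory and the directory link stays in place.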

The alternatives to symlinked directories are:

(1) adding a command line option to fsck that supplies a path to an
external refs directory

(1a) adding a subtool to create a command-line list of SHA1s from a
supplied path to an external refs directory

(2) adding an environment variable to do the same

(3) adding a .git configuration file which contains a path --- this is
just a userspace symlink

(4) create a monster (refcounting objects?) (true cross-references?)
(conservative garbage collectors that scan your entire hard disk for
potential references)

(I believe the same arguments hold for the pulling code as for fsck)

Case (1) is easy to implement. (I whipped up a working patch yesterday)
The hardest part is thinking up a good name for the command line
argument.  Case (1a) is too ugly.  Both of these cases place a
considerable burden on the user, and require some Porcelain work.  Case
(2) makes sense but is too intrusive for me to contemplate.  I defer to
a core developer.  Case (3) offers no advantages over a symlinked
directory under refs.  Case (4) is probably patented by Microsoft.

I like symlinks for their simplicity, and that they work now.  Otherwise
I am really only submitting a feature request.

Matt


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: symlinked directories in refs are now unreachable

2005-08-17 Thread Junio C Hamano
Matt Draisey <[EMAIL PROTECTED]> writes:

> Having thus been forced to read the mailing list, I see a slight problem
> in .git/objects/info/alternates mechanism.  Using the original
> ALTERNATE_DB_ENVIRONMENT variable you assert to the git programmes that
> you know all the repositories to search for objects.  In
> the .git/objects/info/alternates mechanism you implicitly defer to other
> repositories, which might also implicitly defer to yet another
> repository.  To ensure an object is truly available you need to compute
> a transitive closure on all .git/objects/info/alternates --- you can't
> really rely on .git/objects/info/alternates being transitively closed
> already.

No, "git clone -l -s" not copying the objects/info/alternates of
the repository being cloned was simply a bug; by doing so the
transitive closure can be set up "initially".

Both the environment variable and objects/info/alternates share
the same problem if the cloned/borrowed from repository suddenly
starts to borrow from another repository, losing objects it used
to have from itself.  You just shouldn't do it.

With objects/info/alternates, you _could_ do the transitive
closure at runtime and not have to worry about this issue
(but you would then need to worry about cycles), which you cannot
do with the environment variable approach.
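
A runtime closure of that kind could look roughly like the Python sketch
below.  It is only an illustration, not git's implementation:
alternates_closure is a hypothetical name, and the sketch assumes one object
directory per line in info/alternates, with relative entries resolved against
the objects directory that lists them and "#" lines treated as comments.  The
visited set is what keeps a cycle from looping forever.

```python
import os


def alternates_closure(objects_dir):
    """Return every object directory reachable through info/alternates files."""
    ordered = []      # directories in the order they were first reached
    visited = set()   # resolved paths already handled; this also breaks cycles
    stack = [objects_dir]
    while stack:
        current = os.path.realpath(stack.pop())
        if current in visited:
            continue
        visited.add(current)
        ordered.append(current)
        alt_file = os.path.join(current, "info", "alternates")
        if not os.path.isfile(alt_file):
            continue
        with open(alt_file) as f:
            for line in f:
                entry = line.strip()
                if not entry or entry.startswith("#"):
                    continue
                # Assumption: relative entries are relative to this objects dir.
                # os.path.join returns entry unchanged when it is absolute.
                stack.append(os.path.join(current, entry))
    return ordered
```

The first element of the result is the starting directory itself; the rest
are the repositories it borrows from, directly or indirectly.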



Re: symlinked directories in refs are now unreachable

2005-08-16 Thread Matt Draisey
On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:
> I would like to know
> a use case or two to illustrate why there are symlinks pointing
> at real files outside .git/refs/ hierarchy, and how that
> arrangement is useful.

I've clearly laid out my case very badly.

Here is the patch via sed

$ sed -i -e '49s/lstat/stat/' refs.c

It changes the do_for_each_ref function to follow symlinks blindly in
both the pulling and fscking code.
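
The effect of that one-word change can be sketched in Python.  This is not
the C code in refs.c; for_each_ref and its follow_symlinks flag are
hypothetical names used only to contrast the two calls.  With lstat() a
symlinked directory is reported as a symlink and skipped; with stat() it is
reported as a directory and walked.

```python
import os
import stat


def for_each_ref(base, follow_symlinks=True):
    """Collect (name, value) pairs for every ref file found under base."""
    stat_fn = os.stat if follow_symlinks else os.lstat
    found = []

    def walk(directory, prefix):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            try:
                st = stat_fn(path)
            except OSError:
                continue  # dangling symlink or entry that vanished
            name = prefix + entry
            if stat.S_ISDIR(st.st_mode):
                walk(path, name + "/")
            elif stat.S_ISREG(st.st_mode):
                with open(path) as f:
                    found.append((name, f.read().strip()))
            # With lstat() a symlink is neither a directory nor a regular
            # file, so it is silently ignored here.

    walk(base, "")
    return found
```

Note that following symlinks blindly, as this sketch does, would recurse
without end if a link pointed back at one of its own parent directories.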

The usage I have is simple.  I want to use git to give me a personal
versioned filesystem of all my current working data.  My problems are
twofold.  First, it is not obvious where the divisions between projects
should be --- for the most part, this is not distributable software, but
a haphazard collection of one-time code, and various other material that
still benefits from version control.  An all inclusive, time-based
commit of the entire directory structure turns out to be quite useful.
Obviously this use will never be anything but a personal store of data.

Secondly, I have ambitions for some of the software I put together.
These invariably are small projects that may or may not ever become
distributable.  Clearly they require their own commit history, yet given
their immature state it seems hard to justify their own objects
directory with its 256 directories for a few kilobytes of code.  Given
the existence of the GIT_OBJECT_DIRECTORY environment variable, it is a
natural step to sharing a single object store amongst the small
tightly-focused projects and the all-encompassing but unmanaged
outermost directory which already is tracking the contents of the
contained projects but knows nothing of their commit history.

Implementing a commit tool to do this is actually very easy.  All you
need do is walk up the chain of parent directories from your working
directory, noting the .git directories, until you reach one that has
a .git/objects directory, then set up the environment appropriately.
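
As a rough illustration of that walk (a sketch only, not the actual commit
tool): the Python below climbs from a starting directory, records each .git
directory it passes, and stops at the first one that contains objects.  The
function names are hypothetical, and using the nearest .git as GIT_DIR is an
assumption; GIT_OBJECT_DIRECTORY is the variable discussed above.

```python
import os


def find_object_store(start):
    """Walk from start toward the root; return (git_dirs_seen, objects_dir)."""
    git_dirs = []
    current = os.path.abspath(start)
    while True:
        git_dir = os.path.join(current, ".git")
        if os.path.isdir(git_dir):
            git_dirs.append(git_dir)
            objects = os.path.join(git_dir, "objects")
            if os.path.isdir(objects):
                return git_dirs, objects
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return git_dirs, None
        current = parent


def commit_environment(start):
    """Build an environment for running git tools in a nested subproject."""
    git_dirs, objects = find_object_store(start)
    if not git_dirs or objects is None:
        raise RuntimeError("no enclosing .git/objects directory found")
    env = dict(os.environ)
    env["GIT_DIR"] = git_dirs[0]            # assumption: nearest .git holds the refs
    env["GIT_OBJECT_DIRECTORY"] = objects   # shared store found further up
    return env
```

The returned mapping could then be passed as env= to subprocess calls that
run the git tools from inside the subproject.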

The only catch is the necessity of maintaining the common object
directory.  The outermost .git directory needs refs to the contained
subprojects' commit histories that it honours in fsck-cache if it is not
to silently delete them the next time you prune the repository.
Symlinking in the refs/heads directories of all subprojects is the most
straightforward way to achieve this as it automatically does the right
thing with very little maintenance.  It also keeps the real refs/heads
of the subprojects' commit histories properly localized in the
subprojects where they belong.  It seems to me that having created the
GIT_OBJECT_DIRECTORY hook, it only makes sense to follow symlinks in the
refs directories.

Matt --- http://free.draisey.ca

P.S.
This email is a bit long-winded so I didn't CC it to the mailing list.




Re: symlinked directories in refs are now unreachable

2005-08-15 Thread Junio C Hamano
Matt Draisey <[EMAIL PROTECTED]> writes:

> On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:
>> I would like to know
>> a use case or two to illustrate why there are symlinks pointing
>> at real files outside .git/refs/ hierarchy, and how that
>> arrangement is useful.
>...
> This email is a bit long-winded so I didn't CC it to the mailing list.

Thanks for a clear explanation.  Your arrangement indeed is an
intriguing one, in that there are very similar issues in the
fsck/prune area even with arrangements quite different from
yours.  I personally think your reasoning about this issue
deserves to be shared with the list.  I'll CC _this_ message to
the list and leave it up to you to forward your words there as
well.

People are known to do something similar to what you are doing
without having any special commit tool.  They just do this:

$ mkdir A B
$ cd A && git init-db
$ cd ../B && git init-db
$ rm -fr .git/objects && ln -s ../../A/.git/objects .git/objects

The repositories A and B share the same object database,
and they have independent sets of refs.  For the exact same
reason as your arrangement, you cannot "git prune" in either
repository, because they do not know about objects reachable
only from the other side.

Further, one repository can borrow objects from another
repository via the .git/objects/info/alternates mechanism.  This
is useful when a repository is a local clone of another.  You
would do this:

$ git clone -l -s linux-2.6/.git/ my-linux
$ cd my-linux && cat .git/objects/info/alternates
/path/to/linux-2.6/.git/objects

The new repository my-linux has the .git/objects with 256
fan-out subdirectories, but starts out without any object files
in it.  It literally borrows the existing objects from the
neighbouring repository, and its own .git/objects hierarchy is
only used to hold newly created objects in it.  For the same
reason as your arrangement, you should not "git prune" the
linux-2.6 repository, either.  However, my-linux repository can
be pruned as long as somebody else does not "borrow" from it.

So while I find your "do follow symlink" patch an improvement in
that it makes things a little bit safer, I think there should be
a more generalized way to say "this object database holds things
that are referred to by these refs/ directories outside.  fsck/prune
had better hold onto objects referenced by them, not just by the
refs directory that happens to be next to the objects directory".

That would be the inverse of .git/objects/info/alternates.

-jc





Re: symlinked directories in refs are now unreachable

2005-08-15 Thread Matt Draisey
On Mon, 2005-08-15 at 00:41 -0700, Junio C Hamano wrote:
> Matt Draisey <[EMAIL PROTECTED]> writes:
> 
> > ...  My own programming efforts rarely exceed two or three files
> > per project, and don't justify their own .git/objects repository.
> > Still, a few projects do benefit from having their own commit history,
> 
> I am afraid I am not quite getting it.
> 
> You are interested in many projects that have outside upstream,
> and you typically modify only a small portion of each of them,
> which is quite a typical behaviour for individual developers.
> For some reason you want to keep those repositories "clean"
> without your own commit objects or changed objects only
> reachable from your commits.  Is that what is happening here?

No, all the projects are my own.  I am not a developer at all, merely a
hobbyist.  Upstream projects don't fit into this scheme.

> > I've only written a commit tool.  All the other git and cogito tools I
> > invoke from the outermost directory like so 
> >
> > $ git-cat-file commit per/Minesweeper/master
> >
> > Symlinking still works here as expected.  The per directory is just
> > there so I don't stomp on the outermost namespace, the Minesweeper is a
> > symlink to the nested project's refs directory.
> 
> Hmm.  So you have two GIT managed trees, $D/matt and $D/Minesweeper,
> and a symlink between them like this.  Is that what is happening here?
> 
>   $D/matt/.git/refs/heads/per/Minesweeper -> $D/Minesweeper/.git/refs/heads
> 

No, they are nested

$D/.git/refs/heads/per/Minesweeper -> $D/Minesweeper/.git/refs/heads

The outermost repository merely aggregates a bunch of small unrelated
projects that are not yet ready for an independent existence.  The idea
is to put everything under revision control in the hope that eventually
something useful falls out.

My commit tool walks up the chain towards root until it finds the
objects directory and does the appropriate thing.

> Of course 'git-cat-file commit per/Minesweeper/master' would
> work in "$D/matt" directory.  How do the set of paths recorded
> in the index file used in these repositories relate to each
> other?  Is $D/matt/ tracking the same set of files as the other
> repository tracks?  Is it meant to be a superset?  Subset?  More
> or less independent "private additions"?
> 
> There must be some advantage to this arrangement over the more
> typical arrangement I've seen people do, which is to have two
> branches in Minesweeper (that is the upstream, right?)
> repository, one "origin" and the other "master".  Upstream
> changes you fetch and pull into "origin" branch while you commit
> your changes to "master" branch.  I just do not yet see what
> that advantage is, and I strongly suspect that is because I misread
> your description and misunderstood the two repository arrangement
> you have and how they are used.
> 
> By the way, did you want to take this discussion private or was
> it by accident you did not CC: the list?
> 

No, I didn't want to take it private.  I just don't know how my email
programme works.  I also just discovered that Evolution's Forward As >
Redirect is really a bounce and not a forward at all (it doesn't change
the To: address).




Re: symlinked directories in refs are now unreachable

2005-08-15 Thread Matt Draisey
On Sun, 2005-08-14 at 22:12 -0700, Junio C Hamano wrote:
> Matt Draisey <[EMAIL PROTECTED]> writes:
> 
> > The behaviour of the symlinked in ref directories has changed from
> > earlier versions of git.  They used to be taken into account in
> > git-fsck-cache --unreachable.
> >
> > Can the previous behaviour be reinstated?
> 
> I would not have much problem accepting a patch for that; it
> would make things safer when a symlink points to a real file
> that is outside .git/refs/ that holds a pointer to a valid
> object.
> 
> Having said that, I would first like to know why you have a
> symlink there, and the real file pointed by it outside .git/refs
> hierarchy.  The core GIT tools do not create such symlinks, so
> either you are creating one by hand, or your Porcelain is
> creating one for you for whatever reason.

It is my own home-grown Porcelain that creates the symlinks.  I've
thrown together a python programme to track a nested collection of
projects.  My own programming efforts rarely exceed two or three files
per project, and don't justify their own .git/objects repository.
Still, a few projects do benefit from having their own commit history,
while the rest are tracked as one big outermost superproject of
unrelated stuff.

> I would like to know
> a use case or two to illustrate why there are symlinks pointing
> at real files outside .git/refs/ hierarchy, and how that
> arrangement is useful.

Whether or not it's useful??  Hmmm.  Debatable.

I've only written a commit tool.  All the other git and cogito tools I
invoke from the outermost directory like so 

$ git-cat-file commit per/Minesweeper/master

Symlinking still works here as expected.  The per directory is just
there so I don't stomp on the outermost namespace, the Minesweeper is a
symlink to the nested project's refs directory.  Symlinking seems the
natural way to do this as they only need updating when I move
subdirectories around.

P.S. $echo new-id > .git/per/Minesweeper/master is safe here --- this is
the actual behaviour I want.




Re: symlinked directories in refs are now unreachable

2005-08-14 Thread Junio C Hamano
Matt Draisey <[EMAIL PROTECTED]> writes:

> The behaviour of the symlinked in ref directories has changed from
> earlier versions of git.  They used to be taken into account in
> git-fsck-cache --unreachable.
>
> Can the previous behaviour be reinstated?

I would not have much problem accepting a patch for that; it
would make things safer when a symlink points to a real file
that is outside .git/refs/ that holds a pointer to a valid
object.

Having said that, I would first like to know why you have a
symlink there, and the real file pointed by it outside .git/refs
hierarchy.  The core GIT tools do not create such symlinks, so
either you are creating one by hand, or your Porcelain is
creating one for you for whatever reason.  I would like to know
a use case or two to illustrate why there are symlinks pointing
at real files outside .git/refs/ hierarchy, and how that
arrangement is useful.



Re: symlinked directories in refs are now unreachable

2005-08-14 Thread Linus Torvalds


On Mon, 15 Aug 2005, Matt Draisey wrote:
>
> The behaviour of the symlinked in ref directories has changed from
> earlier versions of git.

Hmm.. There used to be a mix of lstat() (in receive-pack) and stat() (in
fsck-cache.c), and it got standardized in one function which used lstat.

The reason for the lstat is really to try to avoid having especially the 
remote protocols follow symlinks, but I guess it's not a very good reason, 
so I don't think it would be wrong to just standardize refs.c to use 
"stat()" instead.

You might send a patch to Junio..

HOWEVER: symlinks for references really are pretty dangerous. We do things 
like "echo new-id > .git/HEAD" and links (symlinks _or_ hardlinks) thus 
really aren't safe. You're much better off copying those small files.
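
The hazard with in-place writes can be shown with a small Python sketch
(hypothetical helper names; it mimics "echo new-id > file" by opening the
path for writing, which truncates and rewrites whatever the name resolves
to).  It assumes a filesystem that supports both symlinks and hardlinks.

```python
import os
import tempfile


def write_in_place(path, new_id):
    """Overwrite path the way a shell redirection would: open, truncate, write."""
    with open(path, "w") as f:
        f.write(new_id + "\n")


def read_id(path):
    with open(path) as f:
        return f.read().strip()


def demo_in_place():
    with tempfile.TemporaryDirectory() as top:
        original = os.path.join(top, "HEAD")
        write_in_place(original, "old-id")

        sym = os.path.join(top, "HEAD.sym")
        os.symlink(original, sym)    # second name via symlink
        hard = os.path.join(top, "HEAD.hard")
        os.link(original, hard)      # third name via hardlink

        write_in_place(sym, "via-symlink")
        after_symlink = read_id(original)

        write_in_place(hard, "via-hardlink")
        after_hardlink = read_id(original)

        return after_symlink, after_hardlink
```

In both cases the write made through the link changes the contents seen
under the original name, which is the sharing that makes linked ref files
risky when they are updated in place.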

Linus