Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 08:29:14PM -0700, David Turner wrote:

> >   4. Return the last object we could resolve, as I described. So
> [...]
> 
> Actually, I think 4 has an insurmountable problem.  Here's the case I'm
> thinking of:
> 
> ln -s ..  morx
> 
> Imagine that we go to look up 'morx/fleem'.  Now morx is the "last
> object we could resolve", but we don't know how much of our input has
> been consumed at this point.  So consumers don't know that after they
> exit the repo, they still need to find fleem next to it.

Yes, agreed (my list was written before Andreas brought up the idea of
symlinks in the intermediate paths). I think to let the caller pick up
where you left off, you would have to create a new string that has the
"remainder" concatenated to it.

-Peff
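
The "remainder" idea above can be sketched outside Git. This is a minimal model, not Git code: `TREE`, `resolve`, and the tuple encoding of entries are hypothetical names chosen for illustration. When a link leads above the root, the function returns the external path with the unconsumed components already appended, so a caller looking up `morx/fleem` learns that it still has to find `fleem` next to the repository.

```python
import posixpath

# Hypothetical in-memory tree, not Git's data model:
#   path -> ("tree", None) | ("blob", data) | ("link", target)
TREE = {
    "morx": ("link", ".."),
    "dir": ("tree", None),
    "dir/file": ("blob", "content\n"),
    "alias": ("link", "dir"),
}


def resolve(path, max_links=40):
    """Return ("in-tree", path) or ("out-of-tree", path with remainder).

    Raises KeyError for a path that does not exist in TREE.
    """
    todo = path.split("/")   # components still to be consumed
    done = []                # components resolved so far, relative to the root
    links = 0
    while todo:
        comp = todo.pop(0)
        if comp in ("", "."):
            continue
        if comp == "..":
            if not done:
                # Stepped above the root: hand back ".." plus the remainder.
                return ("out-of-tree", posixpath.join("..", *todo))
            done.pop()
            continue
        kind, value = TREE["/".join(done + [comp])]
        if kind == "link":
            links += 1
            if links > max_links:
                raise RuntimeError("too many symlinks: " + path)
            if value.startswith("/"):
                # Absolute target: outside the tree as well.
                return ("out-of-tree", posixpath.join(value, *todo))
            # Relative target: resolved from the link's own directory (done).
            todo = value.split("/") + todo
        else:
            done.append(comp)
    return ("in-tree", "/".join(done))
```

In this model `resolve("morx/fleem")` yields the out-of-tree path `../fleem`, while `resolve("alias/file")` stays inside the tree and yields `dir/file`.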
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 12:00:22PM -0700, David Turner wrote:

> > > Also, BUILD files are scattered throughout the tree, so the entire tree
> > > would still need to be traversed.  At present, our monorepo is not quite
> > > large enough for this to matter (a full ls-tree only takes me 0.6s), but
> > > it is growing.
> > 
> > But aren't you asking git to do that internally? I.e., it can limit the
> > traversal for a prefix-match, but it cannot do so for an arbitrary
> > filename. It has to open every tree. So the extra expense is really just
> > the I/O over the pipe. That's not optimal, but it is a constant factor
> > slowdown from what git would do internally.
> 
> No, I'm not trying to find all BUILD files -- only ones that are in the
> transitive dependency tree of the target I'm trying to sparsely check
> out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem
> depends on plugh/xyzzy, then I have to examine those three places only.
> I don't have to examine anything in the gibberish/ subtree, for
> instance.

OK, let me see if I understand your use case by parroting it back.

You _don't_ want to feed git a "find all BUILD" pattern, which is good
(because it doesn't work ;) ). You do want to feed it a set of raw
paths to find, because you're going to discover the paths yourself at
each step as you recurse through the dependency-chain of build files. 
You don't actually care about feeding those paths to "ls-tree" at all.
You care only about the _content_ at each path (and will parse that
content to see if you need to take a further recursive step).

So I think git out-of-the-box supports that pretty well (via cat-file).
And your sticking point is that some of the paths may involve symlinks
in the tree, so you want cat-file to answer "if I had checked this out
and typed cat /some/path/to/BUILD, what content would I get". Which
brings us back to the original symlink question.

Is that all accurate?

I'm not sure that helps with the "how to handle symlinks" discussion,
but at least your goals make sense to me at this point.

-Peff
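
The use case as restated above amounts to a graph walk in which each step needs only the content at one path. The sketch below makes several assumptions that are not in the thread: BUILD files list one dependency directory per line, and `read_build` is a stand-in for whatever fetches the content (in practice, probably a request to `git cat-file --batch`). All names are hypothetical.

```python
from collections import deque

# Hypothetical BUILD contents keyed by directory.
BUILD_FILES = {
    "foo/bar/baz": "morx/fleem\n",
    "morx/fleem": "plugh/xyzzy\n",
    "plugh/xyzzy": "",
    "gibberish/zot": "foo/bar/baz\n",   # unrelated subtree
}


def read_build(directory):
    """Stand-in for fetching the content at <directory>/BUILD."""
    return BUILD_FILES[directory]


def parse_deps(content):
    """Assumed format: one dependency directory per line."""
    return [line.strip() for line in content.splitlines() if line.strip()]


def transitive_deps(target):
    """Breadth-first walk over the dependency chain; returns visit order."""
    seen = [target]
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dep in parse_deps(read_build(current)):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen
```

Starting from `foo/bar/baz`, the walk reads three BUILD files and never touches the unrelated subtree, which is why a full tree traversal is unnecessary for this workload.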


Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 11:44:50AM -0700, David Turner wrote:

> > git ls-tree HEAD -- BUILD ?
> 
> This does not actually seem to work (even with -r); it only recurses
> into directories that are named BUILD, rather than being equivalent to
> git ls-tree -r HEAD |grep /BUILD$.

Ah, I thought that was what you wanted (to find specific files, not a
pattern). I think `ls-tree` doesn't understand our normal pathspecs, for
historical reasons.

> Also, BUILD files are scattered throughout the tree, so the entire tree
> would still need to be traversed.  At present, our monorepo is not quite
> large enough for this to matter (a full ls-tree only takes me 0.6s), but
> it is growing.

But aren't you asking git to do that internally? I.e., it can limit the
traversal for a prefix-match, but it cannot do so for an arbitrary
filename. It has to open every tree. So the extra expense is really just
the I/O over the pipe. That's not optimal, but it is a constant factor
slowdown from what git would do internally.

-Peff
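
For comparison, the pipe-based approach discussed above can be sketched as a filter over `git ls-tree -r` output, whose entry lines have the form `<mode> <type> <object>\t<path>`. The function name and the sample data are made up for illustration, and the sketch assumes paths are not quoted (unusual filenames would need `-z` handling). Every entry still crosses the pipe; the filter only decides which ones to keep.

```python
import posixpath


def find_by_basename(ls_tree_output, name):
    """Return paths whose final component equals `name`."""
    matches = []
    for line in ls_tree_output.splitlines():
        _meta, sep, path = line.partition("\t")
        if not sep:
            continue  # not an ls-tree entry line
        if posixpath.basename(path) == name:
            matches.append(path)
    return matches


# Made-up sample; the object IDs are placeholders, not real hashes.
SAMPLE = (
    "100644 blob 1111111111111111111111111111111111111111\tBUILD\n"
    "100644 blob 2222222222222222222222222222222222222222\tfoo/BUILD\n"
    "100644 blob 3333333333333333333333333333333333333333\tfoo/BUILD.bak\n"
    "100644 blob 4444444444444444444444444444444444444444\tREBUILD\n"
)
```

Unlike `grep /BUILD$`, comparing the basename also catches a BUILD file at the top level of the tree, where the path has no leading slash.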


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:
>   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
>  contents, not any kind of simplification (so if you asked for
>  "foo/bar/baz" and it was "../../../../out", you would get the full path
>  with all those dots, not a simplified "../out", which I think is
>  what you were trying to show in earlier examples).

Unfortunately, we need the simplified version, because we otherwise
don't know what the ..s are relative to in the case of a link to a link:

  echo content >dest ;# actual blob
  mkdir -p foo/bar
  ln -s foo/bar/baz fleem # in-tree link-to-link 
  ln -s ../../../external foo/bar/baz # out-of-tree link

If echo HEAD^{resolve}:fleem were to return ../../../external (after
following the first symlink to the second), we would have lost
information.
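
The lost information can be shown with plain path arithmetic. A relative symlink target is interpreted from the directory that contains the link, so the raw string only makes sense together with the link's location. The helper names below are hypothetical, and the sketch uses Python's `posixpath` rather than anything from Git.

```python
import posixpath


def rebase_target(link_path, target):
    """Express a symlink target relative to the tree root.

    `link_path` is the root-relative path of the symlink itself.
    """
    link_dir = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(link_dir, target))


def leaves_tree(rooted):
    """True if a root-relative path is absolute or points above the root."""
    return rooted == ".." or rooted.startswith("../") or rooted.startswith("/")


# The second link lives in foo/bar/, so its target simplifies to ../external.
simplified = rebase_target("foo/bar/baz", "../../../external")

# Misreading the raw target as if it belonged to fleem (at the root)
# points at a different location.
misread = rebase_target("fleem", "../../../external")
```

Here `simplified` is `../external` and `misread` is `../../../external`: a caller who asked for `fleem` and received only the raw contents of the second link could not tell which of the two was meant.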



Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread Junio C Hamano
Jeff King writes:

> I had imagined we would stop resolution and you would just get the last
> peeled object. Combined with teaching cat-file to show more
> object context, doing:
>
>   echo content >dest ;# actual blob
>   ln -s dest link    ;# link to blob
>   ln -s foo broken   ;# broken link
>   ln -s ../foo out   ;# out-of-tree link
>   git add . && git commit -m foo
>   for i in link broken out; do
>     echo "HEAD^{resolve}:$i"
>   done |
>   git cat-file --batch="%(intreemode) %(size)"
>
> would yield:
>
>  (1)   100644 8
>        content
>  (2)   040000 3
>        foo
>  (3)   040000 6
>        ../foo
>
> where the left-margin numbers are for reference:
>
>   1. We dereference a real symlink, and pretend like we actually asked
>  for its referent.
>
>   2. For a broken link, we can't dereference, so we return the link
>  itself. You can tell by the mode, and the content tells you what
>  would have been dereferenced.
>
>   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
>  contents, not any kind of simplification (so if you asked for
>  "foo/bar/baz" and it was "../../../../out", you would get the full path
>  with all those dots, not a simplified "../out", which I think is
>  what you were trying to show in earlier examples).

s/040000/160000/ I would think (if you really meant to expose a
tree, write it as 40000 instead, so that people will not get a wrong
impression and reimplement a broken tree object encoding some popular
Git hosting site broke their customer projects with ;-).

I am not sure $treeish^{resolve} is a great syntax, but I like the
concept and agree that it is a lot more sensible to handle this at
the level of sha1_name.c layer than an ad-hoc solution in the
cat-file layer.
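
For reference when reading mode columns like the one discussed above: Git tree entries use 100644 and 100755 for regular files, 120000 for symlinks, 40000 for trees (stored without a leading zero inside tree objects), and 160000 for gitlinks (submodules). The sketch below classifies a mode string with Python's `stat` module; `entry_kind` is a hypothetical helper, not a Git API.

```python
import stat

GITLINK = 0o160000  # submodule entry; has no named constant in the stat module


def entry_kind(mode_text):
    """Classify a tree-entry mode given as an octal string."""
    mode = int(mode_text, 8)
    if stat.S_IFMT(mode) == GITLINK:
        return "gitlink"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "tree"
    if stat.S_ISREG(mode):
        return "blob"
    raise ValueError("unrecognized mode: " + mode_text)
```

Because the string is parsed as a number, "040000" and "40000" classify identically; the leading zero matters only to code that writes tree objects byte for byte.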


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 20:37 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 07:11:50PM -0400, Jeff King wrote:
> 
> > Yeah, I agree if you let git punt on leaving the filesystem, most of the
> > complicated problems go away. It still feels a bit more magical than I
> > expect out of cat-file, and there are still corner cases (e.g., do we do
> > cycle detection? Or just have a limit to the recursion depth?)
> 
> I was pondering the "magical" above. I think what bugs me is that it
> seems like a feature that is implemented as part of one random bit of
> plumbing, but not available elsewhere.
> 
> Conceptually, this is like peeling object names. You may give a tag
> name, but if you ask for a tree we will peel the tag to a commit,
> and the commit to a tree. This is sort of the same thing; you give a
> path within a tree, and we will peel until we hit a "real" non-symlink
> object.
> 
> I don't know what the syntax would look like. To match "foo^{tree}" it
> would be something like:
> 
>   HEAD:foo/bar^{resolve}
> 
> or something like that. Except that it is a bad idea to allow "^{}"
> syntax on the right-hand side of a colon, as it is ambiguous with
> filenames that contain "^{resolve}". So it would have to look something
> like:
> 
>   HEAD^{resolve}:foo/bar
> 
> which is a _little_ weird, but actually kind of makes sense. The
> "resolve" operation inherently is not just about the filename, but
> uses HEAD^{tree} as the root context.
> 
> So I dunno. This pushes the resolving logic even _lower_ in the stack
> than it would be in cat-file. So why do I like it more? Cognitive
> dissonance? I guess the appeal to me is that it:
> 
>   1. Makes the concept available more generally (you can "rev-parse" it,
>  you can "git show" it, etc). It also lets you _name_ the object in
>  question, so you can ask for other things besides its contents (like
>  its name, its type, etc).
> 
>   2. Positions it alongside other peeling name-resolution functions.

Just to clarify: if you do git rev-parse, and the result is an
out-of-tree symlink, you see /foo or ../foo instead of a sha?  And if
you "git show" it, it says "symlink HEAD:../foo"?

This seems totally reasonable to me, and solves my problem.
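
The corner case raised in the quoted text (cycle detection versus a recursion limit) comes down to two small variations of the same peeling loop. The sketch below is illustrative only: `LINKS` is a hypothetical table of already root-relative targets, and the default limit of 40 is an arbitrary choice, not something proposed in the thread.

```python
# Hypothetical link table: symlink path -> target path (root-relative).
LINKS = {
    "a": "b",
    "b": "c",
    "loop1": "loop2",
    "loop2": "loop1",
}


class TooManyLinks(Exception):
    """Raised when peeling cannot reach a non-symlink object."""


def peel_with_limit(path, max_depth=40):
    """Follow links until a non-link is reached, giving up after max_depth hops."""
    hops = 0
    while path in LINKS:
        if hops == max_depth:
            raise TooManyLinks(path)
        path = LINKS[path]
        hops += 1
    return path


def peel_with_cycle_check(path):
    """Follow links, failing as soon as a path is visited twice."""
    seen = set()
    while path in LINKS:
        if path in seen:
            raise TooManyLinks(path)
        seen.add(path)
        path = LINKS[path]
    return path
```

The depth limit is simpler and needs no extra memory, but it also rejects long chains that are not cycles; the explicit check rejects only true cycles at the cost of tracking every visited path.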



Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread Jonathan Nieder
Jeff King wrote:

>   1. Git has to make a decision about what to do in corner cases. What
>  is our cwd for relative links? The project root?

I don't follow.  Isn't symlink resolution always relative to the
symlink, regardless of cwd?

Thanks,
Jonathan


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 14:16 -0700, Jonathan Nieder wrote:
> Hi,
> 
> David Turner wrote:
> 
> > Instead, it would be cool if cat-file had a mode in which it would
> > follow symlinks.
> 
> Makes sense.
> 
> > The major wrinkle is that symlinks can point outside the repository --
> > either because they are absolute paths, or because they are relative
> > paths with enough ../ in them.  For this case, I propose that
> > --follow-symlinks should output [sha] "symlink" [target] instead of the
> > usual [sha] "blob" [bytes].
> 
> What happens when the symlink payload contains a newline?

Oh, right.
So, how about [sha] "symlink" [bytes] "\n" [target] instead?
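
Putting the byte count before the target means a reader never has to search for a delimiter, so a newline inside the payload is harmless. The sketch below reads records of the form `<sha> <type> <size>\n<payload>\n`, which is how `git cat-file --batch` frames its existing output; applying the same framing to a "symlink" record, including the trailing newline, is an assumption made here, and `read_record` and `write_record` are hypothetical helpers.

```python
import io


def write_record(sha, kind, payload):
    """Frame one record: header line, raw payload bytes, terminating newline."""
    header = b"%s %s %d\n" % (sha.encode(), kind.encode(), len(payload))
    return header + payload + b"\n"


def read_record(stream):
    """Read one framed record from a binary stream.

    Returns (sha, kind, payload), or None at end of input.
    """
    header = stream.readline()
    if not header:
        return None
    sha, kind, size = header.rstrip(b"\n").split(b" ")
    payload = stream.read(int(size))
    if len(payload) != int(size):
        raise ValueError("truncated record")
    if stream.read(1) != b"\n":
        raise ValueError("missing record terminator")
    return sha.decode(), kind.decode(), payload


# Placeholder object IDs; the first target deliberately contains a newline.
stream = io.BytesIO(
    write_record("0" * 40, "symlink", b"../odd\nname")
    + write_record("1" * 40, "blob", b"content\n")
)
first = read_record(stream)
second = read_record(stream)
```

The first record comes back with its embedded newline intact, and the second record is still found at the right offset.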

