Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-21 Thread Jeff King
On Sat, Jan 19, 2013 at 11:16:00AM -0800, Junio C Hamano wrote:

 For example, if you mirror-clone from either of my repositories from
 GitHub:
 
 git clone --mirror git://github.com/git/git/
 git clone --mirror git://github.com/gitster/git/
 
 you will see stuff that does not belong to the project; objects that
 are only reachable from refs/pull/* are not something I approved to
 be placed in my repository; I as the project owner do not want to
 give these objects to a cloner as if they are part of my project.
 
 The server side can hide refs/pull/ and refs/pull-request-tags/ so
 that clones (and ls-remote) will not see clutter, and nobody gets
 hurt.

I think it is unfortunately not so simple as that. You do not want them
to be part of your clone --mirror, but some people do (because they
will inspect them when evaluating the pull request). And that decision
is, I think, in the eye of the cloner, not the eye of the repo owner. I
think in your case it is a little harder to see, in that you do not care
about the PR mechanism for git.git at all, but let us think for a minute
about a project that actually uses PRs.

For an ordinary developer, they would be content to fetch the branch
tips and tags (and that is what we do by default with git clone). They
do not need to fetch all of refs/pull. If they learn out-of-band that PR
123 exists, they can in theory ask for refs/pull/123/head. That works
even with your proposal, because it is just about dropping the
advertisement, not the availability of refs.

But what about entities which really do want all of refs/pull?  I can
think of two good reasons to want them:

  1. You really do want a mirror, because you are creating a backup or
 alternate distribution channel. IOW, you are using git clone
 --mirror, but it is missing those refs.

  2. You are a developer who is sometimes disconnected. You want to
 fetch all of the PRs, and then examine them at your leisure.

Without advertising, how do they ask for the wildcard of `refs/pull/*`?
They're stuck massaging some out-of-band data into a set of distinct
fetch refspecs.

I don't know much about what's in Gerrit's refs/changes, but I imagine
one could make a similar argument that their value is in the eye of the
client. And later you give this example:

 Another example.  If you run ls-remote against the above two
 repositories, you will notice that the latter has tons more
 branches.  The former are to publish only the primary integration
 branches, while the latter are to show individual topics.
 
 I wish I didn't have to do this if I could.
 
 We have periodical What's cooking postings that let interested
 parties learn topics out-of-band.  If I can hide refs/heads/??/*
 from normal users' clones while actually keeping individual topics
 there at the same place, I do not have to push into two places.

Most people do not want to see those heads. But some people (like me)
do, and it is a great convenience to be able to fetch them all with a
wildcard refspec, which cannot work if they are not advertised. I would
have to parse what's cooking and fetch them each individually. That's
not a technologically insurmountable problem, but it means git is being
a lot less helpful than it could be.


So I think in these cases of extra refs, some clients would want them,
and some would not. And only they can know which camp they are in, not
the server side. Which is why the current system works pretty well: we
advertise everything and let the client decide what it wants. Clone very
sanely fetches only refs/heads/* (and associated tags), and you can put
whatever extra junk you want elsewhere in the hierarchy. A mirrored
clone will fetch it, but to me that is the point of --mirror. And if you
want some arbitrary subset, then you can implement that, too, by the
use of fetch refspecs[1].

The downside, of course, is that the ref advertisement is bigger than
many clients will want. But dealing with that is a separate issue from
these refs should never be shown, as solutions for one may not work
from the other (e.g., if we delayed ref advertisement until the client
had spoken, the client could tell us what refspecs they are interested
in).

For your topic branch example, why don't you push to refs/topics of the
main git repository? Then normal cloners wouldn't see it, but anybody
interested could add the appropriate fetch refspec.

-Peff

[1] It may be that refspecs are not as expressive as we would like,
but that is a client side problem we can deal with. For instance,
you cannot currently say give me everything except refs/foo/*.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-21 Thread Jeff King
On Sun, Jan 20, 2013 at 02:08:32PM -0800, Junio C Hamano wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  Jeff King p...@peff.net writes:
 
[uploadPack]
hiderefs = refs/changes
 
  Would you want to do the same thing on receive-pack? It could benefit
  from the same reduction in network cost (although it tends to be invoked
  less frequently than upload-pack).
  ...
  As I said, I view this primarily as solving the cluttering issue.
  The use of hiderefs to hide these refs that point at objects I do
  not consider belong to my repository from the pushers equally makes
  sense as such, I think.
 
 Now that raises one question.  Should this be transfer.hiderefs that
 governs both upload-pack and receive-pack?  I tend to think the
 answer is yes.

Yes, that is what I was getting at (and it should have individual
hiderefs for each side that override the transfer.*, in case somebody
wants to make the distinction).

 It may even make sense to have git push honor it, excluding them
 from git push --mirror (or git push --all if some of the
 branches are hidden); I haven't thought things through, though.

That is harder, as that is something that happens on the client. How
does the client learn about the transfer.hiderefs setting on the remote?
It has to be out-of-band, at which point calling it by the same name has
little benefit.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-21 Thread Jeff King
On Sun, Jan 20, 2013 at 10:06:33AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
 [uploadPack]
 hiderefs = refs/changes
 
  Would you want to do the same thing on receive-pack? It could benefit
  from the same reduction in network cost (although it tends to be invoked
  less frequently than upload-pack).
 
 Do *I* want to do?  Not when there already is a patch exists; I'd
 rather take existing and tested patch ;-)

The patch we have is below, but I do not think you want it, as it is
missing several things:

  - docs and tests

  - handling multiple values

  - anything but raw prefix matching (you even have to put in the /
yourself).

It was not my patch originally, and I never bothered to clean and
upstream it because I didn't think it was something anybody else would
be interested in. I'm happy to do so, but if you are working in the area
anyway, it would make sense to be part of your series.

-Peff

---
diff --git b/builtin/receive-pack.c a/builtin/receive-pack.c
index 0afb8b2..b22670c 100644
--- b/builtin/receive-pack.c
+++ a/builtin/receive-pack.c
@@ -39,6 +39,7 @@ static void *head_name_to_free;
 static int auto_gc = 1;
 static const char *head_name;
 static void *head_name_to_free;
+static const char *hide_refs;
 static int sent_capabilities;
 
 static enum deny_action parse_deny_action(const char *var, const char *value)
@@ -113,11 +114,19 @@ static void show_ref(const char *path, const unsigned 
char *sha1)
return 0;
}
 
+   if (strcmp(var, receive.hiderefs) == 0) {
+   git_config_string(hide_refs, var, value);
+   return 0;
+   }
+
return git_default_config(var, value, cb);
 }
 
 static void show_ref(const char *path, const unsigned char *sha1)
 {
+   if (hide_refs  strncmp(path, hide_refs, strlen(hide_refs)) == 0)
+   return 0;
+
if (sent_capabilities)
packet_write(1, %s %s\n, sha1_to_hex(sha1), path);
else
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-21 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 It may even make sense to have git push honor it, excluding them
 from git push --mirror (or git push --all if some of the
 branches are hidden); I haven't thought things through, though.

 That is harder, as that is something that happens on the client. How
 does the client learn about the transfer.hiderefs setting on the remote?

No, I was talking about running git push from a repository that
has the transfer.hiderefs set, emulating a fetch from a client by
pushing out in the reverse direction.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-20 Thread Junio C Hamano
Jeff King p...@peff.net writes:

  [uploadPack]
  hiderefs = refs/changes

 Would you want to do the same thing on receive-pack? It could benefit
 from the same reduction in network cost (although it tends to be invoked
 less frequently than upload-pack).

Do *I* want to do?  Not when there already is a patch exists; I'd
rather take existing and tested patch ;-)

As I said, I view this primarily as solving the cluttering issue.
The use of hiderefs to hide these refs that point at objects I do
not consider belong to my repository from the pushers equally makes
sense as such, I think.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-20 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 Should the client side learn how to list hidden refs too? I'm thinking
 of an extreme case where upload-pack advertises nothing (or maybe just
 refs/heads/master) and it's up to the client to ask for the ref
 selection it's interested in. upload-pack may need more updates to do
 that, I think.

 What you are talking about is a different goal.

 Look at this as a mechanism for the repository owner to control the
 clutter in what is shown to the intended audience of what s/he
 publishes in the repository.  Network bandwidth reduction of
 advertisement is a side effect of clutter reduction, and not
 necessarily the primary goal.

As a separate and follow-up topic, I can see a future where we also
have .delayref (in addition to .hideref) configuration, that causes
the upload-pack to:

 * omit refs that are marked as delayed;

 * advertise allow-expand-ref=value where the value is the
   pattern used to specify which refs are omitted via this mechanism
   (e.g. refs/* in your extreme example, or perhaps refs/tags/
   if you only advertise branches in order to limit the bandwidth);

and then a fetch-pack that is interested in fetching refs/foo/bar,
after seeing that it matches one of the allow-expand-ref patterns,
to ask expand-ref refs/foo/bar, expect the upload-pack to respond
with refs/foo/bar value of that ref (or refs/foo/bar does not
exist), before going into the usual want exchange (ls-remote
would of course do the same, and pattern-less ls-remote may even
ask to expand everything by saying expand-ref refs/* (where the
capability said allow-expand-ref=refs/* in your extreme example)
to grab everything discoverable).

That is _primarily_ for trading network bandwidth with initial
latency; as far as the end-result is concerned, delayed ones to be
expanded on demand are just as discoverable as the ones advertised
initially.

I think we'd want to have both in the end.  It just happens I just
posted the beginning of clutter-reduction one, as I think developing
both in parallel would be messy and hideref is probably less
involved than the delayref.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-20 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Jeff King p...@peff.net writes:

 [uploadPack]
 hiderefs = refs/changes

 Would you want to do the same thing on receive-pack? It could benefit
 from the same reduction in network cost (although it tends to be invoked
 less frequently than upload-pack).
 ...
 As I said, I view this primarily as solving the cluttering issue.
 The use of hiderefs to hide these refs that point at objects I do
 not consider belong to my repository from the pushers equally makes
 sense as such, I think.

Now that raises one question.  Should this be transfer.hiderefs that
governs both upload-pack and receive-pack?  I tend to think the
answer is yes.

It may even make sense to have git push honor it, excluding them
from git push --mirror (or git push --all if some of the
branches are hidden); I haven't thought things through, though.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-20 Thread Duy Nguyen
On Sun, Jan 20, 2013 at 2:16 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Should the client side learn how to list hidden refs too? I'm thinking
 of an extreme case where upload-pack advertises nothing (or maybe just
 refs/heads/master) and it's up to the client to ask for the ref
 selection it's interested in. upload-pack may need more updates to do
 that, I think.

 What you are talking about is a different goal.

 Look at this as a mechanism for the repository owner to control the
 clutter in what is shown to the intended audience of what s/he
 publishes in the repository.  Network bandwidth reduction of
 advertisement is a side effect of clutter reduction, and not
 necessarily the primary goal.

Probably stupid question: does gitnamespaces.txt attempt to achieve
the same? The document says smart http supports passing namespace,
nothing about git protocol so I guess we need some extension in
upload-pack (or git-daemon) for specifying namespace over git
protocol.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-19 Thread Jeff King
On Fri, Jan 18, 2013 at 04:37:04PM -0800, Junio C Hamano wrote:

 This is an early preview of reducing the network cost while talking
 with a repository with tons of refs, most of which are of use by
 very narrow audiences (e.g. refs under Gerrit's refs/changes/ are
 useful only for people who are interested in the changes under
 review).  As long as these narrow audiences have a way to learn the
 names of refs or objects pointed at by the refs out-of-band, it is
 not necessary to advertise these refs.
 
 On the server end, you tell upload-pack that some refs do not have
 to be advertised with the uploadPack.hiderefs multi-valued
 configuration variable:
 
   [uploadPack]
   hiderefs = refs/changes

Would you want to do the same thing on receive-pack? It could benefit
from the same reduction in network cost (although it tends to be invoked
less frequently than upload-pack).

At GitHub, we have a similar patch (we even call it hiderefs), but we do
it only for receive-pack. In our case, it is not about network traffic,
but rather that we provide a set of read-only refs in the refs/pull
hierarchy. These are generated upstream by the creation of pull
requests, and we reject any updates to them via the git protocol using a
pre-receive hook.

However, if a client without these refs uses git push --mirror, it
will attempt to delete them (which will fail). Meaning that a mirror
push will always report failure, because it will always fail to push the
refs/pull deletions.

I don't know much about Gerrit's inner workings. Are refs/changes also
read-only?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-19 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 Should the client side learn how to list hidden refs too? I'm thinking
 of an extreme case where upload-pack advertises nothing (or maybe just
 refs/heads/master) and it's up to the client to ask for the ref
 selection it's interested in. upload-pack may need more updates to do
 that, I think.

What you are talking about is a different goal.

Look at this as a mechanism for the repository owner to control the
clutter in what is shown to the intended audience of what s/he
publishes in the repository.  Network bandwidth reduction of
advertisement is a side effect of clutter reduction, and not
necessarily the primary goal.

It lets me as the repository owner to say do not list them to make
these discoverable; letting the client side learn defeats the whole
point of this mechanism.

For example, if you mirror-clone from either of my repositories from
GitHub:

git clone --mirror git://github.com/git/git/
git clone --mirror git://github.com/gitster/git/

you will see stuff that does not belong to the project; objects that
are only reachable from refs/pull/* are not something I approved to
be placed in my repository; I as the project owner do not want to
give these objects to a cloner as if they are part of my project.

The server side can hide refs/pull/ and refs/pull-request-tags/ so
that clones (and ls-remote) will not see clutter, and nobody gets
hurt.

These refs are there only to support GitHub pull requests; the
advertisement of them to ls-remote and appearance of them in mirror
clones are undesirable side effects of how GitHub happened to
implement the feature.  GitHub has its way to notify of these pull
requests, and it makes these pull requests discoverable out of band
without involving Git.

Ability to say git fetch origin $SHA1 (or git fetch origin
pull/1) is the only thing lacking, in order to get rid of these
clutter.  The patches do the get rid of clutter part by letting
you hide these refs from ls-remote and clone; allowing fetch by tip
SHA1 needs to come latter.

Another example.  If you run ls-remote against the above two
repositories, you will notice that the latter has tons more
branches.  The former are to publish only the primary integration
branches, while the latter are to show individual topics.

I wish I didn't have to do this if I could.

We have periodical What's cooking postings that let interested
parties learn topics out-of-band.  If I can hide refs/heads/??/*
from normal users' clones while actually keeping individual topics
there at the same place, I do not have to push into two places.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-18 Thread Duy Nguyen
On Sat, Jan 19, 2013 at 7:37 AM, Junio C Hamano gits...@pobox.com wrote:
 This is an early preview of reducing the network cost while talking
 with a repository with tons of refs, most of which are of use by
 very narrow audiences (e.g. refs under Gerrit's refs/changes/ are
 useful only for people who are interested in the changes under
 review).  As long as these narrow audiences have a way to learn the
 names of refs or objects pointed at by the refs out-of-band, it is
 not necessary to advertise these refs.

 On the server end, you tell upload-pack that some refs do not have
 to be advertised with the uploadPack.hiderefs multi-valued
 configuration variable:

 [uploadPack]
 hiderefs = refs/changes

 The changes necessary on the client side to allow fetching objects
 at the tip of a ref in hidden hierarchies are much more involved and
 not part of this early preview, but the end user UI is expected to
 be like these:

 $ git fetch $there refs/changes/72/41672/1
 $ git fetch $there 9598d59cdc098c5d9094d68024475e2430343182

 That is, you ask for a refname as usual even though it is not part
 of ls-remote response, or you ask for the commit object that is at
 the tip of whatever hidden ref you are interested in.

Should the client side learn how to list hidden refs too? I'm thinking
of an extreme case where upload-pack advertises nothing (or maybe just
refs/heads/master) and it's up to the client to ask for the ref
selection it's interested in. upload-pack may need more updates to do
that, I think.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Hiding some refs in ls-remote

2013-01-18 Thread Michael Haggerty
On 01/19/2013 01:37 AM, Junio C Hamano wrote:
 This is an early preview of reducing the network cost while talking
 with a repository with tons of refs, most of which are of use by
 very narrow audiences (e.g. refs under Gerrit's refs/changes/ are
 useful only for people who are interested in the changes under
 review).  As long as these narrow audiences have a way to learn the
 names of refs or objects pointed at by the refs out-of-band, it is
 not necessary to advertise these refs.
 
 On the server end, you tell upload-pack that some refs do not have
 to be advertised with the uploadPack.hiderefs multi-valued
 configuration variable:
 
   [uploadPack]
   hiderefs = refs/changes
 
 The changes necessary on the client side to allow fetching objects
 at the tip of a ref in hidden hierarchies are much more involved and
 not part of this early preview, but the end user UI is expected to
 be like these:
 
   $ git fetch $there refs/changes/72/41672/1
   $ git fetch $there 9598d59cdc098c5d9094d68024475e2430343182
 
 That is, you ask for a refname as usual even though it is not part
 of ls-remote response, or you ask for the commit object that is at
 the tip of whatever hidden ref you are interested in.

Although I can understand the pain of slow network performance, somehow
this proposal gives me the feeling of being expeditious rather than elegant.

Could the problem be solved in some other way?  Maybe such references
could be stored in a second repository or in a separate namespace (in
the sense of gitnamespaces(7)) to prevent their creating overhead when
they are unneeded?

And *if* reference hiding makes sense, it seems to me that the client,
not the server, should be the one who decides which server references it
is interested in (though I understand that would require a protocol
change).  Otherwise the git repository *relies* on out-of-band channels
for its functionality.  If I understand correctly, a user would have *no
way* to discover, via git, what hidden references are contained in a
remote repository, or indeed even that the repo contains a hidden
namespace.  For example this would make it impossible to clean up
obsolete hidden references on a remote repository without the
supplementary information stored elsewhere.  And if anybody accidentally
creates a reference in a hidden namespace by hand, it will just sit
there undetectably, forever.

I assume (though I've never checked) that a server does not let a client
ask for a SHA1 that is not currently reachable from a server-side
reference, and I assume that that you are not proposing to change this
policy.  But allowing objects to be fetched from a hidden reference
opens up some interesting possibilities:

* A pusher could upload arbitrary content to a public git server under a
cryptic hidden reference name.  Most people would be completely unable
to see this content, unless given the SHA1 or the reference name by the
pusher.  Thus this mechanism could be used as a dark channel to exchange
arbitrary data relatively secretly.

* Somebody could push a trojan version of code to a hidden reference in
a project, then pass the SHA1 to a victim.  The victim might trust the
code because it comes from a known project website, even though the code
would be invisible to other project developers and thus impossible for
them to audit.  And even if they learned about the trojan's SHA1 they
would be unable to remove it from their repository because they have no
way to find out the name of the hidden reference!

Obviously these hacks would only be possible for a bad guy with push
privileges to a repository that has turned on hidden references, but I
think they are sobering nevertheless.

These worries would go away if reference hiding were configured on the
client rather than on the server.

A second point: currently, the output of git show-ref -d and git
ls-remote . are almost identical.  Under your proposal, I believe that
the hiderefs would only be omitted from the latter.  Would it be useful
to add an option to git show-ref to make it omit the hiderefs refs?
 And maybe another option to make it display *only* the hideref refs?

And in the bikeshedding department, I wonder if hiderefs is the best
name for the config setting.  hiderefs, implies to me that the refs
are actively hidden and not available to the client in any way.  But in
fact they are just not advertised; they can be fetched normally.  Maybe
another name would be more suggestive of its true effect, for example
quietrefs or noadvertiserefs.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html