Re: http-protocol question

2014-12-02 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 For a public repository, it might make sense to provide a config option
 to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
 But such an option does not yet exist.

In principle, yes, but that cannot be has_sha1_file(); it has to
have a fully connected healthy history behind it.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-02 Thread Jeff King
On Tue, Dec 02, 2014 at 09:45:06AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  For a public repository, it might make sense to provide a config option
  to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
  But such an option does not yet exist.
 
 In principle, yes, but that cannot be has_sha1_file(); it has to
 have a fully connected healthy history behind it.

I thought about that, but I wonder if it matters whether we check up
front. We are not advertising it, but only trying our best to fulfill
the want from the other side.  So either:

  1. We check immediately whether we have the full history behind it,
 and reject the want otherwise.

  2. We start pack-objects on it, and the first thing it will do is
 collect the full set of objects to send. If it doesn't have them,
 it will barf.

Probably (1) would produce nicer error messages, but (2) is a lot more
efficient.

I dunno. I am not volunteering to implement, so I will leave thinking on
it more to somebody who wants to do so. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


http-protocol question

2014-12-01 Thread Bryan Turner
In Documentation/technical/http-protocol.txt, there is the following statement:

S: Parse the git-upload-pack request:

Verify all objects in `want` are directly reachable from refs.

The server MAY walk backwards through history or through
the reflog to permit slightly stale requests.

Is there actually logic somewhere in Git that does that MAY walk
backwards step? Or is that something another implementation of Git
may do but the standard Git does not?

Thanks,
Bryan Turner
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Jonathan Nieder
Hi Bryan,

Bryan Turner wrote:

 Is there actually logic somewhere in Git that does that MAY walk
 backwards step?

Yes.  See upload-pack.c::check_non_tip and
http://thread.gmane.org/gmane.comp.version-control.git/178814.

Hope that helps,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Bryan Turner
On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Hi Bryan,

 Bryan Turner wrote:

 Is there actually logic somewhere in Git that does that MAY walk
 backwards step?

 Yes.  See upload-pack.c::check_non_tip and
 http://thread.gmane.org/gmane.comp.version-control.git/178814.

Jonathan,

Thanks for the reply! I realize now I didn't really cite the part of
that documentation that I'm most interested in, which is: through
history _or through the reflog_. It's the reflog bit I'm looking for.
Sorry for not being more clear. check_non_tip appears to look at
ancestry, walking back up (or down, depending on your view) the DAG to
see if the requested SHA-1 is reachable from a current ref, so it
looks like that's covering the through history part of that blurb.

The reason I ask is that there is a race condition that exists where
the ref advertisement lists refs/heads/foo at abc1234, and then foo is
deleted before the actual upload-pack request comes in. In that
situation, walking backwards through _history_ will only allow the
upload-pack to complete unless the deleted ref was merged into another
ref prior to being deleted (if I understand the code correctly). It
seems like looking at the reflogs, and seeing that refs/heads/foo did
in fact exist and did in fact refer to abc1234, is the only approach
that could allow the upload-pack to complete. The documentation hints
that such a thing is possible, but I don't see any code in Git that
seems to do that.

Does that make more sense? Does that functionality exist and I've just
overlooked it?

Thanks again,
Bryan Turner


 Hope that helps,
 Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Bryan Turner
On Tue, Dec 2, 2014 at 3:28 PM, Bryan Turner btur...@atlassian.com wrote:
 On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Hi Bryan,

 Bryan Turner wrote:

 Is there actually logic somewhere in Git that does that MAY walk
 backwards step?

 Yes.  See upload-pack.c::check_non_tip and
 http://thread.gmane.org/gmane.comp.version-control.git/178814.

 Jonathan,

 Thanks for the reply! I realize now I didn't really cite the part of
 that documentation that I'm most interested in, which is: through
 history _or through the reflog_. It's the reflog bit I'm looking for.
 Sorry for not being more clear. check_non_tip appears to look at
 ancestry, walking back up (or down, depending on your view) the DAG to
 see if the requested SHA-1 is reachable from a current ref, so it
 looks like that's covering the through history part of that blurb.

 The reason I ask is that there is a race condition that exists where
 the ref advertisement lists refs/heads/foo at abc1234, and then foo is
 deleted before the actual upload-pack request comes in. In that
 situation, walking backwards through _history_ will only allow the
 upload-pack to complete unless the deleted ref was merged into another

s/unless/if, sorry. In that situation, walking backwards through
history will only allow the upload-pack to complete if the deleted ref
was merged into another ref.

 ref prior to being deleted (if I understand the code correctly). It
 seems like looking at the reflogs, and seeing that refs/heads/foo did
 in fact exist and did in fact refer to abc1234, is the only approach
 that could allow the upload-pack to complete. The documentation hints
 that such a thing is possible, but I don't see any code in Git that
 seems to do that.

 Does that make more sense? Does that functionality exist and I've just
 overlooked it?

 Thanks again,
 Bryan Turner


 Hope that helps,
 Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Jonathan Nieder
Hi,

Bryan Turner wrote:

 The reason I ask is that there is a race condition that exists where
 the ref advertisement lists refs/heads/foo at abc1234, and then foo is
 deleted before the actual upload-pack request comes in.

Can you say more about the context?  For example, was this actually
happening, or is this for the sake of understanding the protocol
better?

I ask because knowing the context might help us give more specific
advice.

Sometimes when people delete a branch they really want those objects
to be inaccessible *right away*.  So for such people, git's behavior
of failing the request unless the objects are still accessible by
some other path is helpful.

A stateful server could theoretically cache the list of objects they
have advertised for a short time, to avoid clients having to suffer
when something becomes inaccessible during the window between the
upload-pack advertisement and upload-pack request.  Or a permissive
server can allow all wants except a specific blacklist (and some
people do that).

Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Bryan Turner
On Tue, Dec 2, 2014 at 3:45 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Hi,

 Bryan Turner wrote:

 The reason I ask is that there is a race condition that exists where
 the ref advertisement lists refs/heads/foo at abc1234, and then foo is
 deleted before the actual upload-pack request comes in.

 Can you say more about the context?  For example, was this actually
 happening, or is this for the sake of understanding the protocol
 better?

I ask because it's actually happening. Heavy CI load sometimes has
builds fail because git clone dies with not our ref. That's the
specific context I'm working to try and address right now. Some teams
use rebase-heavy workflows, which also evades the check_non_tip easing
and fails with not our ref, so I can't be 100% certain it's ref
deletion in specific causing it (but I guess which of those it is is
probably largely academic; as long as hosting spans multiple requests
it seems like this type of race condition is unavoidable).

I'm trying to decide if there is something I can enable or tune to
prevent it, and the type of twilighting hinted at by the http-protocol
documentation seemed like it could be just the thing.


 I ask because knowing the context might help us give more specific
 advice.

 Sometimes when people delete a branch they really want those objects
 to be inaccessible *right away*.  So for such people, git's behavior
 of failing the request unless the objects are still accessible by
 some other path is helpful.

 A stateful server could theoretically cache the list of objects they
 have advertised for a short time, to avoid clients having to suffer
 when something becomes inaccessible during the window between the
 upload-pack advertisement and upload-pack request.  Or a permissive
 server can allow all wants except a specific blacklist (and some
 people do that).

 Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Jonathan Nieder
Bryan Turner wrote:

 I ask because it's actually happening. Heavy CI load sometimes has
 builds fail because git clone dies with not our ref. That's the
 specific context I'm working to try and address right now.

Thanks --- that makes sense.

Some teams
 use rebase-heavy workflows, which also evades the check_non_tip easing
 and fails with not our ref, so I can't be 100% certain it's ref
 deletion in specific causing it (but I guess which of those it is is
 probably largely academic; as long as hosting spans multiple requests
 it seems like this type of race condition is unavoidable).

Yeah, everything mentioned before about ref deletion also applies to
non-fast-forward updates.

 I'm trying to decide if there is something I can enable or tune to
 prevent it, and the type of twilighting hinted at by the http-protocol
 documentation seemed like it could be just the thing.

If you control the side that clones, then just cloning the single branch
you are building (with --single-branch and -b) can help.

Using a bidirectional protocol like git:// or ssh:// (where the ref
advertisement and check of wants happen in the same connection) would
avoid the problem we're talking about.

On the server side, I agree that either mining reflogs or storing
advertisements to disk would be a nice way to take care of this.
No one has implemented either of those, but it would be a nice setting
to have. :)

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Duy Nguyen
On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 On the server side, I agree that either mining reflogs or storing
 advertisements to disk would be a nice way to take care of this.
 No one has implemented either of those, but it would be a nice setting
 to have. :)

and could be dangerous too. If I accidentally add a password file,
then delete it (way before a clone is performed), I don't want that
part of reflog visible to the cloner. Problem is we don't know how far
we should look back in reflog unless we keep some state.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Jeff King
On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote:

  Can you say more about the context?  For example, was this actually
  happening, or is this for the sake of understanding the protocol
  better?
 
 I ask because it's actually happening. Heavy CI load sometimes has
 builds fail because git clone dies with not our ref. That's the
 specific context I'm working to try and address right now. Some teams
 use rebase-heavy workflows, which also evades the check_non_tip easing
 and fails with not our ref, so I can't be 100% certain it's ref
 deletion in specific causing it (but I guess which of those it is is
 probably largely academic; as long as hosting spans multiple requests
 it seems like this type of race condition is unavoidable).

There is a practical reason to care. Ref deletion will also delete the
reflog, leaving no trace of the reachability. Whereas a non-fast-forward
push could be resolved by looking in the reflog.

One problem with hunting for sha1s in the reflog is that upload-pack
does not know which ref the client thinks they are requesting. So a
search would involve looking in _every_ reflog, which could be
expensive. It might not be too painful if you do the search only after
hitting a not our ref condition, which should in theory be rare. A
malicious client could convince you to grep your reflogs repeatedly, but
that is hardly the only way to convince upload-pack to chew CPU. Asking
it to make a pack comes to mind. :)

 I'm trying to decide if there is something I can enable or tune to
 prevent it, and the type of twilighting hinted at by the http-protocol
 documentation seemed like it could be just the thing.

For a public repository, it might make sense to provide a config option
to loosen the is_our_ref check completely (i.e., to just has_sha1_file).
But such an option does not yet exist.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Duy Nguyen
On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 I'm trying to decide if there is something I can enable or tune to
 prevent it, and the type of twilighting hinted at by the http-protocol
 documentation seemed like it could be just the thing.

 If you control the side that clones, then just cloning the single branch
 you are building (with --single-branch and -b) can help.

Or we could extend this a bit on the server side, if one ref fail, let
the clone continue with the remaining refs (and warn loudly to the
user).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Bryan Turner
On Tue, Dec 2, 2014 at 4:33 PM, Jeff King p...@peff.net wrote:
 On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote:

  Can you say more about the context?  For example, was this actually
  happening, or is this for the sake of understanding the protocol
  better?

 I ask because it's actually happening. Heavy CI load sometimes has
 builds fail because git clone dies with not our ref. That's the
 specific context I'm working to try and address right now. Some teams
 use rebase-heavy workflows, which also evades the check_non_tip easing
 and fails with not our ref, so I can't be 100% certain it's ref
 deletion in specific causing it (but I guess which of those it is is
 probably largely academic; as long as hosting spans multiple requests
 it seems like this type of race condition is unavoidable).

 There is a practical reason to care. Ref deletion will also delete the
 reflog, leaving no trace of the reachability. Whereas a non-fast-forward
 push could be resolved by looking in the reflog.

A fair point. I had mistakenly thought that reflogs would survive the
ref's deletion and be pruned as part of garbage collection, but a
quick test shows that, as I'm sure you already know, that's not true.

Thanks for correcting my mistake!
Bryan Turner
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: http-protocol question

2014-12-01 Thread Jeff King
On Tue, Dec 02, 2014 at 04:47:50PM +1100, Bryan Turner wrote:

  There is a practical reason to care. Ref deletion will also delete the
  reflog, leaving no trace of the reachability. Whereas a non-fast-forward
  push could be resolved by looking in the reflog.
 
 A fair point. I had mistakenly thought that reflogs would survive the
 ref's deletion and be pruned as part of garbage collection, but a
 quick test shows that, as I'm sure you already know, that's not true.

I wish it worked that way. Unfortunately there are complications with
keeping the old reflogs in place, because they sometimes cause conflicts
with new refs being created (e.g., a reflog in .git/logs/refs/heads/foo
would prevent .git/logs/refs/heads/foo/bar from being created). I had
some patches long ago to try to keep a reflog graveyard around, but
they were quite invasive, and there were some corner cases that caused
weird errors.

Handling this sort of D/F conflict more gracefully is one of the things
I'd like to experiment with once we have pluggable ref backends (I think
we'll also disallow foo/bar if foo exists, but the storage could at
least keep the reflogs around).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html