Re: http-protocol question
Jeff King p...@peff.net writes: For a public repository, it might make sense to provide a config option to loosen the is_our_ref check completely (i.e., to just has_sha1_file). But such an option does not yet exist. In principle, yes, but that cannot be has_sha1_file(); it has to have a fully connected healthy history behind it. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 02, 2014 at 09:45:06AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: For a public repository, it might make sense to provide a config option to loosen the is_our_ref check completely (i.e., to just has_sha1_file). But such an option does not yet exist. In principle, yes, but that cannot be has_sha1_file(); it has to have a fully connected healthy history behind it. I thought about that, but I wonder if it matters whether we check up front. We are not advertising it, but only trying our best to fulfill the want from the other side. So either: 1. We check immediately whether we have the full history behind it, and reject the want otherwise. 2. We start pack-objects on it, and the first thing it will do is collect the full set of objects to send. If it doesn't have them, it will barf. Probably (1) would produce nicer error messages, but (2) is a lot more efficient. I dunno. I am not volunteering to implement, so I will leave thinking on it more to somebody who wants to do so. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
http-protocol question
In Documentation/technical/http-protocol.txt, there is the following statement: S: Parse the git-upload-pack request: Verify all objects in `want` are directly reachable from refs. The server MAY walk backwards through history or through the reflog to permit slightly stale requests. Is there actually logic somewhere in Git that does that MAY walk backwards step? Or is that something another implementation of Git may do but the standard Git does not? Thanks, Bryan Turner -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
Hi Bryan, Bryan Turner wrote: Is there actually logic somewhere in Git that does that MAY walk backwards step? Yes. See upload-pack.c::check_non_tip and http://thread.gmane.org/gmane.comp.version-control.git/178814. Hope that helps, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder jrnie...@gmail.com wrote: Hi Bryan, Bryan Turner wrote: Is there actually logic somewhere in Git that does that MAY walk backwards step? Yes. See upload-pack.c::check_non_tip and http://thread.gmane.org/gmane.comp.version-control.git/178814. Jonathan, Thanks for the reply! I realize now I didn't really cite the part of that documentation that I'm most interested in, which is: through history _or through the reflog_. It's the reflog bit I'm looking for. Sorry for not being more clear. check_non_tip appears to look at ancestry, walking back up (or down, depending on your view) the DAG to see if the requested SHA-1 is reachable from a current ref, so it looks like that's covering the through history part of that blurb. The reason I ask is that there is a race condition that exists where the ref advertisement lists refs/heads/foo at abc1234, and then foo is deleted before the actual upload-pack request comes in. In that situation, walking backwards through _history_ will only allow the upload-pack to complete unless the deleted ref was merged into another ref prior to being deleted (if I understand the code correctly). It seems like looking at the reflogs, and seeing that refs/heads/foo did in fact exist and did in fact refer to abc1234, is the only approach that could allow the upload-pack to complete. The documentation hints that such a thing is possible, but I don't see any code in Git that seems to do that. Does that make more sense? Does that functionality exist and I've just overlooked it? Thanks again, Bryan Turner Hope that helps, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 3:28 PM, Bryan Turner btur...@atlassian.com wrote: On Tue, Dec 2, 2014 at 2:34 PM, Jonathan Nieder jrnie...@gmail.com wrote: Hi Bryan, Bryan Turner wrote: Is there actually logic somewhere in Git that does that MAY walk backwards step? Yes. See upload-pack.c::check_non_tip and http://thread.gmane.org/gmane.comp.version-control.git/178814. Jonathan, Thanks for the reply! I realize now I didn't really cite the part of that documentation that I'm most interested in, which is: through history _or through the reflog_. It's the reflog bit I'm looking for. Sorry for not being more clear. check_non_tip appears to look at ancestry, walking back up (or down, depending on your view) the DAG to see if the requested SHA-1 is reachable from a current ref, so it looks like that's covering the through history part of that blurb. The reason I ask is that there is a race condition that exists where the ref advertisement lists refs/heads/foo at abc1234, and then foo is deleted before the actual upload-pack request comes in. In that situation, walking backwards through _history_ will only allow the upload-pack to complete unless the deleted ref was merged into another s/unless/if, sorry. In that situation, walking backwards through history will only allow the upload-pack to complete if the deleted ref was merged into another ref. ref prior to being deleted (if I understand the code correctly). It seems like looking at the reflogs, and seeing that refs/heads/foo did in fact exist and did in fact refer to abc1234, is the only approach that could allow the upload-pack to complete. The documentation hints that such a thing is possible, but I don't see any code in Git that seems to do that. Does that make more sense? Does that functionality exist and I've just overlooked it? Thanks again, Bryan Turner Hope that helps, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
Hi, Bryan Turner wrote: The reason I ask is that there is a race condition that exists where the ref advertisement lists refs/heads/foo at abc1234, and then foo is deleted before the actual upload-pack request comes in. Can you say more about the context? For example, was this actually happening, or is this for the sake of understanding the protocol better? I ask because knowing the context might help us give more specific advice. Sometimes when people delete a branch they really want those objects to be inaccessible *right away*. So for such people, git's behavior of failing the request unless the objects are still accessible by some other path is helpful. A stateful server could theoretically cache the list of objects they have advertised for a short time, to avoid clients having to suffer when something becomes inaccessible during the window between the upload-pack advertisement and upload-pack request. Or a permissive server can allow all wants except a specific blacklist (and some people do that). Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 3:45 PM, Jonathan Nieder jrnie...@gmail.com wrote: Hi, Bryan Turner wrote: The reason I ask is that there is a race condition that exists where the ref advertisement lists refs/heads/foo at abc1234, and then foo is deleted before the actual upload-pack request comes in. Can you say more about the context? For example, was this actually happening, or is this for the sake of understanding the protocol better? I ask because it's actually happening. Heavy CI load sometimes has builds fail because git clone dies with not our ref. That's the specific context I'm working to try and address right now. Some teams use rebase-heavy workflows, which also evades the check_non_tip easing and fails with not our ref, so I can't be 100% certain it's ref deletion in specific causing it (but I guess which of those it is is probably largely academic; as long as hosting spans multiple requests it seems like this type of race condition is unavoidable). I'm trying to decide if there is something I can enable or tune to prevent it, and the type of twilighting hinted at by the http-protocol documentation seemed like it could be just the thing. I ask because knowing the context might help us give more specific advice. Sometimes when people delete a branch they really want those objects to be inaccessible *right away*. So for such people, git's behavior of failing the request unless the objects are still accessible by some other path is helpful. A stateful server could theoretically cache the list of objects they have advertised for a short time, to avoid clients having to suffer when something becomes inaccessible during the window between the upload-pack advertisement and upload-pack request. Or a permissive server can allow all wants except a specific blacklist (and some people do that). Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
Bryan Turner wrote: I ask because it's actually happening. Heavy CI load sometimes has builds fail because git clone dies with not our ref. That's the specific context I'm working to try and address right now. Thanks --- that makes sense. Some teams use rebase-heavy workflows, which also evades the check_non_tip easing and fails with not our ref, so I can't be 100% certain it's ref deletion in specific causing it (but I guess which of those it is is probably largely academic; as long as hosting spans multiple requests it seems like this type of race condition is unavoidable). Yeah, everything mentioned before about ref deletion also applies to non-fast-forward updates. I'm trying to decide if there is something I can enable or tune to prevent it, and the type of twilighting hinted at by the http-protocol documentation seemed like it could be just the thing. If you control the side that clones, then just cloning the single branch you are building (with --single-branch and -b) can help. Using a bidirectional protocol like git:// or ssh:// (where the ref advertisement and check of wants happen in the same connection) would avoid the problem we're talking about. On the server side, I agree that either mining reflogs or storing advertisements to disk would be a nice way to take care of this. No one has implemented either of those, but it would be a nice setting to have. :) Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder jrnie...@gmail.com wrote: On the server side, I agree that either mining reflogs or storing advertisements to disk would be a nice way to take care of this. No one has implemented either of those, but it would be a nice setting to have. :) and could be dangerous too. If I accidentally add a password file, then delete it (way before a clone is performed), I don't want that part of reflog visible to the cloner. Problem is we don't know how far we should look back in reflog unless we keep some state. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote: Can you say more about the context? For example, was this actually happening, or is this for the sake of understanding the protocol better? I ask because it's actually happening. Heavy CI load sometimes has builds fail because git clone dies with not our ref. That's the specific context I'm working to try and address right now. Some teams use rebase-heavy workflows, which also evades the check_non_tip easing and fails with not our ref, so I can't be 100% certain it's ref deletion in specific causing it (but I guess which of those it is is probably largely academic; as long as hosting spans multiple requests it seems like this type of race condition is unavoidable). There is a practical reason to care. Ref deletion will also delete the reflog, leaving no trace of the reachability. Whereas a non-fast-forward push could be resolved by looking in the reflog. One problem with hunting for sha1s in the reflog is that upload-pack does not know which ref the client thinks they are requesting. So a search would involve looking in _every_ reflog, which could be expensive. It might not be too painful if you do the search only after hitting a not our ref condition, which should in theory be rare. A malicious client could convince you to grep your reflogs repeatedly, but that is hardly the only way to convince upload-pack to chew CPU. Asking it to make a pack comes to mind. :) I'm trying to decide if there is something I can enable or tune to prevent it, and the type of twilighting hinted at by the http-protocol documentation seemed like it could be just the thing. For a public repository, it might make sense to provide a config option to loosen the is_our_ref check completely (i.e., to just has_sha1_file). But such an option does not yet exist. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 12:17 PM, Jonathan Nieder jrnie...@gmail.com wrote: I'm trying to decide if there is something I can enable or tune to prevent it, and the type of twilighting hinted at by the http-protocol documentation seemed like it could be just the thing. If you control the side that clones, then just cloning the single branch you are building (with --single-branch and -b) can help. Or we could extend this a bit on the server side, if one ref fail, let the clone continue with the remaining refs (and warn loudly to the user). -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 2, 2014 at 4:33 PM, Jeff King p...@peff.net wrote: On Tue, Dec 02, 2014 at 04:04:11PM +1100, Bryan Turner wrote: Can you say more about the context? For example, was this actually happening, or is this for the sake of understanding the protocol better? I ask because it's actually happening. Heavy CI load sometimes has builds fail because git clone dies with not our ref. That's the specific context I'm working to try and address right now. Some teams use rebase-heavy workflows, which also evades the check_non_tip easing and fails with not our ref, so I can't be 100% certain it's ref deletion in specific causing it (but I guess which of those it is is probably largely academic; as long as hosting spans multiple requests it seems like this type of race condition is unavoidable). There is a practical reason to care. Ref deletion will also delete the reflog, leaving no trace of the reachability. Whereas a non-fast-forward push could be resolved by looking in the reflog. A fair point. I had mistakenly thought that reflogs would survive the ref's deletion and be pruned as part of garbage collection, but a quick test shows that, as I'm sure you already know, that's not true. Thanks for correcting my mistake! Bryan Turner -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: http-protocol question
On Tue, Dec 02, 2014 at 04:47:50PM +1100, Bryan Turner wrote: There is a practical reason to care. Ref deletion will also delete the reflog, leaving no trace of the reachability. Whereas a non-fast-forward push could be resolved by looking in the reflog. A fair point. I had mistakenly thought that reflogs would survive the ref's deletion and be pruned as part of garbage collection, but a quick test shows that, as I'm sure you already know, that's not true. I wish it worked that way. Unfortunately there are complications with keeping the old reflogs in place, because they sometimes cause conflicts with new refs being created (e.g., a reflog in .git/logs/refs/heads/foo would prevent .git/logs/refs/heads/foo/bar from being created). I had some patches long ago to try to keep a reflog graveyard around, but they were quite invasive, and there were some corner cases that caused weird errors. Handling this sort of D/F conflict more gracefully is one of the things I'd like to experiment with once we have pluggable ref backends (I think we'll also disallow foo/bar if foo exists, but the storage could at least keep the reflogs around). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html