Re: push race
Jeff King writes: > But still...the complexity is ugly, and we do not even have a measured > problem in the real world. This is not worth thinking about. :) Yup. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Tue, Oct 16, 2012 at 10:21:02AM -0700, Junio C Hamano wrote: > > I suppose we could do the way unpack-objects does: prefer present > > objects and drop the new identical ones, no memcmp. Objects that are > > not bases, or are ref-delta bases, can be safely dropped. ofs-delta > > bases may lead to rewriting the pack. Do-able but not sure it's worth > > the effort. > > Until you read all the incoming pack data, you won't know what > objects are used as bases for others, so unless you are keeping > everything in core, you would have to spool the incoming data to a > file and then rewrite the final pack file to "drop" these "can be > safely dropped" objects, with or without offset delta encoding. By definition, you know that you have another copy of these objects (that is why you are dropping them). So you could treat later delta references to them the same as thin-pack references, and re-add your existing on-disk copy of the object to the end of the pack. But still...the complexity is ugly, and we do not even have a measured problem in the real world. This is not worth thinking about. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
Nguyen Thai Ngoc Duy writes: > On Tue, Oct 16, 2012 at 12:37 PM, Jeff King wrote: >> I suspect a site that is heavy on alternates is invoking the index-pack >> code path more frequently than necessary (e.g., history gets pushed to >> one forked repo, then when it goes to the next one, we may not share the >> ref that tells the client we already have the object and receive it a >> second time). > > I suppose we could do the way unpack-objects does: prefer present > objects and drop the new identical ones, no memcmp. Objects that are > not bases, or are ref-delta bases, can be safely dropped. ofs-delta > bases may lead to rewriting the pack. Do-able but not sure it's worth > the effort. Until you read all the incoming pack data, you won't know what objects are used as bases for others, so unless you are keeping everything in core, you would have to spool the incoming data to a file and then rewrite the final pack file to "drop" these "can be safely dropped" objects, with or without offset delta encoding. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Tue, Oct 16, 2012 at 05:45:12PM +0700, Nguyen Thai Ngoc Duy wrote: > On Tue, Oct 16, 2012 at 12:37 PM, Jeff King wrote: > > I suspect a site that is heavy on alternates is invoking the index-pack > > code path more frequently than necessary (e.g., history gets pushed to > > one forked repo, then when it goes to the next one, we may not share the > > ref that tells the client we already have the object and receive it a > > second time). > > I suppose we could do the way unpack-objects does: prefer present > objects and drop the new identical ones, no memcmp. Objects that are > not bases, or are ref-delta bases, can be safely dropped. ofs-delta > bases may lead to rewriting the pack. Do-able but not sure it's worth > the effort. Yeah, I think that complexity is why we don't do it currently. We are pretty alternates-heavy at GitHub, and we have not noticed a performance impact. So I think it is probably not worth worrying about. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
Hi Jeff, it would be worth to put your description as comments in the code for future reference. Thanks -Angelo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Tue, Oct 16, 2012 at 12:15:21PM +0700, Nguyen Thai Ngoc Duy wrote: > On Tue, Oct 16, 2012 at 11:51 AM, Jeff King wrote: > >> Its worth nothing that a SHA-1 collision can be identified at the > >> server because the server performs a byte-for-byte compare of both > >> copies of the object to make sure they match exactly in every way. Its > >> not fast, but its safe. :-) > > > > Do we? I thought early versions of git did that, but we did not > > double-check collisions any more for performance reasons. You don't > > happen to remember where that code is, do you (not that it really > > matters, but I am just curious)? > > We do. I touched that sha-1 collision code last time I updated > index-pack, to support large blobs. We only do that when we receive an > object that we already have, which should not happen often unless > you're under attack, so little performance impact normally. Search > "collision" in index-pack.c Ah, thanks, I remember this now. I think that I was thinking of the very early code to check every sha1 file write. E.g., the code killed off by aac1794 (Improve sha1 object file writing., 2005-05-03). But that is ancient history that is not really relevant. Interesting that we check only in index-pack. If the pushed content is small enough, we will call unpack-objects. That follows the usual code path for writing the object, which will prefer the existing copy. I suspect a site that is heavy on alternates is invoking the index-pack code path more frequently than necessary (e.g., history gets pushed to one forked repo, then when it goes to the next one, we may not share the ref that tells the client we already have the object and receive it a second time). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 07:09:52PM -0700, Shawn O. Pearce wrote: > On Mon, Oct 15, 2012 at 11:56 AM, Jeff King wrote: > > Right. The only thing that needs locking is the refs, because the object > > database is add-only for normal operations, and by definition collisions > > mean you have the same content (or are astronomically unlucky, but your > > consolation prize is that you can write a paper on how you found a sha1 > > collision). > > Its worth nothing that a SHA-1 collision can be identified at the > server because the server performs a byte-for-byte compare of both > copies of the object to make sure they match exactly in every way. Its > not fast, but its safe. :-) Do we? I thought early versions of git did that, but we did not double-check collisions any more for performance reasons. You don't happen to remember where that code is, do you (not that it really matters, but I am just curious)? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 11:56 AM, Jeff King wrote: > Right. The only thing that needs locking is the refs, because the object > database is add-only for normal operations, and by definition collisions > mean you have the same content (or are astronomically unlucky, but your > consolation prize is that you can write a paper on how you found a sha1 > collision). Its worth nothing that a SHA-1 collision can be identified at the server because the server performs a byte-for-byte compare of both copies of the object to make sure they match exactly in every way. Its not fast, but its safe. :-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 05:50:47PM +0200, Angelo Borsotti wrote: > correct, there will be no file overwriting because no files are > written on the work tree. > I tried to follow the actions of the program, but did not quite catch > the 6. you mention. It is the "oldval" parameter to refs.c:update_ref. Or if you are using the "git update-ref" plumbing, it is the "oldvalue" parameter. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 10:29:08AM -0400, Marc Branchaud wrote: > Here's a previous discussion of a race in concurrent updates to the same ref, > even when the updates are all identical: > > http://news.gmane.org/find-root.php?group=gmane.comp.version-control.git&article=164636 > > In that thread, Peff outlines the lock procedure for refs: > > 1. get the lock > 2. check and remember the sha1 > 3. release the lock > 4. do some long-running work (like the actual push) > 5. get the lock > 6. check that the sha1 is the same as the remembered one > 7. update the sha1 > 8. release the lock A minor nit, but I was wrong on steps 1-3. We don't have to take a lock on reading, because our write mechanism uses atomic replacement. So it is really: 1. read and remember the original sha1 2. do some long-running work (like the actual push) 3. get the write lock 4. read the sha1 and check that it's the same as our original 5. write the new sha1 to the lockfile 6. simultaneously release the lock and update the ref by atomically renaming the lockfile to the actual ref Any simultaneous push may see the "old" sha1 before step 6, and when it gets to its own step 4, will fail (and two processes cannot be in steps 3-6 simultaneously). > Angelo, in your case I think one of your concurrent updates would fail in > step 6. As you say, this is after the changes have been uploaded. However, > there's none of the file-overwriting that you fear, because the changes are > stored in git's object database under their SHA hashes. So there'll only be > an object-level collision if two parties upload the exact same object, in > which case it doesn't matter. Right. The only thing that needs locking is the refs, because the object database is add-only for normal operations, and by definition collisions mean you have the same content (or are astronomically unlucky, but your consolation prize is that you can write a paper on how you found a sha1 collision). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
Hi Marc, correct, there will be no file overwriting because no files are written on the work tree. I tried to follow the actions of the program, but did not quite catch the 6. you mention. -Angelo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On 12-10-15 10:09 AM, Ævar Arnfjörð Bjarmason wrote: > On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti > wrote: >> Hello, > > FWIW we have a lot of lemmings pushing to the same ref all the time at > $work, and while I've seen cases where: > > 1. Two clients try to push > 2. They both get the initial lock > 3. One of them fails to get the secondary lock (I think updating the ref) > > I've never seen cases where they clobber each other in #3 (and I would > have known from "dude, where's my commit that I just pushed" reports). > > So while we could fix git to make sure there's no race condition such > that two clients never get the #2 lock I haven't seen it cause actual > data issues because of two clients getting the #3 lock. > > It might still happen in some cases, I recommend testing it with e.g. > lots of pushes in parallel with GNU Parallel. Here's a previous discussion of a race in concurrent updates to the same ref, even when the updates are all identical: http://news.gmane.org/find-root.php?group=gmane.comp.version-control.git&article=164636 In that thread, Peff outlines the lock procedure for refs: 1. get the lock 2. check and remember the sha1 3. release the lock 4. do some long-running work (like the actual push) 5. get the lock 6. check that the sha1 is the same as the remembered one 7. update the sha1 8. release the lock Angelo, in your case I think one of your concurrent updates would fail in step 6. As you say, this is after the changes have been uploaded. However, there's none of the file-overwriting that you fear, because the changes are stored in git's object database under their SHA hashes. So there'll only be an object-level collision if two parties upload the exact same object, in which case it doesn't matter. M. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On 15 October 2012 16:09, Ævar Arnfjörð Bjarmason wrote: > On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti > wrote: >> Hello, > > FWIW we have a lot of lemmings pushing to the same ref all the time at > $work, and while I've seen cases where: > > 1. Two clients try to push > 2. They both get the initial lock > 3. One of them fails to get the secondary lock (I think updating the ref) > > I've never seen cases where they clobber each other in #3 (and I would > have known from "dude, where's my commit that I just pushed" reports). Except that the error message is really cryptic. It definitely doesnt shout out "maybe you collided with someone elses push". Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti wrote: > Hello, FWIW we have a lot of lemmings pushing to the same ref all the time at $work, and while I've seen cases where: 1. Two clients try to push 2. They both get the initial lock 3. One of them fails to get the secondary lock (I think updating the ref) I've never seen cases where they clobber each other in #3 (and I would have known from "dude, where's my commit that I just pushed" reports). So while we could fix git to make sure there's no race condition such that two clients never get the #2 lock I haven't seen it cause actual data issues because of two clients getting the #3 lock. It might still happen in some cases, I recommend testing it with e.g. lots of pushes in parallel with GNU Parallel. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
On Mon, Oct 15, 2012 at 6:05 PM, Matthieu Moy wrote: > Angelo Borsotti writes: > >> the push command checks first if the tips of the branches match those >> of the remote references, and if it does uploads the snapshot. > > The update does two things: upload objects to the database, and then > update the reference. Adding objects to the database does not change the > repository until the objects are reachable from a ref. Updating the ref > is usually done giving the expected old sha1, and locks the ref, so it > can't change in the meantime. > > I don't know this part of the code very well, but check refs.c for the C > part, and "git update-ref" for the plumbing interface. I think it's lock_any_ref_for_update(), which is called inside refs.c:update_ref(). -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push race
Angelo Borsotti writes: > the push command checks first if the tips of the branches match those > of the remote references, and if it does uploads the snapshot. The update does two things: upload objects to the database, and then update the reference. Adding objects to the database does not change the repository until the objects are reachable from a ref. Updating the ref is usually done giving the expected old sha1, and locks the ref, so it can't change in the meantime. I don't know this part of the code very well, but check refs.c for the C part, and "git update-ref" for the plumbing interface. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html