Re: [PATCH 0/6] receive-pack: quarantine pushed objects
On Sun, Oct 2, 2016 at 3:02 PM, Jeff King wrote:
> On Sun, Oct 02, 2016 at 11:20:59AM +0200, Christian Couder wrote:
>
>> I wonder if the patch you sent in:
>>
>> https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6...@sigill.intra.peff.net/
>>
>> is still useful or not.
>
> It is potentially still useful for other code paths besides
> receive-pack. But if the main concern is pushes, then yeah, I think it
> is not really doing anything.
>
>> I guess if we fail the receive-pack because the pack is bigger than
>> receive.maxInputSize, then the "quarantine" directory will also be
>> removed, so the part of the pack that we received before failing the
>> receive-pack will be deleted.
>
> Correct. _Any_ failure up to the tmp_objdir_migrate() call will drop the
> objects. So that includes index-pack failing for any reason.

Great, thanks for explaining!

>> > These two patches set that up by letting index-pack and pre-receive
>> > know that quarantine path and use it to store arbitrary files that
>> > _don't_ get migrated to the main object database (i.e., the log file
>> > mentioned above).
>>
>> It would be nice to have a diffstat for the whole series.
>
> You mean in the cover letter? I do not mind including it if people find
> them useful, but I personally have always just found them to be clutter
> at that level.

I think it can help to quickly get an idea about what the series
impacts, and it would have made it easier for me to see that the
changes in the patch you sent previously
(https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6...@sigill.intra.peff.net/)
are not part of this series.

Thanks anyway,
Christian.
Re: [PATCH 0/6] receive-pack: quarantine pushed objects
On Sun, Oct 02, 2016 at 11:20:59AM +0200, Christian Couder wrote:

> On Fri, Sep 30, 2016 at 9:35 PM, Jeff King wrote:
> > I've mentioned before on the list that GitHub "quarantines" objects
> > while the pre-receive hook runs. Here are the patches to implement
> > that.
>
> Great! Thanks for upstreaming these patches!
>
> I wonder if the patch you sent in:
>
> https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6...@sigill.intra.peff.net/
>
> is still useful or not.

It is potentially still useful for other code paths besides
receive-pack. But if the main concern is pushes, then yeah, I think it
is not really doing anything.

> I guess if we fail the receive-pack because the pack is bigger than
> receive.maxInputSize, then the "quarantine" directory will also be
> removed, so the part of the pack that we received before failing the
> receive-pack will be deleted.

Correct. _Any_ failure up to the tmp_objdir_migrate() call will drop the
objects. So that includes index-pack failing for any reason.

> > These two patches set that up by letting index-pack and pre-receive
> > know that quarantine path and use it to store arbitrary files that
> > _don't_ get migrated to the main object database (i.e., the log file
> > mentioned above).
>
> It would be nice to have a diffstat for the whole series.

You mean in the cover letter? I do not mind including it if people find
them useful, but I personally have always just found them to be clutter
at that level.

-Peff
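
As a rough illustration of the behavior discussed in this message (any failure before migration drops the pushed objects, and files starting with '.' are never migrated), here is a toy model in Python. It is not git's tmp-objdir code, which is written in C. The names `create_quarantine`, `migrate`, `discard` and `receive`, the `incoming-` prefix, and the flat directory layout are all assumptions made for the sketch.

```python
import os
import shutil
import tempfile


def create_quarantine(objdir):
    """Create a temporary directory inside the main object directory."""
    # Hypothetical prefix; chosen only for this illustration.
    return tempfile.mkdtemp(prefix="incoming-", dir=objdir)


def migrate(quarantine, objdir):
    """Move quarantined files into objdir, skipping names starting with '.'."""
    moved = []
    for name in sorted(os.listdir(quarantine)):
        if name.startswith("."):
            continue  # scratch files such as logs stay behind
        os.rename(os.path.join(quarantine, name), os.path.join(objdir, name))
        moved.append(name)
    shutil.rmtree(quarantine)  # removes the directory and any '.' files
    return moved


def discard(quarantine):
    """Drop the quarantine directory and everything still in it."""
    shutil.rmtree(quarantine, ignore_errors=True)


def receive(objdir, incoming, hook):
    """Write incoming files (name -> bytes) to quarantine, then run hook.

    The files reach objdir only if hook(quarantine) returns True. A False
    result or any exception leaves objdir untouched.
    """
    quarantine = create_quarantine(objdir)
    try:
        for name, data in incoming.items():
            with open(os.path.join(quarantine, name), "wb") as f:
                f.write(data)
        if not hook(quarantine):
            return []
        return migrate(quarantine, objdir)
    finally:
        # No-op after a successful migrate; cleans up on every failure path.
        discard(quarantine)
```

Putting the cleanup in `finally` mirrors the point made above: there is no separate "undo" step for a rejected push, because nothing was visible in the main object directory to begin with.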
Re: [PATCH 0/6] receive-pack: quarantine pushed objects
On Fri, Sep 30, 2016 at 9:35 PM, Jeff King wrote:
> I've mentioned before on the list that GitHub "quarantines" objects
> while the pre-receive hook runs. Here are the patches to implement
> that.

Great! Thanks for upstreaming these patches!

I wonder if the patch you sent in:

https://public-inbox.org/git/20160816144642.5ikkta4l5hyx6...@sigill.intra.peff.net/

is still useful or not.

> The basic problem is that as-is, index-pack admits pushed objects into
> the main object database immediately, before the pre-receive hook runs.
> It _has_ to, since the hook needs to be able to actually look at the
> objects. However, this means that if the pre-receive hook rejects the
> push, we still end up with the objects in the repository. We can't just
> delete them as temporary files, because we don't know what other
> processes might have started referencing them.
>
> The solution here is to push into a "quarantine" directory that is
> accessible only to pre-receive, check_connected(), etc, and only
> move the objects into the main object database after we've finished
> those basic checks.

I guess if we fail the receive-pack because the pack is bigger than
receive.maxInputSize, then the "quarantine" directory will also be
removed, so the part of the pack that we received before failing the
receive-pack will be deleted.

[...]

> These two patches set that up by letting index-pack and pre-receive
> know that quarantine path and use it to store arbitrary files that
> _don't_ get migrated to the main object database (i.e., the log file
> mentioned above).

It would be nice to have a diffstat for the whole series.

Thanks,
Christian.
[PATCH 0/6] receive-pack: quarantine pushed objects
I've mentioned before on the list that GitHub "quarantines" objects
while the pre-receive hook runs. Here are the patches to implement
that.

The basic problem is that as-is, index-pack admits pushed objects into
the main object database immediately, before the pre-receive hook runs.
It _has_ to, since the hook needs to be able to actually look at the
objects. However, this means that if the pre-receive hook rejects the
push, we still end up with the objects in the repository. We can't just
delete them as temporary files, because we don't know what other
processes might have started referencing them.

The solution here is to push into a "quarantine" directory that is
accessible only to pre-receive, check_connected(), etc, and only
move the objects into the main object database after we've finished
those basic checks.

One of the things we use it for at GitHub is object-size policy, which
we implement via a pre-receive hook (sort of; see below).

This scheme has been in use for about 2 years, though I did do a fair
bit of tweaking to make it ready for upstream (squashing bugfixes and
merges from upstream that came later, along with polishing a few rough
edges I saw while doing so). So I may have introduced new bugs. :)

The patches are:

  [1/6]: check_connected: accept an env argument
  [2/6]: sha1_file: always allow relative paths to alternates

These two are preparatory.

  [3/6]: tmp-objdir: introduce API for temporary object directories
  [4/6]: receive-pack: quarantine objects until pre-receive accepts

This is the interesting part.

  [5/6]: tmp-objdir: put quarantine information in the environment
  [6/6]: tmp-objdir: do not migrate files starting with '.'

These are two changes that I ended up doing later to support another
series. They're not strictly necessary here, but I think they're worth
including now, as they change the visible behavior in minor ways. It
seems like a good idea to start with what I think should be the final
behavior.
The other series is basically an optimization for the object-size
policy. Without it, you are stuck walking the graph again in the
pre-receive hook to find the new objects and check their sizes. But
index-pack can do that for you very cheaply; it has the size of each
object already. But it _doesn't_ produce nice error messages; it has no
idea at what path the objects are found, and it doesn't know what kind
of advice it should give the user.

So what we can do is ask index-pack to make a note of any objects
larger than N bytes, and write their sha1 and size into a file in the
quarantine path. Then the pre-receive hook can look in that log and
generate any nice message it wants. In the common case, the log is
empty, and it does not have to do any work at all.

These two patches set that up by letting index-pack and pre-receive
know that quarantine path and use it to store arbitrary files that
_don't_ get migrated to the main object database (i.e., the log file
mentioned above).

-Peff
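
The hook side of the object-size log described above might look roughly like the following Python sketch. Several details are assumptions, not taken from the message: the log file name `.large-objects`, the one-entry-per-line "sha1 size" format, and the wording of the error. The environment variable `GIT_QUARANTINE_PATH` is the name used by released versions of git; the message itself does not name it.

```python
import os
import sys

# Hypothetical file name; the leading '.' keeps it from being migrated.
LOG_NAME = ".large-objects"


def read_large_objects(quarantine_path, log_name=LOG_NAME):
    """Return (sha1, size) pairs from the log, or [] if there is no log."""
    log = os.path.join(quarantine_path, log_name)
    if not os.path.exists(log):
        return []
    entries = []
    with open(log) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # ignore blank or malformed lines
            entries.append((parts[0], int(parts[1])))
    return entries


def check_push(environ):
    """Return a list of error messages; an empty list means accept."""
    quarantine = environ.get("GIT_QUARANTINE_PATH")
    if not quarantine:
        return []  # not running under quarantine; nothing to consult
    return [
        f"object {sha} is {size} bytes, which exceeds the size limit"
        for sha, size in read_large_objects(quarantine)
    ]


def main(environ=os.environ, err=sys.stderr):
    """Entry point for a pre-receive hook; returns the exit status."""
    errors = check_push(environ)
    for message in errors:
        print(message, file=err)
    return 1 if errors else 0
```

A hook script would end with `sys.exit(main())`. In the common case the log is missing or empty, so the hook returns immediately without walking the object graph.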