Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Ramkumar Ramachandra
Jeff King wrote: [...] First off, thanks for the fabulous writeup. I hope more contributors read this, and get interested in tackling the problems. You are missing a step in the preparation of the delta list. If we already have a delta on-disk, we will check whether it is usable and keep a

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Duy Nguyen
On Sat, Aug 10, 2013 at 3:42 PM, Ramkumar Ramachandra artag...@gmail.com wrote: As Junio mentioned, that is what --thin is about; the sender omits the base and the receiver adds it back in (index-pack --fix-thin). Yeah, I read about the --thin option. However, it's only for network-packs (i.e

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Duy Nguyen
On Sat, Aug 10, 2013 at 4:24 PM, Duy Nguyen pclo...@gmail.com wrote: push has --thin turned off by default favoring server resources over network traffic, see a4503a1 (Make --no-thin the default in git-push to save server resources - 2007-09-09) Side note, I think the server should be able to

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Jeff King
On Sat, Aug 10, 2013 at 02:12:46PM +0530, Ramkumar Ramachandra wrote: First off, thanks for the fabulous writeup. I hope more contributors read this, and get interested in tackling the problems. You're welcome. I'll just respond to a few questions/points you raised in your response: Yeah, I

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Jeff King
On Sat, Aug 10, 2013 at 08:24:39AM +0700, Nguyen Thai Ngoc Duy wrote: the other end cannot use). You _might_ be able to get by with a kind of two-level hack: consider your main pack as group A and newly pushed packs as group B. Allow storing thin deltas on disk from group B against group

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Duy Nguyen
On Sat, Aug 10, 2013 at 4:43 PM, Jeff King p...@peff.net wrote: push has --thin turned off by default favoring server resources over network traffic, see a4503a1 (Make --no-thin the default in git-push to save server resources - 2007-09-09) Hmm. I don't think that is the case anymore. If I

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Ramkumar Ramachandra
Duy Nguyen wrote: Right. transport_get() is also run for push and it sets smart_options->thin = 1 unconditionally. So, it looks like smart http implies the thin capability. I think sop's patch (6 years ago) was about turning off thin on dumb http: can we check that before deciding that push

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-10 Thread Duy Nguyen
On Sat, Aug 10, 2013 at 5:05 PM, Ramkumar Ramachandra artag...@gmail.com wrote: Duy Nguyen wrote: Right. transport_get() is also run for push and it sets smart_options->thin = 1 unconditionally. So, it looks like smart http implies the thin capability. smart_options is a bit misleading. This

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Jeff King
On Fri, Aug 09, 2013 at 01:34:48AM +0530, Ramkumar Ramachandra wrote: Certainly. A push will never use an existing pack as-is, as it's very highly unlikely that the server requested exactly what gc --auto packed for us locally. Sure, undeltified objects in the pack are probably better for

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Ramkumar Ramachandra
Jeff King wrote: It depends on what each side has, doesn't it? We generally try to reuse on-disk deltas when we can, since they require no computation. If I have object A delta'd against B, and I know that the other side wants A and has B (or I am also sending B), I can simply send what I
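
The reuse condition described in this message can be written down as a small predicate. This is only a sketch of the rule as stated in the snippet, not of git's pack-objects code; the function name, its parameters, and the use of plain sets of object names are all assumptions made for illustration.

```python
def can_reuse_delta(obj, base, wanted, receiver_has):
    """Return True if the on-disk delta of `obj` against `base` can be sent as-is.

    Hypothetical helper: `wanted` is the set of objects being sent and
    `receiver_has` is the set the other side is known to have already.
    """
    sending_obj = obj in wanted
    # The base must be resolvable on the other side: either it is already
    # there, or it travels in the same pack.
    base_available = base in receiver_has or base in wanted
    return sending_obj and base_available


# A is stored as a delta against B; the receiver wants A and already has B.
print(can_reuse_delta("A", "B", wanted={"A"}, receiver_has={"B"}))
```

When the predicate is false, the sender would have to fall back to something that costs computation, such as finding another base or sending the object whole.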

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Junio C Hamano
Ramkumar Ramachandra artag...@gmail.com writes: I'll raise some (hopefully interesting) points. Let's take the example of a simple push: I start send-pack, which in turn starts receive_pack on the server and connects its stdin/stdout to it (using git_connect). Now, it reads the (sha1, ref)

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Jeff King
On Fri, Aug 09, 2013 at 07:04:23PM +0530, Ramkumar Ramachandra wrote: I'll raise some (hopefully interesting) points. Let's take the example of a simple push: I start send-pack, which in turn starts receive_pack on the server and connects its stdin/stdout to it (using git_connect). Now, it

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Duy Nguyen
On Sat, Aug 10, 2013 at 5:16 AM, Jeff King p...@peff.net wrote: Another solution could involve not writing the duplicate of Y in the first place. The reason we do not store thin-packs on disk is that you run into problems with cycles in the delta graph (e.g., A deltas against B, which deltas

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-09 Thread Junio C Hamano
Jeff King p...@peff.net writes: ... The reason we do not store thin-packs on disk is that you run into problems with cycles in the delta graph (e.g., A deltas against B, which deltas against C, which deltas against A; at one point you had a full copy of one object which let you create the
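
The cycle problem mentioned here (A deltas against B, B against C, C against A) can be illustrated with a tiny check over a delta graph. This is a sketch under stated assumptions: the dictionary representation and the function name are hypothetical and do not reflect how git stores or validates deltas.

```python
def find_delta_cycle(delta_base):
    """Return the objects forming a delta cycle, or None if there is none.

    Hypothetical representation: `delta_base` maps an object name to the
    object it is stored as a delta against. Objects stored in full do not
    appear as keys.
    """
    for start in delta_base:
        seen = []
        cur = start
        # Follow the chain of bases until we reach a full object or loop back.
        while cur in delta_base:
            if cur in seen:
                return seen[seen.index(cur):]
            seen.append(cur)
            cur = delta_base[cur]
    return None


# No object in this graph is stored in full, so nothing can be reconstructed.
print(find_delta_cycle({"A": "B", "B": "C", "C": "A"}))
# A chain that ends at a full object (C) is fine.
print(find_delta_cycle({"A": "B", "B": "C"}))
```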

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Junio C Hamano
Ramkumar Ramachandra artag...@gmail.com writes: Junio C Hamano wrote: Imagine we have a cheap way to enumerate the young objects without the usual history traversal. Before we discuss the advantages, can you outline how we can possibly get this data without actually walking downwards from

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote: it is not a problem for the pack that consolidates young objects into a single pack to contain some unreachable crufts. So far, we have never considered putting unreachable objects in packs. Let me ask the obvious question first: what happens when I push? Do I pack up all

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Junio C Hamano
Ramkumar Ramachandra artag...@gmail.com writes: Junio C Hamano wrote: it is not a problem for the pack that consolidates young objects into a single pack to contain some unreachable crufts. So far, we have never considered putting unreachable objects in packs. Let me ask the obvious

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Martin Fick
On Thursday, August 08, 2013 10:56:38 am Junio C Hamano wrote: I thought the discussion was about making the local gc cheaper, and the "Imagine we have a cheap way" was to address it by assuming that the daily "pack young objects into a single pack" can be sped up if we did not have to traverse

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote: So I do not see how that question is obvious. The question obviously pointless and misses the mark by wide margin? The question makes it obvious that whoever asks it does not understand how Git works? Shall we all sit and mourn over the fact that I don't understand how

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Junio C Hamano
Martin Fick mf...@codeaurora.org writes: Assuming I understand what you are suggesting, would these young objects likely still get deduped in an efficient way without doing history traversal (it sounds like they would)? Yes. The very first thing pack-object machinery does is to get the

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote: Martin Fick mf...@codeaurora.org writes: Assuming I understand what you are suggesting, would these young objects likely still get deduped in an efficient way without doing history traversal (it sounds like they would)? Yes. The very first thing pack-object machinery

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Junio C Hamano
Ramkumar Ramachandra artag...@gmail.com writes: I asked you a very simple question: what happens when I do git push? You asked "does push send unreachable cruft?" and I said "No." Does that answer your question? 3. After a few days of working, the gc heuristics figure out that I have too much

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-08 Thread Ramkumar Ramachandra
Junio C Hamano wrote: 3. After a few days of working, the gc heuristics figure out that I have too much garbage and too many packs; a cleanup is required. The gc --auto which doesn't tolerate fragmentation: it tries to put everything into one large pack. Today's gc --auto skims the history

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-07 Thread Duy Nguyen
On Wed, Aug 7, 2013 at 7:10 AM, Martin Fick mf...@codeaurora.org wrote: I wonder if a simpler approach may be nearly as efficient as this one: keep the largest pack out, repack the rest at fetch/push time so there are at most 2 packs at a time. Or we could do the repack at 'gc --auto' time,
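
The "keep the largest pack out, repack the rest" idea amounts to a simple selection step. The following is a minimal sketch of that selection only; the function name and the mapping of pack names to sizes are hypothetical, and the actual repacking is not shown.

```python
def packs_to_repack(pack_sizes):
    """Pick the packs to consolidate, leaving the largest pack untouched.

    Hypothetical input: `pack_sizes` maps a pack name to its size in bytes.
    With two packs or fewer there is nothing to do, since the goal is to
    end up with at most two packs.
    """
    if len(pack_sizes) <= 2:
        return []
    largest = max(pack_sizes, key=pack_sizes.get)
    return sorted(name for name in pack_sizes if name != largest)


sizes = {"pack-main": 900_000_000, "pack-a": 40_000, "pack-b": 12_000}
print(packs_to_repack(sizes))
```

After the selected packs are combined into one, the repository holds the untouched large pack plus one consolidated pack.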

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Duy Nguyen
On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra artag...@gmail.com wrote: + Garbage collect using a pseudo logarithmic packfile maintenance + approach. This approach attempts to minimize packfile churn + by keeping several generations of varying

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes: On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra artag...@gmail.com wrote: + Garbage collect using a pseudo logarithmic packfile maintenance + approach. This approach attempts to minimize packfile churn +

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Martin Fick
On Tuesday, August 06, 2013 06:24:50 am Duy Nguyen wrote: On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra artag...@gmail.com wrote: + Garbage collect using a pseudo logarithmic packfile maintenance + approach. This approach attempts to minimize packfile
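
One way to picture "several generations" of packfiles is to bucket packs by the order of magnitude of their size and only roll up a generation once it gets crowded. The sketch below is an illustration of that general idea and is an assumption throughout: it is not the exproll script's actual algorithm, and the names, the base of 10, and the per-generation limit are all hypothetical.

```python
from collections import defaultdict


def generation(size, base=10):
    """Integer logarithm of `size`: which size generation a pack falls into."""
    gen = 0
    while size >= base:
        size //= base
        gen += 1
    return gen


def rollup_candidates(pack_sizes, base=10, max_per_generation=2):
    """Return {generation: [pack names]} for generations holding too many packs.

    Hypothetical policy: small packs are merged often, while large packs
    are left alone until enough packs of similar size accumulate.
    """
    buckets = defaultdict(list)
    for name, size in pack_sizes.items():
        buckets[generation(size, base)].append(name)
    return {
        gen: sorted(names)
        for gen, names in buckets.items()
        if len(names) > max_per_generation
    }


sizes = {"p1": 5, "p2": 7, "p3": 9, "big": 5000}
print(rollup_candidates(sizes))
```

Under this policy the large pack is never rewritten just because a few small packs arrived, which is the kind of churn the quoted patch description aims to reduce.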

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Martin Fick
On Monday, August 05, 2013 08:38:47 pm Ramkumar Ramachandra wrote: This is the rough explanation I wrote down after reading it: So, the problem is that my .git/objects/pack is polluted with little packs every time I fetch (or push, if you're the server), and this is problematic from the

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Ramkumar Ramachandra
Martin Fick wrote: So, it has me wondering if there isn't a more accurate way to estimate the new packfile without wasting a ton of time? I'm not sure there is. Adding the sizes of individual packs can be off by a lot, because your deltification will be more effective if you have more data to

Re: [PATCH] git exproll: steps to tackle gc aggression

2013-08-06 Thread Ramkumar Ramachandra
Junio C Hamano wrote: Imagine we have a cheap way to enumerate the young objects without the usual history traversal. Before we discuss the advantages, can you outline how we can possibly get this data without actually walking downwards from the roots (refs)? One way to do it is to pull data