Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Jeff King p...@peff.net writes: I started on this, and it turned out not to really be any simpler So I went ahead with the full formats for my re-roll. It turned out pretty reasonable, I think. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. I agree with your reasoning. It may be simpler to give an interface to ask for which pieces of info, e.g. --batch-cols=size,disksize, without giving the readers a flexible format. I started on this, and it turned out not to really be any simpler. In particular there is the question of whether git cat-file --batch-cols=size,type is different from git cat-file --batch-cols=type,size If so, then you are basically implementing the equivalent of a macro format anyway (you have to parse it left to right to know the order). And if not, you end up translating the column list into a bit-field, and the boilerplate for adding a new item is about the same as for a macro format. So I went ahead with the full formats for my re-roll. It turned out pretty reasonable, I think. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Mon, Jul 08, 2013 at 07:07:01PM +0530, Ramkumar Ramachandra wrote: There's also syntax sharing. I don't think each command should have its own syntax. f-e-r already has %(objectsize). If we plan to have a common syntax, perhaps %(disk-size) should be %(objectsize:disk) or something. Ofcourse. I didn't notice %(objectsize); %(objectsize[:disk]) is a fine suggestion. Adding formatting to cat-file --batch from scratch could be another big chunk of code (that also comes with bugs, usually) and may or may not be compatible with the common syntax because of some oversight. Oh, I'm proposing that Peff implements just %H and %(objectsize[:disk]) for _now_, because that's what he wants. It should be a tiny 20-line parser that's easy to swap out. I went with %(objectname), %(objectsize), and %(objectsize:disk). The former match for-each-ref, and the latter extends it in a natural-ish way (though it is arguable whether foo:modifier should mean do something with the foo value or this is like foo, but not quite). In the long run, I would like to see long names for each atom, with short aliases (so %H and %h for %(objectname) and %(objectname:short), available everywhere). But I think it is OK to start without the aliases, and then pick them up as the implementations and interfaces are unified. IOW, it is OK to say cat-file has not learned about %H yet, and later add it; we cannot teach it %H now and then change our minds later. --batch-cols=... or --batch-disk-size would be simpler, but we might never be able to remove that code. Agreed. The approach paints us into a design-corner, and must therefore be avoided. I would say it is worth one of the other routes if it turned out to be dramatically simpler. But having just done the work to add formats for cat-file, it is really not too bad. It lacks some of the niceties of the other formatters (e.g., colors), but again, we can always add them in later as the implementations unify. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Sun, Jul 07, 2013 at 09:15:41PM +, brian m. carlson wrote: On Sun, Jul 07, 2013 at 06:09:49AM -0400, Jeff King wrote: +NOTE: The on-disk size reported is accurate, but care should be taken in +drawing conclusions about which refs or objects are responsible for disk +usage. The size of a packed non-delta object be much larger than the You probably meant may be here. ^ Thanks, fixed in my re-roll. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Duy Nguyen wrote: - We might overlook something. The best way to avoid missing is finish and verify it. - A promise to do things later could happen really late, or never happens. As you are sastisfied with the functionality you have less motivation to clean the code. Meanwhile the maintainer takes extra maintenance cost. I know. You know what my counter-argument looks like already: A promise to deliver a perfect series sometime in the future risks never reaching that perfection, and stalling everyone else's work. Even if we do manage to complete that perfect series, there is no guarantee that we'll get sufficient reviewer-interest or traction for merge. You think people are more likely to look at a 50-part series than a 15-part series? Either way, I'm not interested in arguing: for now, I'll repost the old 15-part series and try to get some reviews. Start writing code, and let's finish this thing. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
un Mon, Jul 8, 2013 at 12:49 AM, Junio C Hamano gits...@pobox.com wrote: Jeff King p...@peff.net writes: Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. This is what I wanted to do with the in for-each-ref's pretty formatting [1]. I used to hack cat-file --batch to extract info I needed for experimenting with various pack index extensions. If you are not in hurry, maybe we can introduce something similar to your syntax, but applicable for all for-each-ref, branch and log family. Ram, are you still interested in the awesome branch series? I agree with your reasoning. It may be simpler to give an interface to ask for which pieces of info, e.g. --batch-cols=size,disksize, without giving the readers a flexible format. [1] http://thread.gmane.org/gmane.comp.version-control.git/227057/focus=227223 -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Duy Nguyen wrote: Ram, are you still interested in the awesome branch series? Yep, but it got stalled due to lack of reviewer-interest :/ I'm a bit under the weather at the moment, but it's good to see that you're back: let's finish this soon. Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. This is what I wanted to do with the in for-each-ref's pretty formatting [1]. I used to hack cat-file --batch to extract info I needed for experimenting with various pack index extensions. If you are not in hurry, maybe we can introduce something similar to your syntax, but applicable for all for-each-ref, branch and log family. I'm still quite confused about this grand plan. We have short commit-specific format specifiers that don't work with refs, among several other quirks in [1]. I personally think we should absolutely stay away from short format-specifiers (like %H, %f, %e; we'll soon run out of letters, and nobody can tell what they are without the documentation anyway) for the new options, and just start adding new long-form ones as and when they are necessary. I think refname:short, upstream:track, upstream:trackshort are very sensible choices, and that we should continue along that line. I'm fine with format-specifiers having meanings only in certain contexts as long as we document it properly (how can we possibly get %(refname) to mean something sensible in cat-file?). As far as this series is concerned, I think Peff can implement %H and %(object:[disk-]size) locally without worrying about code-sharing or waiting for us. Then, after the for-each-ref-pretty thing matures, we can just replace the code underneath. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Mon, Jul 8, 2013 at 7:00 PM, Ramkumar Ramachandra artag...@gmail.com wrote: This is what I wanted to do with the in for-each-ref's pretty formatting [1]. I used to hack cat-file --batch to extract info I needed for experimenting with various pack index extensions. If you are not in hurry, maybe we can introduce something similar to your syntax, but applicable for all for-each-ref, branch and log family. I'm still quite confused about this grand plan. We have short commit-specific format specifiers that don't work with refs, among several other quirks in [1]. I personally think we should absolutely stay away from short format-specifiers (like %H, %f, %e; we'll soon run out of letters, and nobody can tell what they are without the documentation anyway) for the new options, and just start adding new long-form ones as and when they are necessary. I think refname:short, upstream:track, upstream:trackshort are very sensible choices, and that we should continue along that line. I'm fine with format-specifiers having meanings only in certain contexts as long as we document it properly (how can we possibly get %(refname) to mean something sensible in cat-file?). The short/long naming is the least I worry about. We could add long names to pretty specifiers. The thing about the last attempt is, you add some extra things on top elsewhere, but format_commit_item code may need to be aware of those changes, which are not obvious when sombody just focuses on format_commit_item. Having all specifiers in one place would be better (hence no hooks, no callbacks) because we get a full picture. And yes we need to deal with specifers that make no sense in certain context. All that makes changes bigger, but when format_commit_item (now just format_item) knows how to deal with all kinds of objects and refs, it becomes a small declaration language to extract things out of git. You can use it for pretty printing or mass extraction in the case of cat-file --batch. cat-file --batch is just some bonus on top, mostly to exercise that we do it right. As far as this series is concerned, I think Peff can implement %H and %(object:[disk-]size) locally without worrying about code-sharing or waiting for us. Then, after the for-each-ref-pretty thing matures, we can just replace the code underneath. There's also syntax sharing. I don't think each command should have its own syntax. f-e-r already has %(objectsize). If we plan to have a common syntax, perhaps %(disk-size) should be %(objectsize:disk) or something. Adding formatting to cat-file --batch from scratch could be another big chunk of code (that also comes with bugs, usually) and may or may not be compatible with the common syntax because of some oversight. --batch-cols=... or --batch-disk-size would be simpler, but we might never be able to remove that code. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Duy Nguyen wrote: The short/long naming is the least I worry about. We could add long names to pretty specifiers. The thing about the last attempt is, you add some extra things on top elsewhere, but format_commit_item code may need to be aware of those changes, which are not obvious when sombody just focuses on format_commit_item. Having all specifiers in one place would be better (hence no hooks, no callbacks) because we get a full picture. And yes we need to deal with specifers that make no sense in certain context. Yeah, it would certainly be nice to have all the format-specifiers that one unified parser acts on, but isn't this just a matter of refactoring? Shouldn't we be starting with cheap callbacks, get things working, and guard against regressions in the refactoring phase first? How else do you propose to start out? There's also syntax sharing. I don't think each command should have its own syntax. f-e-r already has %(objectsize). If we plan to have a common syntax, perhaps %(disk-size) should be %(objectsize:disk) or something. Ofcourse. I didn't notice %(objectsize); %(objectsize[:disk]) is a fine suggestion. Adding formatting to cat-file --batch from scratch could be another big chunk of code (that also comes with bugs, usually) and may or may not be compatible with the common syntax because of some oversight. Oh, I'm proposing that Peff implements just %H and %(objectsize[:disk]) for _now_, because that's what he wants. It should be a tiny 20-line parser that's easy to swap out. --batch-cols=... or --batch-disk-size would be simpler, but we might never be able to remove that code. Agreed. The approach paints us into a design-corner, and must therefore be avoided. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Duy Nguyen pclo...@gmail.com writes: There's also syntax sharing. I don't think each command should have its own syntax. f-e-r already has %(objectsize). If we plan to have a common syntax, perhaps %(disk-size) should be %(objectsize:disk) or something. Adding formatting to cat-file --batch from scratch could be another big chunk of code (that also comes with bugs, usually) and may or may not be compatible with the common syntax because of some oversight. --batch-cols=... or --batch-disk-size would be simpler, but we might never be able to remove that code. True, but cat-file being a low-level plumbing, I actually am not all that convinced that it should even know the custom formatting. Configurability and customizability may look always good, but that is true only until one realizes that they come with associated cost. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Mon, Jul 8, 2013 at 8:37 PM, Ramkumar Ramachandra artag...@gmail.com wrote: Duy Nguyen wrote: The short/long naming is the least I worry about. We could add long names to pretty specifiers. The thing about the last attempt is, you add some extra things on top elsewhere, but format_commit_item code may need to be aware of those changes, which are not obvious when sombody just focuses on format_commit_item. Having all specifiers in one place would be better (hence no hooks, no callbacks) because we get a full picture. And yes we need to deal with specifers that make no sense in certain context. Yeah, it would certainly be nice to have all the format-specifiers that one unified parser acts on, but isn't this just a matter of refactoring? Shouldn't we be starting with cheap callbacks, get things working, and guard against regressions in the refactoring phase first? How else do you propose to start out? I prefer a series merged to master is a complete change. If refactoring is needed, it should be done as part of the series as well. Two reasons: - We might overlook something. The best way to avoid missing is finish and verify it. - A promise to do things later could happen really late, or never happens. As you are sastisfied with the functionality you have less motivation to clean the code. Meanwhile the maintainer takes extra maintenance cost. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
Jeff King p...@peff.net writes: Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. I agree with your reasoning. It may be simpler to give an interface to ask for which pieces of info, e.g. --batch-cols=size,disksize, without giving the readers a flexible format. +NOTE: The on-disk size reported is accurate, but care should be taken in +drawing conclusions about which refs or objects are responsible for disk +usage. The size of a packed non-delta object be much larger than the +size of objects which delta against it, but the choice of which object +is the base and which is the delta is arbitrary and is subject to change +during a repack. Note also that multiple copies of an object may be +present in the object database; in this case, it is undefined which +copy's size will be reported. This is a good note to leave to the readers. I was wondering how valid to accuse that B is taking a lot of space compared to C when you have three objects A, B and C (in decreasing order of on-disk footprint) when A is huge and C is a small delta against A and B is independent. The role of A and C in their delta chain could easily be swapped during the next full repack and then C will appear a lot larger than B. It might be interesting to measure the total disk footprint of an entire delta family (the objects that delta against the same base). You may find out that hello.c with a manageable size have very many revisions and overall have a larger on-disk footprint than a single copy of unchanging help.mov clip used in the documentation does, which may be an interesting observation to make. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. I agree with your reasoning. It may be simpler to give an interface to ask for which pieces of info, e.g. --batch-cols=size,disksize, without giving the readers a flexible format. Yeah, that is probably a lot more sane. That would be sufficient for my use, I doubt anyone really wants the full format, and it would be easy to add it later if we are wrong. It would also be easy to add other items from the sha1_object_info_extended list, too (e.g., loose/cached/packed). I'll do that in my re-roll. +NOTE: The on-disk size reported is accurate, but care should be taken in +drawing conclusions about which refs or objects are responsible for disk +usage. [...] This is a good note to leave to the readers. I was wondering how valid to accuse that B is taking a lot of space compared to C when you have three objects A, B and C (in decreasing order of on-disk footprint) when A is huge and C is a small delta against A and B is independent. The role of A and C in their delta chain could easily be swapped during the next full repack and then C will appear a lot larger than B. Yeah. I exercise a lot of human analysis when I use this tool myself. What I am usually looking for is that somebody has forked a 100M repo, and then dumped 2G of extra data on top. Those cases are not all that hard to spot, and would not usually change too much in a repack. It might be interesting to measure the total disk footprint of an entire delta family (the objects that delta against the same base). You may find out that hello.c with a manageable size have very many revisions and overall have a larger on-disk footprint than a single copy of unchanging help.mov clip used in the documentation does, which may be an interesting observation to make. Yeah, that is an interesting stat, though I have not had a need for it myself. Certainly you could do: git rev-list --objects --all | grep ' hello.c$' | cut -d' ' -f1 | git cat-file --batch-disk-sizes to see hello.c's size. But I cannot think offhand of a way to get the list of objects that are in a delta chain together (potentially crossing path boundaries), short of parsing verfiy-pack output myself. I think it is orthogonal to this patch, though. This exposes more information about objects themselves; it would be up to another patch to help discover and narrow the list of interesting objects. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Sun, Jul 07, 2013 at 06:09:49AM -0400, Jeff King wrote: +NOTE: The on-disk size reported is accurate, but care should be taken in +drawing conclusions about which refs or objects are responsible for disk +usage. The size of a packed non-delta object be much larger than the You probably meant may be here. ^ -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature