Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-11 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I started on this, and it turned out not to really be any simpler
 So I went ahead with the full formats for my re-roll. It turned out
 pretty reasonable, I think.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-10 Thread Jeff King
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Perhaps we need
 
git cat-file --batch-format=%(disk-size) %(object)
 
  or similar.
 
 I agree with your reasoning.  It may be simpler to give an interface
 to ask for which pieces of info, e.g. --batch-cols=size,disksize,
 without giving the readers a flexible format.

I started on this, and it turned out not to really be any simpler. In
particular there is the question of whether

  git cat-file --batch-cols=size,type

is different from

  git cat-file --batch-cols=type,size

If so, then you are basically implementing the equivalent of a macro
format anyway (you have to parse it left to right to know the order).
And if not, you end up translating the column list into a bit-field, and
the boilerplate for adding a new item is about the same as for a macro
format.

So I went ahead with the full formats for my re-roll. It turned out
pretty reasonable, I think.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-10 Thread Jeff King
On Mon, Jul 08, 2013 at 07:07:01PM +0530, Ramkumar Ramachandra wrote:

  There's also syntax sharing. I don't think each command should have
  its own syntax. f-e-r already has %(objectsize). If we plan to have a
  common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
  something.
 
 Ofcourse.  I didn't notice %(objectsize); %(objectsize[:disk]) is a
 fine suggestion.
 
  Adding formatting to cat-file --batch from scratch could be
  another big chunk of code (that also comes with bugs, usually) and may
  or may not be compatible with the common syntax because of some
  oversight.
 
 Oh, I'm proposing that Peff implements just %H and
 %(objectsize[:disk]) for _now_, because that's what he wants.  It
 should be a tiny 20-line parser that's easy to swap out.

I went with %(objectname), %(objectsize), and %(objectsize:disk). The
former match for-each-ref, and the latter extends it in a natural-ish
way (though it is arguable whether foo:modifier should mean do
something with the foo value or this is like foo, but not quite). In
the long run, I would like to see long names for each atom, with short
aliases (so %H and %h for %(objectname) and %(objectname:short),
available everywhere).

But I think it is OK to start without the aliases, and then pick them up
as the implementations and interfaces are unified. IOW, it is OK to say
cat-file has not learned about %H yet, and later add it; we cannot
teach it %H now and then change our minds later.

  --batch-cols=... or --batch-disk-size would be simpler, but
  we might never be able to remove that code.
 
 Agreed.  The approach paints us into a design-corner, and must
 therefore be avoided.

I would say it is worth one of the other routes if it turned out to be
dramatically simpler. But having just done the work to add formats for
cat-file, it is really not too bad. It lacks some of the niceties of the
other formatters (e.g., colors), but again, we can always add them in
later as the implementations unify.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-10 Thread Jeff King
On Sun, Jul 07, 2013 at 09:15:41PM +, brian m. carlson wrote:

 On Sun, Jul 07, 2013 at 06:09:49AM -0400, Jeff King wrote:
  +NOTE: The on-disk size reported is accurate, but care should be taken in
  +drawing conclusions about which refs or objects are responsible for disk
  +usage. The size of a packed non-delta object be much larger than the
 
 You probably meant may be here.   ^

Thanks, fixed in my re-roll.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-09 Thread Ramkumar Ramachandra
Duy Nguyen wrote:
  - We might overlook something. The best way to avoid missing is
 finish and verify it.
  - A promise to do things later could happen really late, or never
 happens. As you are sastisfied with the functionality you have less
 motivation to clean the code. Meanwhile the maintainer takes extra
 maintenance cost.

I know.  You know what my counter-argument looks like already:

A promise to deliver a perfect series sometime in the future risks
never reaching that perfection, and stalling everyone else's work.
Even if we do manage to complete that perfect series, there is no
guarantee that we'll get sufficient reviewer-interest or traction for
merge.  You think people are more likely to look at a 50-part series
than a 15-part series?

Either way, I'm not interested in arguing: for now, I'll repost the
old 15-part series and try to get some reviews.  Start writing code,
and let's finish this thing.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
un Mon, Jul 8, 2013 at 12:49 AM, Junio C Hamano gits...@pobox.com wrote:
 Jeff King p...@peff.net writes:

 Perhaps we need

   git cat-file --batch-format=%(disk-size) %(object)

 or similar.

This is what I wanted to do with the in for-each-ref's pretty
formatting [1]. I used to hack cat-file --batch to extract info I
needed for experimenting with various pack index extensions. If you
are not in hurry, maybe we can introduce something similar to your
syntax, but applicable for all for-each-ref, branch and log family.
Ram, are you still interested in the awesome branch series?

 I agree with your reasoning.  It may be simpler to give an interface
 to ask for which pieces of info, e.g. --batch-cols=size,disksize,
 without giving the readers a flexible format.

[1] http://thread.gmane.org/gmane.comp.version-control.git/227057/focus=227223
--
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Ramkumar Ramachandra
Duy Nguyen wrote:
 Ram, are you still interested in the awesome branch series?

Yep, but it got stalled due to lack of reviewer-interest :/

I'm a bit under the weather at the moment, but it's good to see that
you're back: let's finish this soon.

 Perhaps we need

   git cat-file --batch-format=%(disk-size) %(object)

 or similar.

 This is what I wanted to do with the in for-each-ref's pretty
 formatting [1]. I used to hack cat-file --batch to extract info I
 needed for experimenting with various pack index extensions. If you
 are not in hurry, maybe we can introduce something similar to your
 syntax, but applicable for all for-each-ref, branch and log family.

I'm still quite confused about this grand plan.  We have short
commit-specific format specifiers that don't work with refs, among
several other quirks in [1].  I personally think we should absolutely
stay away from short format-specifiers (like %H, %f, %e; we'll soon
run out of letters, and nobody can tell what they are without the
documentation anyway) for the new options, and just start adding new
long-form ones as and when they are necessary.  I think refname:short,
upstream:track, upstream:trackshort are very sensible choices, and
that we should continue along that line.  I'm fine with
format-specifiers having meanings only in certain contexts as long as
we document it properly (how can we possibly get %(refname) to mean
something sensible in cat-file?).

As far as this series is concerned, I think Peff can implement %H and
%(object:[disk-]size) locally without worrying about code-sharing or
waiting for us.  Then, after the for-each-ref-pretty thing matures, we
can just replace the code underneath.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 7:00 PM, Ramkumar Ramachandra artag...@gmail.com wrote:
 This is what I wanted to do with the in for-each-ref's pretty
 formatting [1]. I used to hack cat-file --batch to extract info I
 needed for experimenting with various pack index extensions. If you
 are not in hurry, maybe we can introduce something similar to your
 syntax, but applicable for all for-each-ref, branch and log family.

 I'm still quite confused about this grand plan.  We have short
 commit-specific format specifiers that don't work with refs, among
 several other quirks in [1].  I personally think we should absolutely
 stay away from short format-specifiers (like %H, %f, %e; we'll soon
 run out of letters, and nobody can tell what they are without the
 documentation anyway) for the new options, and just start adding new
 long-form ones as and when they are necessary.  I think refname:short,
 upstream:track, upstream:trackshort are very sensible choices, and
 that we should continue along that line.  I'm fine with
 format-specifiers having meanings only in certain contexts as long as
 we document it properly (how can we possibly get %(refname) to mean
 something sensible in cat-file?).

The short/long naming is the least I worry about. We could add long
names to pretty specifiers. The thing about the last attempt is, you
add some extra things on top elsewhere, but format_commit_item code
may need to be aware of those changes, which are not obvious when
sombody just focuses on format_commit_item. Having all specifiers in
one place would be better (hence no hooks, no callbacks) because we
get a full picture. And yes we need to deal with specifers that make
no sense in certain context.

All that makes changes bigger, but when format_commit_item (now just
format_item) knows how to deal with all kinds of objects and refs,
it becomes a small declaration language to extract things out of git.
You can use it for pretty printing or mass extraction in the case of
cat-file --batch. cat-file --batch is just some bonus on top,
mostly to exercise that we do it right.

 As far as this series is concerned, I think Peff can implement %H and
 %(object:[disk-]size) locally without worrying about code-sharing or
 waiting for us.  Then, after the for-each-ref-pretty thing matures, we
 can just replace the code underneath.

There's also syntax sharing. I don't think each command should have
its own syntax. f-e-r already has %(objectsize). If we plan to have a
common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
something. Adding formatting to cat-file --batch from scratch could be
another big chunk of code (that also comes with bugs, usually) and may
or may not be compatible with the common syntax because of some
oversight. --batch-cols=... or --batch-disk-size would be simpler, but
we might never be able to remove that code.
--
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Ramkumar Ramachandra
Duy Nguyen wrote:
 The short/long naming is the least I worry about. We could add long
 names to pretty specifiers. The thing about the last attempt is, you
 add some extra things on top elsewhere, but format_commit_item code
 may need to be aware of those changes, which are not obvious when
 sombody just focuses on format_commit_item. Having all specifiers in
 one place would be better (hence no hooks, no callbacks) because we
 get a full picture. And yes we need to deal with specifers that make
 no sense in certain context.

Yeah, it would certainly be nice to have all the format-specifiers
that one unified parser acts on, but isn't this just a matter of
refactoring?  Shouldn't we be starting with cheap callbacks, get
things working, and guard against regressions in the refactoring phase
first?  How else do you propose to start out?

 There's also syntax sharing. I don't think each command should have
 its own syntax. f-e-r already has %(objectsize). If we plan to have a
 common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
 something.

Ofcourse.  I didn't notice %(objectsize); %(objectsize[:disk]) is a
fine suggestion.

 Adding formatting to cat-file --batch from scratch could be
 another big chunk of code (that also comes with bugs, usually) and may
 or may not be compatible with the common syntax because of some
 oversight.

Oh, I'm proposing that Peff implements just %H and
%(objectsize[:disk]) for _now_, because that's what he wants.  It
should be a tiny 20-line parser that's easy to swap out.

 --batch-cols=... or --batch-disk-size would be simpler, but
 we might never be able to remove that code.

Agreed.  The approach paints us into a design-corner, and must
therefore be avoided.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 There's also syntax sharing. I don't think each command should have
 its own syntax. f-e-r already has %(objectsize). If we plan to have a
 common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
 something. Adding formatting to cat-file --batch from scratch could be
 another big chunk of code (that also comes with bugs, usually) and may
 or may not be compatible with the common syntax because of some
 oversight. --batch-cols=... or --batch-disk-size would be simpler, but
 we might never be able to remove that code.

True, but cat-file being a low-level plumbing, I actually am not all
that convinced that it should even know the custom formatting.
Configurability and customizability may look always good, but that
is true only until one realizes that they come with associated cost.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 8:37 PM, Ramkumar Ramachandra artag...@gmail.com wrote:
 Duy Nguyen wrote:
 The short/long naming is the least I worry about. We could add long
 names to pretty specifiers. The thing about the last attempt is, you
 add some extra things on top elsewhere, but format_commit_item code
 may need to be aware of those changes, which are not obvious when
 sombody just focuses on format_commit_item. Having all specifiers in
 one place would be better (hence no hooks, no callbacks) because we
 get a full picture. And yes we need to deal with specifers that make
 no sense in certain context.

 Yeah, it would certainly be nice to have all the format-specifiers
 that one unified parser acts on, but isn't this just a matter of
 refactoring?  Shouldn't we be starting with cheap callbacks, get
 things working, and guard against regressions in the refactoring phase
 first?  How else do you propose to start out?

I prefer a series merged to master is a complete change. If
refactoring is needed, it should be done as part of the series as
well. Two reasons:
 - We might overlook something. The best way to avoid missing is
finish and verify it.
 - A promise to do things later could happen really late, or never
happens. As you are sastisfied with the functionality you have less
motivation to clean the code. Meanwhile the maintainer takes extra
maintenance cost.
--
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-07 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Perhaps we need

   git cat-file --batch-format=%(disk-size) %(object)

 or similar.

I agree with your reasoning.  It may be simpler to give an interface
to ask for which pieces of info, e.g. --batch-cols=size,disksize,
without giving the readers a flexible format.

 +NOTE: The on-disk size reported is accurate, but care should be taken in
 +drawing conclusions about which refs or objects are responsible for disk
 +usage. The size of a packed non-delta object be much larger than the
 +size of objects which delta against it, but the choice of which object
 +is the base and which is the delta is arbitrary and is subject to change
 +during a repack. Note also that multiple copies of an object may be
 +present in the object database; in this case, it is undefined which
 +copy's size will be reported.

This is a good note to leave to the readers. I was wondering how
valid to accuse that B is taking a lot of space compared to C when
you have three objects A, B and C (in decreasing order of on-disk
footprint) when A is huge and C is a small delta against A and B is
independent.  The role of A and C in their delta chain could easily
be swapped during the next full repack and then C will appear a lot
larger than B.

It might be interesting to measure the total disk footprint of an
entire delta family (the objects that delta against the same
base).  You may find out that hello.c with a manageable size have
very many revisions and overall have a larger on-disk footprint than
a single copy of unchanging help.mov clip used in the documentation
does, which may be an interesting observation to make.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-07 Thread Jeff King
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Perhaps we need
 
git cat-file --batch-format=%(disk-size) %(object)
 
  or similar.
 
 I agree with your reasoning.  It may be simpler to give an interface
 to ask for which pieces of info, e.g. --batch-cols=size,disksize,
 without giving the readers a flexible format.

Yeah, that is probably a lot more sane. That would be sufficient for my
use, I doubt anyone really wants the full format, and it would be easy
to add it later if we are wrong. It would also be easy to add other
items from the sha1_object_info_extended list, too (e.g.,
loose/cached/packed).

I'll do that in my re-roll.

  +NOTE: The on-disk size reported is accurate, but care should be taken in
  +drawing conclusions about which refs or objects are responsible for disk
  +usage. [...]
 
 This is a good note to leave to the readers. I was wondering how
 valid to accuse that B is taking a lot of space compared to C when
 you have three objects A, B and C (in decreasing order of on-disk
 footprint) when A is huge and C is a small delta against A and B is
 independent.  The role of A and C in their delta chain could easily
 be swapped during the next full repack and then C will appear a lot
 larger than B.

Yeah. I exercise a lot of human analysis when I use this tool myself.
What I am usually looking for is that somebody has forked a 100M repo,
and then dumped 2G of extra data on top. Those cases are not all that
hard to spot, and would not usually change too much in a repack.

 It might be interesting to measure the total disk footprint of an
 entire delta family (the objects that delta against the same
 base).  You may find out that hello.c with a manageable size have
 very many revisions and overall have a larger on-disk footprint than
 a single copy of unchanging help.mov clip used in the documentation
 does, which may be an interesting observation to make.

Yeah, that is an interesting stat, though I have not had a need for it
myself. Certainly you could do:

  git rev-list --objects --all |
  grep ' hello.c$' |
  cut -d' ' -f1 |
  git cat-file --batch-disk-sizes

to see hello.c's size. But I cannot think offhand of a way to get the
list of objects that are in a delta chain together (potentially crossing
path boundaries), short of parsing verfiy-pack output myself. I think it
is orthogonal to this patch, though. This exposes more information about
objects themselves; it would be up to another patch to help discover and
narrow the list of interesting objects.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-07 Thread brian m. carlson
On Sun, Jul 07, 2013 at 06:09:49AM -0400, Jeff King wrote:
 +NOTE: The on-disk size reported is accurate, but care should be taken in
 +drawing conclusions about which refs or objects are responsible for disk
 +usage. The size of a packed non-delta object be much larger than the

You probably meant may be here.   ^

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187


signature.asc
Description: Digital signature