Re: [PATCH 2/2] cat-file: handle --batch format with missing type/size

2013-12-11 Thread Jonathan Nieder
Jeff King wrote:

 We could do the same for the type. However, besides our
 consistency check, we also care about the type in deciding
 whether to stream or not. We therefore make sure to always
 trigger a type lookup when we are printing, so that

This We make sure is the behavior after this patch, not before,
right?

[...]
 --- a/builtin/cat-file.c
 +++ b/builtin/cat-file.c
 @@ -211,7 +211,7 @@ static void print_object_or_die(int fd, struct 
 expand_data *data)
   die(object %s disappeared, sha1_to_hex(sha1));
   if (type != data-type)
   die(object %s changed type!?, sha1_to_hex(sha1));

Maybe an assert(data.info.typep) or similar would make this more
locally readable.

[...]
 @@ -276,6 +276,13 @@ static int batch_objects(struct batch_options *opt)
   data.mark_query = 0;
  
 + /*
 +  * If we are printing out the object, then always fill in the type,
 +  * since we will want to decide whether or not to stream.
 +  */
 + if (opt-print_contents)
 + data.info.typep = data.type;

Oof.  I guess this means that the optimization from 98e2092b wasn't being
applied by 'git cat-file --batch' with format specifiers that don't
include %(objecttype), but no one would have noticed because of the
changed type thing. :)

 --- a/t/t1006-cat-file.sh
 +++ b/t/t1006-cat-file.sh
 @@ -85,6 +85,28 @@ $content
   git cat-file --batch-check=%(objecttype) %(rest) actual 
   test_cmp expect actual
  '
 +
 +test -z $content ||
 +test_expect_success --batch without type ($type) '
 + {
 + echo $size 
 + maybe_remove_timestamp $content $no_ts
 + } expect 
 + echo $sha1 | git cat-file --batch=%(objectsize) actual.full 
 + maybe_remove_timestamp $(cat actual.full) $no_ts actual 
 + test_cmp expect actual
 +'
 +
 +test -z $content ||
 +test_expect_success --batch without size ($type) '
 + {
 + echo $type 
 + maybe_remove_timestamp $content $no_ts
 + } expect 
 + echo $sha1 | git cat-file --batch=%(objecttype) actual.full 
 + maybe_remove_timestamp $(cat actual.full) $no_ts actual 
 + test_cmp expect actual
 +'
  }

Looks good.

(not about this patch) I suspect a test_cmp_ignore_timestamp helper
could simplify these tests somewhat. :)

For what it's worth, with or without commit message changes or the
check that data-type is initialized,

Reviewed-by: Jonathan Nieder jrnie...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] cat-file: handle --batch format with missing type/size

2013-12-11 Thread Jeff King
On Wed, Dec 11, 2013 at 12:42:00PM -0800, Jonathan Nieder wrote:

  We could do the same for the type. However, besides our
  consistency check, we also care about the type in deciding
  whether to stream or not. We therefore make sure to always
  trigger a type lookup when we are printing, so that
 
 This We make sure is the behavior after this patch, not before,
 right?

Correct. I'll clarify that.

  --- a/builtin/cat-file.c
  +++ b/builtin/cat-file.c
  @@ -211,7 +211,7 @@ static void print_object_or_die(int fd, struct 
  expand_data *data)
  die(object %s disappeared, sha1_to_hex(sha1));
  if (type != data-type)
  die(object %s changed type!?, sha1_to_hex(sha1));
 
 Maybe an assert(data.info.typep) or similar would make this more
 locally readable.

I'm not sure it makes it more readable, but it would protect against
violating our assumptions later (and ease people's minds who wonder why
we can touch data-type without a similar check).

  +   /*
  +* If we are printing out the object, then always fill in the type,
  +* since we will want to decide whether or not to stream.
  +*/
  +   if (opt-print_contents)
  +   data.info.typep = data.type;
 
 Oof.  I guess this means that the optimization from 98e2092b wasn't being
 applied by 'git cat-file --batch' with format specifiers that don't
 include %(objecttype), but no one would have noticed because of the
 changed type thing. :)

Yes. The loss of the optimization was a small thing compared to being
totally broken. :)

  +   {
  +   echo $type 
  +   maybe_remove_timestamp $content $no_ts
  +   } expect 
  +   echo $sha1 | git cat-file --batch=%(objecttype) actual.full 
  +   maybe_remove_timestamp $(cat actual.full) $no_ts actual 
  +   test_cmp expect actual
 [...]
 (not about this patch) I suspect a test_cmp_ignore_timestamp helper
 could simplify these tests somewhat. :)

Yeah, the maybe_remove_timestamp is ugly on so many levels. I used it
because it's deeply embedded in the existing tests, and I didn't want to
tackle refactoring the whole thing. Be my guest if you want to do it on
top. :)

 For what it's worth, with or without commit message changes or the
 check that data-type is initialized,
 
 Reviewed-by: Jonathan Nieder jrnie...@gmail.com

Thanks. Updated patch is below.

-- 8 --
Subject: cat-file: handle --batch format with missing type/size

Commit 98e2092 taught cat-file to stream blobs with --batch,
which requires that we look up the object type before
loading it into memory.  As a result, we now print the
object header from information in sha1_object_info, and the
actual contents from the read_sha1_file. We double-check
that the information we printed in the header matches the
content we are about to show.

Later, commit 93d2a60 allowed custom header lines for
--batch, and commit 5b08640 made type lookups optional. As a
result, specifying a header line without the type or size
means that we will not look up those items at all.

This causes our double-checking to erroneously die with an
error; we think the type or size has changed, when in fact
it was simply left at 0.

For the size, we can fix this by only doing the consistency
double-check when we have retrieved the size via
sha1_object_info. In the case that we have not retrieved the
value, that means we also did not print it, so there is
nothing for us to check that we are consistent with.

We could do the same for the type. However, besides our
consistency check, we also care about the type in deciding
whether to stream or not. So instead of handling the case
where we do not know the type, this patch instead makes sure
that we always trigger a type lookup when we are printing,
so that even a format without the type will stream as we
would in the normal case.

Reviewed-by: Jonathan Nieder jrnie...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c  | 11 ++-
 t/t1006-cat-file.sh | 22 ++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1434afb..f8288c8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -197,6 +197,8 @@ static void print_object_or_die(int fd, struct expand_data 
*data)
 {
const unsigned char *sha1 = data-sha1;
 
+   assert(data-info.typep);
+
if (data-type == OBJ_BLOB) {
if (stream_blob_to_fd(fd, sha1, NULL, 0)  0)
die(unable to stream %s to stdout, sha1_to_hex(sha1));
@@ -211,7 +213,7 @@ static void print_object_or_die(int fd, struct expand_data 
*data)
die(object %s disappeared, sha1_to_hex(sha1));
if (type != data-type)
die(object %s changed type!?, sha1_to_hex(sha1));
-   if (size != data-size)
+   if (data-info.sizep  size != data-size)
die(object %s changed size!?, sha1_to_hex(sha1));
 
   

Re: [PATCH 2/2] cat-file: handle --batch format with missing type/size

2013-12-11 Thread Jonathan Nieder
Jeff King wrote:

 Updated patch is below.

Thanks!  v2 of both patches looks good.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] cat-file: handle --batch format with missing type/size

2013-12-11 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Yes. The loss of the optimization was a small thing compared to being
 totally broken. :)
 ...
 Reviewed-by: Jonathan Nieder jrnie...@gmail.com

 Thanks. Updated patch is below.

;-)

I like it when I see patches are polished between the submitter and
reviewer(s) fully, before the maintainer has a chance to pick an
intermediate version (only to later replace and requeue).

Thanks, both.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html