Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-03 Thread Jeff King
On Fri, Aug 02, 2013 at 11:30:03AM -0700, Junio C Hamano wrote:

  I didn't see the result of your wrangling in pu, but I will keep an eye
  out to double-check it (unless you did not finish, in which case I am
  happy to do the wrangling myself).
 
 Here is what is on top of the revert that has been pushed out on
 'pu'.

Thanks, that looks good to me.

We may want to also squash in the patch below, which puts the pointer
variable in the most-local block and re-wraps the newly indented comment
for line length. Neither introduced by your adaptation, but they became
more obvious to me when seen on top of the revert.

-Peff

---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 07b4818..41afaa5 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -286,15 +286,15 @@ static int batch_objects(struct batch_options *opt)
warn_on_object_refname_ambiguity = 0;
 
while (strbuf_getline(buf, stdin, '\n') != EOF) {
-   char *p;
int error;
 
if (data.split_on_whitespace) {
/*
-* Split at first whitespace, tying off the beginning 
of the
-* string and saving the remainder (or NULL) in 
data.rest.
+* Split at first whitespace, tying off the beginning
+* of the string and saving the remainder (or NULL) in
+* data.rest.
 */
-   p = strpbrk(buf.buf,  \t);
+   char *p = strpbrk(buf.buf,  \t);
if (p) {
while (*p  strchr( \t, *p))
*p++ = '\0';

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Jeff King
On Thu, Aug 01, 2013 at 11:40:03PM -0700, Jonathan Nieder wrote:

  Commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac caused a reversion in
  git-cat-file --batch.
 
  With an older version:
 
  joey@gnu:~/tmp/rrrgit cat-file --batch
  :file name
  e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0
 
  With the new version:
 
  joey@wren:~/tmp/rgit cat-file --batch
  :file name
  :file missing
 [...]
 Oh dear.  Luckily you caught this before the final 1.8.4 release.  I
 wonder if we should just revert c334b87b (cat-file: split --batch
 input lines on whitespace, 2013-07-11) for now.

Ugh. Yeah, the incorrect assumption from the commit message of c334b87b
is Object names cannot contain spaces Refs cannot, but filename
specifiers after a colon can.

We need to revert that commit before the release. It can either be
replaced with:

  1. A --split (or similar) option to use the behavior only when
 desired.

  2. Enabling splitting only when %(rest) is used in the output format.

And I suppose it is too late in the cycle for either of those to go into
v1.8.4. That's a shame, but I think losing that particular patch does
not affect the rest of the series, so we are OK to ship without it.

Thanks Joey for a timely bug report.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Jeff King
On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote:

 We need to revert that commit before the release. It can either be
 replaced with:
 
   1. A --split (or similar) option to use the behavior only when
  desired.
 
   2. Enabling splitting only when %(rest) is used in the output format.

Of the two, I think the latter is more sensible; the former is
unnecessarily placing the burden on the user to match --split with
their use of %(rest). The second is pointless without the first.

A patch to implement (2) is below.

By the way, Joey, I am not sure how safe git cat-file --batch-check is
for arbitrary filenames. In particular, I don't know how it would react
to a filename with an embedded newline (and I do not think it will undo
quoting). Certainly that does not excuse this regression; even if what
you are doing is not 100% reliable, it is good enough in sane situations
and we should not be breaking it. But you may want to double-check the
behavior of your scripts in such a case, and we may need to add a -z
to support it reliably.

The rev-list --objects output may contain such paths, of course, but
they will be quoted, and %(rest) does not care (it is not trying to
interpret the paths, but will reliably relay the quoted bits to the
output).

-- 8 --
Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used

Commit c334b87 recently taught `cat-file --batch-check` to
split input lines on whitespace, and stash everything after
the first token into the %(rest) output format element. That
commit claims:

   Object names cannot contain spaces, so any input with
   spaces would have resulted in a missing line.

But that is not correct. Refs, object sha1s, and various
peeling suffixes cannot contain spaces, but some object
names can. In particular:

  1. Tree paths like [tree]:path with whitespace

  2. Reflog specifications like @{2 days ago}

  3. Commit searches like rev^{/grep me} or :/grep me

To remain backwards compatible, we cannot split on
whitespace by default. This patch teaches cat-file to only
do the splitting when %(rest) is used by the output
format. Since that element did not exist at all until
c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo :path with space and other data |
  git cat-file --batch-check=%(objectname) %(rest)

as it would split the path and feed only :path to
get_sha1. But that command is nonsensical. If you wanted to
see and other data in %(rest), git cannot possibly know
where the filename ends and the rest begins.

It might be more robust to have something like -z to
separate the input elements. But this patch is still a
reasonable step before having that.  It makes the easy cases
easy; people who do not care about %(rest) do not have to
consider it, and the %(rest) code handles the spaces and
newlines of rev-list --objects correctly.

Hard cases remain hard but possible (if you might get
whitespace in your input, you do not get to use %(rest) and
must split and join the output yourself using more flexible
tools). And most importantly, it does not preclude us from
having different splitting rules later if a -z (or
similar) option is added.  So we can make the hard
cases easier later, if we choose to.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt | 16 
 builtin/cat-file.c | 31 +--
 t/t1006-cat-file.sh|  8 
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 3ddec0b..21cffe2 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -86,12 +86,9 @@ If `--batch` or `--batch-check` is given, `cat-file` will 
read objects
 
 
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
-from stdin, one per line, and print information about them.
-
-Each line is split at the first whitespace boundary. All characters
-before that whitespace are considered as a whole object name, and are
-parsed as if given to linkgit:git-rev-parse[1]. Characters after that
-whitespace can be accessed using the `%(rest)` atom (see below).
+from stdin, one per line, and print information about them. By default,
+the whole line is considered as an object, as if it were fed to
+linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `format`. The `format` is copied literally to stdout for each
@@ -113,8 +110,11 @@ newline. The available atoms are:
note about on-disk sizes in the `CAVEATS` section below.
 
 `rest`::
-   The text (if any) found after the first run of whitespace on the
-   input line (i.e., the rest of the line).
+   If this atom is used in the output string, input lines are split
+   at the first whitespace boundary. All characters before that
+   whitespace are considered to 

Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Joey Hess
Jeff King wrote:
 By the way, Joey, I am not sure how safe git cat-file --batch-check is
 for arbitrary filenames. In particular, I don't know how it would react
 to a filename with an embedded newline (and I do not think it will undo
 quoting). Certainly that does not excuse this regression; even if what
 you are doing is not 100% reliable, it is good enough in sane situations
 and we should not be breaking it. But you may want to double-check the
 behavior of your scripts in such a case, and we may need to add a -z
 to support it reliably.

Yes, I would prefer to have a -z mode. I think my code otherwise handles
newlines.

Thanks for the quick fix. I agree that only enabling the behavior with
%{rest} makes sense.

-- 
see shy jo


signature.asc
Description: Digital signature


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Brandon Casey
On Fri, Aug 2, 2013 at 8:27 AM, Joey Hess j...@kitenet.net wrote:
 Jeff King wrote:
 By the way, Joey, I am not sure how safe git cat-file --batch-check is
 for arbitrary filenames. In particular, I don't know how it would react
 to a filename with an embedded newline (and I do not think it will undo
 quoting). Certainly that does not excuse this regression; even if what
 you are doing is not 100% reliable, it is good enough in sane situations
 and we should not be breaking it. But you may want to double-check the
 behavior of your scripts in such a case, and we may need to add a -z
 to support it reliably.

 Yes, I would prefer to have a -z mode. I think my code otherwise handles
 newlines.

 Thanks for the quick fix. I agree that only enabling the behavior with
 %{rest} makes sense.

 --
 see shy jo

/methinks we've identified a gap in our test coverage.  Care to add a
test that covers the functionality that git-annex depends on?

-Brandon
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 We need to revert that commit before the release. It can either be
 replaced with:

   1. A --split (or similar) option to use the behavior only when
  desired.

   2. Enabling splitting only when %(rest) is used in the output format.

 And I suppose it is too late in the cycle for either of those to go into
 v1.8.4. That's a shame, but I think losing that particular patch does
 not affect the rest of the series, so we are OK to ship without it.

 Thanks Joey for a timely bug report.

Thanks.  Will do this to jk/cat-file-batch-optim topic and merge it
to 'master' for now.

-- 8 --
Subject: [PATCH] Revert cat-file: split --batch input lines on whitespace

This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the
update assumed that people only used the command to read from
rev-list --objects output, whose lines begin with a 40-hex object
name followed by a whitespace, but it turns out that scripts feed
random extended SHA-1 expressions (e.g. HEAD:$pathname) in which
a whitespace has to be kept.
---
 Documentation/git-cat-file.txt | 10 ++
 builtin/cat-file.c | 20 +---
 t/t1006-cat-file.sh|  7 ---
 3 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 3ddec0b..10fbc6a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -88,10 +88,8 @@ BATCH OUTPUT
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
 from stdin, one per line, and print information about them.
 
-Each line is split at the first whitespace boundary. All characters
-before that whitespace are considered as a whole object name, and are
-parsed as if given to linkgit:git-rev-parse[1]. Characters after that
-whitespace can be accessed using the `%(rest)` atom (see below).
+Each line is considered as a whole object name, and is parsed as if
+given to linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `format`. The `format` is copied literally to stdout for each
@@ -112,10 +110,6 @@ newline. The available atoms are:
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
 
-`rest`::
-   The text (if any) found after the first run of whitespace on the
-   input line (i.e., the rest of the line).
-
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 163ce6c..4253460 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -119,7 +119,6 @@ struct expand_data {
enum object_type type;
unsigned long size;
unsigned long disk_size;
-   const char *rest;
 
/*
 * If mark_query is true, we do not expand anything, but rather
@@ -164,9 +163,6 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
data-info.disk_sizep = data-disk_size;
else
strbuf_addf(sb, %lu, data-disk_size);
-   } else if (is_atom(rest, atom, len)) {
-   if (!data-mark_query  data-rest)
-   strbuf_addstr(sb, data-rest);
} else
die(unknown format element: %.*s, len, atom);
 }
@@ -277,21 +273,7 @@ static int batch_objects(struct batch_options *opt)
warn_on_object_refname_ambiguity = 0;
 
while (strbuf_getline(buf, stdin, '\n') != EOF) {
-   char *p;
-   int error;
-
-   /*
-* Split at first whitespace, tying off the beginning of the
-* string and saving the remainder (or NULL) in data.rest.
-*/
-   p = strpbrk(buf.buf,  \t);
-   if (p) {
-   while (*p  strchr( \t, *p))
-   *p++ = '\0';
-   }
-   data.rest = p;
-
-   error = batch_one_object(buf.buf, opt, data);
+   int error = batch_one_object(buf.buf, opt, data);
if (error)
return error;
}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d499d02..4e911fb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -78,13 +78,6 @@ $content
echo $sha1 | git cat-file --batch-check=%(objecttype) %(objectname) 
actual 
test_cmp expect actual
 '
-
-test_expect_success '--batch-check with %(rest)' '
-   echo $type this is some extra content expect 
-   echo $sha1this is some extra content |
-   git cat-file --batch-check=%(objecttype) %(rest) actual 
-   test_cmp expect actual
-'
 }
 
 hello_content=Hello World
-- 
1.8.4-rc1-125-g7a0ec02

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo 

Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote:

 We need to revert that commit before the release. It can either be
 replaced with:
 
   1. A --split (or similar) option to use the behavior only when
  desired.
 
   2. Enabling splitting only when %(rest) is used in the output format.

 Of the two, I think the latter is more sensible; the former is
 unnecessarily placing the burden on the user to match --split with
 their use of %(rest). The second is pointless without the first.

 A patch to implement (2) is below.

As I'd queue this on top of the revert, I had to wrangle it a bit to
make it relative, i.e. this resurrects what the other reverted
patch did but in a weaker/safer form.

This will be kept outside this cycle.  Thanks for a quick fix.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Jeff King
On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote:

  Of the two, I think the latter is more sensible; the former is
  unnecessarily placing the burden on the user to match --split with
  their use of %(rest). The second is pointless without the first.
 
  A patch to implement (2) is below.
 
 As I'd queue this on top of the revert, I had to wrangle it a bit to
 make it relative, i.e. this resurrects what the other reverted
 patch did but in a weaker/safer form.

Yeah, sorry. After doing the patch I had the thought that maybe the
least invasive thing would be the fix rather than the straight revert
(we are counting on my assertion that just reverting out part of the
series will be OK; I'm pretty sure that is the case, but it is not
risk-free, either).

I didn't see the result of your wrangling in pu, but I will keep an eye
out to double-check it (unless you did not finish, in which case I am
happy to do the wrangling myself).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote:

  Of the two, I think the latter is more sensible; the former is
  unnecessarily placing the burden on the user to match --split with
  their use of %(rest). The second is pointless without the first.
 
  A patch to implement (2) is below.
 
 As I'd queue this on top of the revert, I had to wrangle it a bit to
 make it relative, i.e. this resurrects what the other reverted
 patch did but in a weaker/safer form.

 Yeah, sorry. After doing the patch I had the thought that maybe the
 least invasive thing would be the fix rather than the straight revert
 (we are counting on my assertion that just reverting out part of the
 series will be OK; I'm pretty sure that is the case, but it is not
 risk-free, either).

 I didn't see the result of your wrangling in pu, but I will keep an eye
 out to double-check it (unless you did not finish, in which case I am
 happy to do the wrangling myself).

Here is what is on top of the revert that has been pushed out on
'pu'.

Thanks.

-- 8 --
From: Jeff King p...@peff.net
Date: Fri, 2 Aug 2013 04:59:07 -0700
Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used

Commit c334b87b (cat-file: split --batch input lines on whitespace,
2013-07-11) taught `cat-file --batch-check` to split input lines on
the first whitespace, and stash everything after the first token
into the %(rest) output format element.  It claimed:

   Object names cannot contain spaces, so any input with
   spaces would have resulted in a missing line.

But that is not correct.  Refs, object sha1s, and various peeling
suffixes cannot contain spaces, but some object names can. In
particular:

  1. Tree paths like [tree]:path with whitespace

  2. Reflog specifications like @{2 days ago}

  3. Commit searches like rev^{/grep me} or :/grep me

To remain backwards compatible, we cannot split on whitespace by
default, hence we will ship 1.8.4 with the commit reverted.

Resurrect its attempt but in a weaker form; only do the splitting
when %(rest) is used in the output format. Since that element did
not exist at all before c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo :path with space and other data |
  git cat-file --batch-check=%(objectname) %(rest)

as it would split the path and feed only :path to get_sha1. But
that command is nonsensical. If you wanted to see and other data
in %(rest), git cannot possibly know where the filename ends and
the rest begins.

It might be more robust to have something like -z to separate the
input elements. But this patch is still a reasonable step before
having that.  It makes the easy cases easy; people who do not care
about %(rest) do not have to consider it, and the %(rest) code
handles the spaces and newlines of rev-list --objects correctly.

Hard cases remain hard but possible (if you might get whitespace in
your input, you do not get to use %(rest) and must split and join
the output yourself using more flexible tools). And most
importantly, it does not preclude us from having different splitting
rules later if a -z (or similar) option is added.  So we can make
the hard cases easier later, if we choose to.

Signed-off-by: Jeff King p...@peff.net
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/git-cat-file.txt | 14 ++
 builtin/cat-file.c | 31 ++-
 t/t1006-cat-file.sh| 15 +++
 3 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 10fbc6a..21cffe2 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -86,10 +86,9 @@ BATCH OUTPUT
 
 
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
-from stdin, one per line, and print information about them.
-
-Each line is considered as a whole object name, and is parsed as if
-given to linkgit:git-rev-parse[1].
+from stdin, one per line, and print information about them. By default,
+the whole line is considered as an object, as if it were fed to
+linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `format`. The `format` is copied literally to stdout for each
@@ -110,6 +109,13 @@ newline. The available atoms are:
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
 
+`rest`::
+   If this atom is used in the output string, input lines are split
+   at the first whitespace boundary. All characters before that
+   whitespace are considered to be the object name; characters
+   after that first run of whitespace (i.e., the rest of the
+   line) are output in place of the `%(rest)` atom.
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) 

Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces

2013-08-02 Thread Jonathan Nieder
Junio C Hamano wrote:

 Here is what is on top of the revert that has been pushed out on
 'pu'.

For what it's worth,
Reviewed-by: Jonathan Nieder jrnie...@gmail.com

[...]
 To remain backwards compatible, we cannot split on whitespace by
 default, hence we will ship 1.8.4 with the commit reverted.
[...]
 It might be more robust to have something like -z to separate the
 input elements. But this patch is still a reasonable step before
 having that.  It makes the easy cases easy; people who do not care
 about %(rest) do not have to consider it, and the %(rest) code
 handles the spaces and newlines of rev-list --objects correctly.

Another idea for the future might be to start rejecting refnames
starting with a double-quote '', which would make it safe to treat a
leading quote-mark as the start of a C-style quoted string.  But
currently that would technically be a breaking change, making -z
more useful in the meantime.

I think several commands already don't deal well with filenames with
newlines.  I hope POSIX forbids them (with some suitable migration
plan) soonish and even wouldn't mind if git were taught to refuse to
track them.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html