Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
On Fri, Aug 02, 2013 at 11:30:03AM -0700, Junio C Hamano wrote: > > I didn't see the result of your wrangling in pu, but I will keep an eye > > out to double-check it (unless you did not finish, in which case I am > > happy to do the wrangling myself). > > Here is what is on top of the revert that has been pushed out on > 'pu'. Thanks, that looks good to me. We may want to also squash in the patch below, which puts the pointer variable in the most-local block and re-wraps the newly indented comment for line length. Neither introduced by your adaptation, but they became more obvious to me when seen on top of the revert. -Peff --- diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 07b4818..41afaa5 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -286,15 +286,15 @@ static int batch_objects(struct batch_options *opt) warn_on_object_refname_ambiguity = 0; while (strbuf_getline(&buf, stdin, '\n') != EOF) { - char *p; int error; if (data.split_on_whitespace) { /* -* Split at first whitespace, tying off the beginning of the -* string and saving the remainder (or NULL) in data.rest. +* Split at first whitespace, tying off the beginning +* of the string and saving the remainder (or NULL) in +* data.rest. */ - p = strpbrk(buf.buf, " \t"); + char *p = strpbrk(buf.buf, " \t"); if (p) { while (*p && strchr(" \t", *p)) *p++ = '\0'; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
Junio C Hamano wrote: > Here is what is on top of the revert that has been pushed out on > 'pu'. For what it's worth, Reviewed-by: Jonathan Nieder [...] > To remain backwards compatible, we cannot split on whitespace by > default, hence we will ship 1.8.4 with the commit reverted. [...] > It might be more robust to have something like "-z" to separate the > input elements. But this patch is still a reasonable step before > having that. It makes the easy cases easy; people who do not care > about %(rest) do not have to consider it, and the %(rest) code > handles the spaces and newlines of "rev-list --objects" correctly. Another idea for the future might be to start rejecting refnames starting with a double-quote '"', which would make it safe to treat a leading quote-mark as the start of a C-style quoted string. But currently that would technically be a breaking change, making "-z" more useful in the meantime. I think several commands already don't deal well with filenames with newlines. I hope POSIX forbids them (with some suitable migration plan) soonish and even wouldn't mind if git were taught to refuse to track them. Thanks, Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
Jeff King writes: > On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote: > >> > Of the two, I think the latter is more sensible; the former is >> > unnecessarily placing the burden on the user to match "--split" with >> > their use of "%(rest)". The second is pointless without the first. >> > >> > A patch to implement (2) is below. >> >> As I'd queue this on top of the revert, I had to wrangle it a bit to >> make it relative, i.e. "this resurrects what the other reverted >> patch did but in a weaker/safer form". > > Yeah, sorry. After doing the patch I had the thought that maybe the > least invasive thing would be the fix rather than the straight revert > (we are counting on my assertion that just reverting out part of the > series will be OK; I'm pretty sure that is the case, but it is not > risk-free, either). > > I didn't see the result of your wrangling in pu, but I will keep an eye > out to double-check it (unless you did not finish, in which case I am > happy to do the wrangling myself). Here is what is on top of the revert that has been pushed out on 'pu'. Thanks. -- >8 -- From: Jeff King Date: Fri, 2 Aug 2013 04:59:07 -0700 Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used Commit c334b87b (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before c334b87, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- Documentation/git-cat-file.txt | 14 ++ builtin/cat-file.c | 31 ++- t/t1006-cat-file.sh| 15 +++ 3 files changed, 55 insertions(+), 5 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 10fbc6a..21cffe2 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -86,10 +86,9 @@ BATCH OUTPUT If `--batch` or `--batch-check` is given, `cat-file` will read objects -from stdin, one per line, and print information about them. - -Each line is considered as a whole object name, and is parsed as if -given to linkgit:git-rev-parse[1]. +from stdin, one per line, and print information about them. By default, +the whole line is considered as an object, as if it were fed to +linkgit:git-rev-parse[1]. You can specify the information shown for each object by using a custom ``. The `` is copied literally to stdout for each @@ -110,6 +109,13 @@ newline. The available atoms are: The size, in bytes, that the object takes up on disk. See the note about on-disk sizes in the `CAVEATS` section below. +`rest`:: + If this atom is used in the output string, input lines are split + at the first whitespace boundary. All characters before that + whitespace are considered to be the object name; characters + after that first run of whitespace (i.e., the "rest" of the + line) are output in place of the `%(rest)` atom. + If no format is specified, the default format is `%(objectname) %(objecttype)
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote: > > Of the two, I think the latter is more sensible; the former is > > unnecessarily placing the burden on the user to match "--split" with > > their use of "%(rest)". The second is pointless without the first. > > > > A patch to implement (2) is below. > > As I'd queue this on top of the revert, I had to wrangle it a bit to > make it relative, i.e. "this resurrects what the other reverted > patch did but in a weaker/safer form". Yeah, sorry. After doing the patch I had the thought that maybe the least invasive thing would be the fix rather than the straight revert (we are counting on my assertion that just reverting out part of the series will be OK; I'm pretty sure that is the case, but it is not risk-free, either). I didn't see the result of your wrangling in pu, but I will keep an eye out to double-check it (unless you did not finish, in which case I am happy to do the wrangling myself). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
Jeff King writes: > On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote: > >> We need to revert that commit before the release. It can either be >> replaced with: >> >> 1. A "--split" (or similar) option to use the behavior only when >> desired. >> >> 2. Enabling splitting only when %(rest) is used in the output format. > > Of the two, I think the latter is more sensible; the former is > unnecessarily placing the burden on the user to match "--split" with > their use of "%(rest)". The second is pointless without the first. > > A patch to implement (2) is below. As I'd queue this on top of the revert, I had to wrangle it a bit to make it relative, i.e. "this resurrects what the other reverted patch did but in a weaker/safer form". This will be kept outside this cycle. Thanks for a quick fix. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
Jeff King writes: > We need to revert that commit before the release. It can either be > replaced with: > > 1. A "--split" (or similar) option to use the behavior only when > desired. > > 2. Enabling splitting only when %(rest) is used in the output format. > > And I suppose it is too late in the cycle for either of those to go into > v1.8.4. That's a shame, but I think losing that particular patch does > not affect the rest of the series, so we are OK to ship without it. > > Thanks Joey for a timely bug report. Thanks. Will do this to jk/cat-file-batch-optim topic and merge it to 'master' for now. -- >8 -- Subject: [PATCH] Revert "cat-file: split --batch input lines on whitespace" This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept. --- Documentation/git-cat-file.txt | 10 ++ builtin/cat-file.c | 20 +--- t/t1006-cat-file.sh| 7 --- 3 files changed, 3 insertions(+), 34 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 3ddec0b..10fbc6a 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -88,10 +88,8 @@ BATCH OUTPUT If `--batch` or `--batch-check` is given, `cat-file` will read objects from stdin, one per line, and print information about them. -Each line is split at the first whitespace boundary. All characters -before that whitespace are considered as a whole object name, and are -parsed as if given to linkgit:git-rev-parse[1]. Characters after that -whitespace can be accessed using the `%(rest)` atom (see below). +Each line is considered as a whole object name, and is parsed as if +given to linkgit:git-rev-parse[1]. You can specify the information shown for each object by using a custom ``. The `` is copied literally to stdout for each @@ -112,10 +110,6 @@ newline. The available atoms are: The size, in bytes, that the object takes up on disk. See the note about on-disk sizes in the `CAVEATS` section below. -`rest`:: - The text (if any) found after the first run of whitespace on the - input line (i.e., the "rest" of the line). - If no format is specified, the default format is `%(objectname) %(objecttype) %(objectsize)`. diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 163ce6c..4253460 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -119,7 +119,6 @@ struct expand_data { enum object_type type; unsigned long size; unsigned long disk_size; - const char *rest; /* * If mark_query is true, we do not expand anything, but rather @@ -164,9 +163,6 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, data->info.disk_sizep = &data->disk_size; else strbuf_addf(sb, "%lu", data->disk_size); - } else if (is_atom("rest", atom, len)) { - if (!data->mark_query && data->rest) - strbuf_addstr(sb, data->rest); } else die("unknown format element: %.*s", len, atom); } @@ -277,21 +273,7 @@ static int batch_objects(struct batch_options *opt) warn_on_object_refname_ambiguity = 0; while (strbuf_getline(&buf, stdin, '\n') != EOF) { - char *p; - int error; - - /* -* Split at first whitespace, tying off the beginning of the -* string and saving the remainder (or NULL) in data.rest. -*/ - p = strpbrk(buf.buf, " \t"); - if (p) { - while (*p && strchr(" \t", *p)) - *p++ = '\0'; - } - data.rest = p; - - error = batch_one_object(buf.buf, opt, &data); + int error = batch_one_object(buf.buf, opt, &data); if (error) return error; } diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index d499d02..4e911fb 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -78,13 +78,6 @@ $content" echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && test_cmp expect actual ' - -test_expect_success '--batch-check with %(rest)' ' - echo "$type this is some extra content" >expect && - echo "$sha1this is some extra content" | - git cat-file --batch-check="%(objecttype) %(rest)" >actual && - test_cmp expect actual -' } hello_content="Hello World" -- 1.8.4-rc1-125-g7a0ec02 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
On Fri, Aug 2, 2013 at 8:27 AM, Joey Hess wrote: > Jeff King wrote: >> By the way, Joey, I am not sure how safe "git cat-file --batch-check" is >> for arbitrary filenames. In particular, I don't know how it would react >> to a filename with an embedded newline (and I do not think it will undo >> quoting). Certainly that does not excuse this regression; even if what >> you are doing is not 100% reliable, it is good enough in sane situations >> and we should not be breaking it. But you may want to double-check the >> behavior of your scripts in such a case, and we may need to add a "-z" >> to support it reliably. > > Yes, I would prefer to have a -z mode. I think my code otherwise handles > newlines. > > Thanks for the quick fix. I agree that only enabling the behavior with > %{rest} makes sense. > > -- > see shy jo /methinks we've identified a gap in our test coverage. Care to add a test that covers the functionality that git-annex depends on? -Brandon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
Jeff King wrote: > By the way, Joey, I am not sure how safe "git cat-file --batch-check" is > for arbitrary filenames. In particular, I don't know how it would react > to a filename with an embedded newline (and I do not think it will undo > quoting). Certainly that does not excuse this regression; even if what > you are doing is not 100% reliable, it is good enough in sane situations > and we should not be breaking it. But you may want to double-check the > behavior of your scripts in such a case, and we may need to add a "-z" > to support it reliably. Yes, I would prefer to have a -z mode. I think my code otherwise handles newlines. Thanks for the quick fix. I agree that only enabling the behavior with %{rest} makes sense. -- see shy jo signature.asc Description: Digital signature
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote: > We need to revert that commit before the release. It can either be > replaced with: > > 1. A "--split" (or similar) option to use the behavior only when > desired. > > 2. Enabling splitting only when %(rest) is used in the output format. Of the two, I think the latter is more sensible; the former is unnecessarily placing the burden on the user to match "--split" with their use of "%(rest)". The second is pointless without the first. A patch to implement (2) is below. By the way, Joey, I am not sure how safe "git cat-file --batch-check" is for arbitrary filenames. In particular, I don't know how it would react to a filename with an embedded newline (and I do not think it will undo quoting). Certainly that does not excuse this regression; even if what you are doing is not 100% reliable, it is good enough in sane situations and we should not be breaking it. But you may want to double-check the behavior of your scripts in such a case, and we may need to add a "-z" to support it reliably. The "rev-list --objects" output may contain such paths, of course, but they will be quoted, and "%(rest)" does not care (it is not trying to interpret the paths, but will reliably relay the quoted bits to the output). -- >8 -- Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used Commit c334b87 recently taught `cat-file --batch-check` to split input lines on whitespace, and stash everything after the first token into the %(rest) output format element. That commit claims: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default. This patch teaches cat-file to only do the splitting when "%(rest)" is used by the output format. Since that element did not exist at all until c334b87, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" | git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King --- Documentation/git-cat-file.txt | 16 builtin/cat-file.c | 31 +-- t/t1006-cat-file.sh| 8 3 files changed, 37 insertions(+), 18 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 3ddec0b..21cffe2 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -86,12 +86,9 @@ If `--batch` or `--batch-check` is given, `cat-file` will read objects If `--batch` or `--batch-check` is given, `cat-file` will read objects -from stdin, one per line, and print information about them. - -Each line is split at the first whitespace boundary. All characters -before that whitespace are considered as a whole object name, and are -parsed as if given to linkgit:git-rev-parse[1]. Characters after that -whitespace can be accessed using the `%(rest)` atom (see below). +from stdin, one per line, and print information about them. By default, +the whole line is considered as an object, as if it were fed to +linkgit:git-rev-parse[1]. You can specify the information shown for each object by using a custom ``. The `` is copied literally to stdout for each @@ -113,8 +110,11 @@ newline. The available atoms are: note about on-disk sizes in the `CAVEATS` section below. `rest`:: - The text (if any) found after the first run of whitespace on the - input line (i.e., the "rest" of the line). + If this atom is used in the output string, input lines are split + at the first whitespace boundary. All characters before that + whites
Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
On Thu, Aug 01, 2013 at 11:40:03PM -0700, Jonathan Nieder wrote: > > Commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac caused a reversion in > > git-cat-file --batch. > > > > With an older version: > > > > joey@gnu:~/tmp/rrr>git cat-file --batch > > :file name > > e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0 > > > > With the new version: > > > > joey@wren:~/tmp/r>git cat-file --batch > > :file name > > :file missing > [...] > Oh dear. Luckily you caught this before the final 1.8.4 release. I > wonder if we should just revert c334b87b (cat-file: split --batch > input lines on whitespace, 2013-07-11) for now. Ugh. Yeah, the incorrect assumption from the commit message of c334b87b is "Object names cannot contain spaces...". Refs cannot, but filename specifiers after a colon can. We need to revert that commit before the release. It can either be replaced with: 1. A "--split" (or similar) option to use the behavior only when desired. 2. Enabling splitting only when %(rest) is used in the output format. And I suppose it is too late in the cycle for either of those to go into v1.8.4. That's a shame, but I think losing that particular patch does not affect the rest of the series, so we are OK to ship without it. Thanks Joey for a timely bug report. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html