Re: Bug in "git rev-parse --verify"
Fwiw, look very a sound idea for me. Best 2013/3/30, Junio C Hamano : > Junio C Hamano writes: > >> What we may want is another type peeling operator, ^{object}. >> that makes sure it is an object, like this: >> >> rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object} >> >> It asks "I have this 40-hex; I want an object out of it", just like >> frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of >> 'frotz'. >> >> With that, a use case that it wants to see _any_ object can safely >> use 'rev-parse --verify "$userinput^{object}' without an annotated >> tag getting in the way. >> >> How does that sound? > > Perhaps something like this. Note that the last hunk is unrelated > thinko-fix I noticed while browsing the code. > > -- >8 -- > Subject: sha1_name.c: ^{object} peeler > > A string that names an object can be suffixed with ^{type} peeler to > say "I have this object name; peel it until you get this type. If > you cannot do so, it is an error". v1.8.2^{commit} asks for a commit > that is pointed at an annotated tag v1.8.2; v1.8.2^{tree} unwraps it > further to the top-level tree object. A special suffix ^{} (i.e. no > type specified) means "I do not care what it unwraps to; just peel > annotated tag until you get something that is not a tag". > > When you have a random user-supplied string, you can turn it to a > bare 40-hex object name, and cause it to error out if such an object > does not exist, with: > > git rev-parse --verify "$userstring^{}" > > for most objects, but this does not yield the tag object name when > $userstring refers to an annotated tag. > > Introduce a new suffix, ^{object}, that only makes sure the given > name refers to an existing object. Then > > git rev-parse --verify "$userstring^{object}" > > becomes a way to make sure $userstring refers to an existing object. > > Signed-off-by: Junio C Hamano > --- > sha1_name.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/sha1_name.c b/sha1_name.c > index c50630a..85b6e75 100644 > --- a/sha1_name.c > +++ b/sha1_name.c > @@ -594,7 +594,7 @@ struct object *peel_to_type(const char *name, int > namelen, > while (1) { > if (!o || (!o->parsed && !parse_object(o->sha1))) > return NULL; > - if (o->type == expected_type) > + if (expected_type == OBJ_ANY || o->type == expected_type) > return o; > if (o->type == OBJ_TAG) > o = ((struct tag*) o)->tagged; > @@ -645,6 +645,8 @@ static int peel_onion(const char *name, int len, > unsigned char *sha1) > expected_type = OBJ_TREE; > else if (!strncmp(blob_type, sp, 4) && sp[4] == '}') > expected_type = OBJ_BLOB; > + else if (!prefixcmp(sp, "object}")) > + expected_type = OBJ_ANY; > else if (sp[0] == '}') > expected_type = OBJ_NONE; > else if (sp[0] == '/') > @@ -654,6 +656,8 @@ static int peel_onion(const char *name, int len, > unsigned char *sha1) > > if (expected_type == OBJ_COMMIT) > lookup_flags = GET_SHA1_COMMITTISH; > + else if (expected_type == OBJ_TREE) > + lookup_flags = GET_SHA1_TREEISH; > > if (get_sha1_1(name, sp - name - 2, outer, lookup_flags)) > return -1; > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Inviato dal mio dispositivo mobile -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On 03/30/2013 08:05 AM, Junio C Hamano wrote: > Junio C Hamano writes: > >> What we may want is another type peeling operator, ^{object}. >> that makes sure it is an object, like this: >> >> rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object} >> >> It asks "I have this 40-hex; I want an object out of it", just like >> frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of >> 'frotz'. >> >> With that, a use case that it wants to see _any_ object can safely >> use 'rev-parse --verify "$userinput^{object}' without an annotated >> tag getting in the way. >> >> How does that sound? > > Perhaps something like this. Note that the last hunk is unrelated > thinko-fix I noticed while browsing the code. Sounds reasonable to me. I'm not familiar with this code, but your change looks simple enough. Plus documentation change in Documentation/revisions.txt, of course. Thanks, Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Junio C Hamano writes: > What we may want is another type peeling operator, ^{object}. > that makes sure it is an object, like this: > > rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object} > > It asks "I have this 40-hex; I want an object out of it", just like > frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of > 'frotz'. > > With that, a use case that it wants to see _any_ object can safely > use 'rev-parse --verify "$userinput^{object}' without an annotated > tag getting in the way. > > How does that sound? Perhaps something like this. Note that the last hunk is unrelated thinko-fix I noticed while browsing the code. -- >8 -- Subject: sha1_name.c: ^{object} peeler A string that names an object can be suffixed with ^{type} peeler to say "I have this object name; peel it until you get this type. If you cannot do so, it is an error". v1.8.2^{commit} asks for a commit that is pointed at an annotated tag v1.8.2; v1.8.2^{tree} unwraps it further to the top-level tree object. A special suffix ^{} (i.e. no type specified) means "I do not care what it unwraps to; just peel annotated tag until you get something that is not a tag". When you have a random user-supplied string, you can turn it to a bare 40-hex object name, and cause it to error out if such an object does not exist, with: git rev-parse --verify "$userstring^{}" for most objects, but this does not yield the tag object name when $userstring refers to an annotated tag. Introduce a new suffix, ^{object}, that only makes sure the given name refers to an existing object. Then git rev-parse --verify "$userstring^{object}" becomes a way to make sure $userstring refers to an existing object. Signed-off-by: Junio C Hamano --- sha1_name.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sha1_name.c b/sha1_name.c index c50630a..85b6e75 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -594,7 +594,7 @@ struct object *peel_to_type(const char *name, int namelen, while (1) { if (!o || (!o->parsed && !parse_object(o->sha1))) return NULL; - if (o->type == expected_type) + if (expected_type == OBJ_ANY || o->type == expected_type) return o; if (o->type == OBJ_TAG) o = ((struct tag*) o)->tagged; @@ -645,6 +645,8 @@ static int peel_onion(const char *name, int len, unsigned char *sha1) expected_type = OBJ_TREE; else if (!strncmp(blob_type, sp, 4) && sp[4] == '}') expected_type = OBJ_BLOB; + else if (!prefixcmp(sp, "object}")) + expected_type = OBJ_ANY; else if (sp[0] == '}') expected_type = OBJ_NONE; else if (sp[0] == '/') @@ -654,6 +656,8 @@ static int peel_onion(const char *name, int len, unsigned char *sha1) if (expected_type == OBJ_COMMIT) lookup_flags = GET_SHA1_COMMITTISH; + else if (expected_type == OBJ_TREE) + lookup_flags = GET_SHA1_TREEISH; if (get_sha1_1(name, sp - name - 2, outer, lookup_flags)) return -1; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Michael Haggerty writes: > On 03/30/2013 05:09 AM, Junio C Hamano wrote: >> So why not verify arguments while making sure of their type early >> with 'rev-parse --verify "$userinput^{desiredtype}"? > > Yes, that's a better solution in almost all cases. Thanks for the tip. > > (It doesn't change my opinion that the documentation for "rev-parse > --verify" is misleading, but given that you don't appear to want to > change its behavior I will submit a documentation patch.) It does not matter what I want. What matters is that changing the definition is a _wrong_ thing to do, as --verify is designed to be usable for objects you may not yet have. What we may want is another type peeling operator, ^{object}. makes sure it is an object, that lets you say: rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object} It asks "I have this 40-hex; I want an object out of it", just like frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of 'frotz'. With that, a use case that it wants to see _any_ object can safely use 'rev-parse --verify "$userinput^{object}' without an annotated tag getting in the way. How does that sound? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On 03/30/2013 05:09 AM, Junio C Hamano wrote: > So why not verify arguments while making sure of their type early > with 'rev-parse --verify "$userinput^{desiredtype}"? Yes, that's a better solution in almost all cases. Thanks for the tip. (It doesn't change my opinion that the documentation for "rev-parse --verify" is misleading, but given that you don't appear to want to change its behavior I will submit a documentation patch.) Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Junio C Hamano writes: >> 3. Verifying arguments at one spot centralizes error-checking at the >> start of a script and eliminates one reason for random git commands to >> fail later (when error recovery is perhaps more difficult). > > Not necessarily, unless your script is a very narrow corner case > that is satisfied with "any kind of object goes". Clarification. I meant "A single central point to check is a useful concept, but centrally checking 'does this exist? I do not care what it is' is not necessarily useful and I suspect it is not useful more often than it is". When you care if something exists, you often want to know what it is and other aspects of that thing at the same time. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Michael Haggerty writes: > 1. An SHA1 is a canonical representation of the argument, useful for > example as the key in a hash map for for looking for the presence of a > commit in a rev-list output. > > 2. An SHA1 is persistent. For example, I use them when caching > benchmark results across versions. Moreover, they are safe for use in > filenames. The persistence also makes scripts more robust against > simultaneous changes to the repo by other processes, whereas if I use a > string like "branch^" multiple times, there is no guarantee that it > always refers to the same commit. These two are half-irrelevant; they only call for use of the current "rev-parse --verify" that always gives you 40-hex. The more important one is the next one. > 3. Verifying arguments at one spot centralizes error-checking at the > start of a script and eliminates one reason for random git commands to > fail later (when error recovery is perhaps more difficult). Not necessarily, unless your script is a very narrow corner case that is satisfied with "any kind of object goes". When a parameter has to be commit for some purpose and another parameter can be any tree-ish, you would want to validate them _with_ type requirement. If you are using the "object name persistency" to create a cache of "how many times the word 'gogopuff' appears in the entire tree?", you want to cache the result keyed with the object name of the tree, whether your user gives you v1.8.0 (tag), v1.8.0^0 (commit), or v1.8.0^{tree} (tree), and you want to unwrap the user input down to tree object name to look up the pre-computed result for the cache to be any useful. > 4. Converting once avoids the overhead of repeated conversion from a > free-form committish into an object name if the argument needs to be > passed to multiple git commands (though presumably the overhead is > insignificant in most cases). True. So why not verify arguments while making sure of their type early with 'rev-parse --verify "$userinput^{desiredtype}"? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On 03/28/2013 05:52 PM, Junio C Hamano wrote: > You could force rev-parse to resolve the input to an existing > object, with something like this: > > git rev-parse --verify "$ARG^{}" > > It will unwrap a tag, so the output may end up pointing at a object > that is different from $ARG in such a case. Yes, if unwrapping tags is OK then this would work. > But what is the purpose of turning a random string to a random > 40-hex in the first place? In non-trivial scripts, it makes sense to convert user input into a known and verified quantity (SHA1) once, while processing external inputs, and not have to think about it afterwards. Verifying and converting to pure-SHA1s as soon as possible has several advantages: 1. An SHA1 is a canonical representation of the argument, useful for example as the key in a hash map for for looking for the presence of a commit in a rev-list output. 2. An SHA1 is persistent. For example, I use them when caching benchmark results across versions. Moreover, they are safe for use in filenames. The persistence also makes scripts more robust against simultaneous changes to the repo by other processes, whereas if I use a string like "branch^" multiple times, there is no guarantee that it always refers to the same commit. 3. Verifying arguments at one spot centralizes error-checking at the start of a script and eliminates one reason for random git commands to fail later (when error recovery is perhaps more difficult). 4. Converting once avoids the overhead of repeated conversion from a free-form committish into an object name if the argument needs to be passed to multiple git commands (though presumably the overhead is insignificant in most cases). Undoubtedly in many cases this practice of early-verification-and-conversion is unnecessary, or the same benefit could be had from either verifying or converting without doing both. But verifying-and-converting is easy, cheap, and means not having to decide on a case-by-case basis whether it could have been avoided. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Jeff King writes: > On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote: > >> Is there a simple way to verify an object name more strictly and convert >> it to an SHA1? I can only think of solutions that require two commands, >> like >> >> git cat-file -e $ARG && git rev-parse --verify $ARG > > Is the rev-parse line doing anything there? If $ARG does not resolve to > a sha1, then wouldn't cat-file have failed? > > -Peff You could force rev-parse to resolve the input to an existing object, with something like this: git rev-parse --verify "$ARG^{}" It will unwrap a tag, so the output may end up pointing at a object that is different from $ARG in such a case. But what is the purpose of turning a random string to a random 40-hex in the first place? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On Thu, Mar 28, 2013 at 04:52:15PM +0100, Michael Haggerty wrote: > On 03/28/2013 04:38 PM, Jeff King wrote: > > On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote: > > > >> Is there a simple way to verify an object name more strictly and convert > >> it to an SHA1? I can only think of solutions that require two commands, > >> like > >> > >> git cat-file -e $ARG && git rev-parse --verify $ARG > > > > Is the rev-parse line doing anything there? If $ARG does not resolve to > > a sha1, then wouldn't cat-file have failed? > > It's outputting the SHA1, which cat-file seems incapable of providing in > a useful way. Ah, I see; I was looking too much at your example, and not thinking about how you would want to use it in a script. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On 03/28/2013 04:38 PM, Jeff King wrote: > On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote: > >> Is there a simple way to verify an object name more strictly and convert >> it to an SHA1? I can only think of solutions that require two commands, >> like >> >> git cat-file -e $ARG && git rev-parse --verify $ARG > > Is the rev-parse line doing anything there? If $ARG does not resolve to > a sha1, then wouldn't cat-file have failed? It's outputting the SHA1, which cat-file seems incapable of providing in a useful way. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote: > Is there a simple way to verify an object name more strictly and convert > it to an SHA1? I can only think of solutions that require two commands, > like > > git cat-file -e $ARG && git rev-parse --verify $ARG Is the rev-parse line doing anything there? If $ARG does not resolve to a sha1, then wouldn't cat-file have failed? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On 03/28/2013 02:48 PM, Junio C Hamano wrote: > I think it has always been about "is this well formed and we can turn it > into a raw 20-byte object name?" and never about"does it exist?" That's surprising. The man page says --verify The parameter given must be usable as a single, valid object name. Otherwise barf and abort. "Valid", to me, implies that the parameter should be the name of an actual object, and this also seems a more useful concept to me and more consistent with the command's behavior when passed other arguments. Is there a simple way to verify an object name more strictly and convert it to an SHA1? I can only think of solutions that require two commands, like git cat-file -e $ARG && git rev-parse --verify $ARG I suppose in most contexts where one wants to know whether an object name is valid, one should also verify that the object has the type that you expect: test X$(git cat-file -t $ARG) = Xcommit && git rev-parse --verify $ARG or (allowing tag dereferencing) git cat-file -e $ARG^{commit} && git rev-parse --verify $ARG^{commit} Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
Jeff King writes: > On Thu, Mar 28, 2013 at 02:04:27PM +0100, Michael Haggerty wrote: > >> $ git rev-parse --verify >> >> $ echo $? >> 0 >> >> [...] >> >> I believe that "git rev-parse --verify" is meant to verify that the >> argument is an actual object, and that it should reject fictional SHA1s. >> (If not then the documentation should be clarified.) The same problem >> also exists in 1.8.2 but I haven't checked how much older it is. > > I think it is only about verifying the name of the object. I.e., that > you can resolve the argument to an object. It has always behaved this > way; I even just tested with git v0.99. Correct. It is about "is it well formed and something we can turn into 20-byte object name?" and nothing more. It certainly does not mean "do we have it?", as the function needs to be able to validate something we do not yet have. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in "git rev-parse --verify"
On Thu, Mar 28, 2013 at 02:04:27PM +0100, Michael Haggerty wrote: > $ git rev-parse --verify > > $ echo $? > 0 > > [...] > > I believe that "git rev-parse --verify" is meant to verify that the > argument is an actual object, and that it should reject fictional SHA1s. > (If not then the documentation should be clarified.) The same problem > also exists in 1.8.2 but I haven't checked how much older it is. I think it is only about verifying the name of the object. I.e., that you can resolve the argument to an object. It has always behaved this way; I even just tested with git v0.99. I can't think of any reason it would not be helpful to have it _also_ verify that we have the object in question. Most of the time that would be a no-op, as any ref we resolve should point to a valid object, but it would mean we could catch repository corruption (i.e., missing objects) early. Doing a has_sha1_file() check would be relatively quick. Doing a full parse_object and checking the sha1 would be more robust, but would mean that: blob=`git rev-parse --verify HEAD:some-large-file` would become much more expensive (and presumably you are about to access the object _anyway_, at which point you would notice problems, so it is redundantly expensive). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bug in "git rev-parse --verify"
On Junio's master, "git rev-parse --verify" accepts *any* 40-digit hexadecimal number. For example, pass it 40 "1" characters, and it accepts the argument: $ git rev-parse --verify $ echo $? 0 Obviously, my repo doesn't have an object with this hash :-) so I think this argument should be rejected. If you add or remove a digit (to make the length different than 40), it is correctly rejected: $ git rev-parse --verify 111 fatal: Needed a single revision $ echo $? 128 I believe that "git rev-parse --verify" is meant to verify that the argument is an actual object, and that it should reject fictional SHA1s. (If not then the documentation should be clarified.) The same problem also exists in 1.8.2 but I haven't checked how much older it is. The behavior presumably comes from the following clause in get_sha1_basic(): if (len == 40 && !get_sha1_hex(str, sha1)) return 0; I won't have time to pursue this. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html