Re: Bug in "git rev-parse --verify"

2013-03-30 Thread Elia Pinto
Fwiw, look very a sound idea for me.

Best

2013/3/30, Junio C Hamano :
> Junio C Hamano  writes:
>
>> What we may want is another type peeling operator, ^{object}.
>> that makes sure it is an object, like this:
>>
>> rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object}
>>
>> It asks "I have this 40-hex; I want an object out of it", just like
>> frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of
>> 'frotz'.
>>
>> With that, a use case that it wants to see _any_ object can safely
>> use 'rev-parse --verify "$userinput^{object}' without an annotated
>> tag getting in the way.
>>
>> How does that sound?
>
> Perhaps something like this.  Note that the last hunk is unrelated
> thinko-fix I noticed while browsing the code.
>
> -- >8 --
> Subject: sha1_name.c: ^{object} peeler
>
> A string that names an object can be suffixed with ^{type} peeler to
> say "I have this object name; peel it until you get this type. If
> you cannot do so, it is an error".  v1.8.2^{commit} asks for a commit
> that is pointed at an annotated tag v1.8.2; v1.8.2^{tree} unwraps it
> further to the top-level tree object.  A special suffix ^{} (i.e. no
> type specified) means "I do not care what it unwraps to; just peel
> annotated tag until you get something that is not a tag".
>
> When you have a random user-supplied string, you can turn it to a
> bare 40-hex object name, and cause it to error out if such an object
> does not exist, with:
>
>   git rev-parse --verify "$userstring^{}"
>
> for most objects, but this does not yield the tag object name when
> $userstring refers to an annotated tag.
>
> Introduce a new suffix, ^{object}, that only makes sure the given
> name refers to an existing object.  Then
>
>   git rev-parse --verify "$userstring^{object}"
>
> becomes a way to make sure $userstring refers to an existing object.
>
> Signed-off-by: Junio C Hamano 
> ---
>  sha1_name.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index c50630a..85b6e75 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -594,7 +594,7 @@ struct object *peel_to_type(const char *name, int
> namelen,
>   while (1) {
>   if (!o || (!o->parsed && !parse_object(o->sha1)))
>   return NULL;
> - if (o->type == expected_type)
> + if (expected_type == OBJ_ANY || o->type == expected_type)
>   return o;
>   if (o->type == OBJ_TAG)
>   o = ((struct tag*) o)->tagged;
> @@ -645,6 +645,8 @@ static int peel_onion(const char *name, int len,
> unsigned char *sha1)
>   expected_type = OBJ_TREE;
>   else if (!strncmp(blob_type, sp, 4) && sp[4] == '}')
>   expected_type = OBJ_BLOB;
> + else if (!prefixcmp(sp, "object}"))
> + expected_type = OBJ_ANY;
>   else if (sp[0] == '}')
>   expected_type = OBJ_NONE;
>   else if (sp[0] == '/')
> @@ -654,6 +656,8 @@ static int peel_onion(const char *name, int len,
> unsigned char *sha1)
>
>   if (expected_type == OBJ_COMMIT)
>   lookup_flags = GET_SHA1_COMMITTISH;
> + else if (expected_type == OBJ_TREE)
> + lookup_flags = GET_SHA1_TREEISH;
>
>   if (get_sha1_1(name, sp - name - 2, outer, lookup_flags))
>   return -1;
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
Inviato dal mio dispositivo mobile
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-30 Thread Michael Haggerty
On 03/30/2013 08:05 AM, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> What we may want is another type peeling operator, ^{object}.
>> that makes sure it is an object, like this:
>>
>> rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object}
>>
>> It asks "I have this 40-hex; I want an object out of it", just like
>> frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of
>> 'frotz'.
>>
>> With that, a use case that it wants to see _any_ object can safely
>> use 'rev-parse --verify "$userinput^{object}' without an annotated
>> tag getting in the way.
>>
>> How does that sound?
> 
> Perhaps something like this.  Note that the last hunk is unrelated
> thinko-fix I noticed while browsing the code.

Sounds reasonable to me.  I'm not familiar with this code, but your
change looks simple enough.  Plus documentation change in
Documentation/revisions.txt, of course.

Thanks,
Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-30 Thread Junio C Hamano
Junio C Hamano  writes:

> What we may want is another type peeling operator, ^{object}.
> that makes sure it is an object, like this:
>
> rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object}
>
> It asks "I have this 40-hex; I want an object out of it", just like
> frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of
> 'frotz'.
>
> With that, a use case that it wants to see _any_ object can safely
> use 'rev-parse --verify "$userinput^{object}' without an annotated
> tag getting in the way.
>
> How does that sound?

Perhaps something like this.  Note that the last hunk is unrelated
thinko-fix I noticed while browsing the code.

-- >8 --
Subject: sha1_name.c: ^{object} peeler

A string that names an object can be suffixed with ^{type} peeler to
say "I have this object name; peel it until you get this type. If
you cannot do so, it is an error".  v1.8.2^{commit} asks for a commit
that is pointed at an annotated tag v1.8.2; v1.8.2^{tree} unwraps it
further to the top-level tree object.  A special suffix ^{} (i.e. no
type specified) means "I do not care what it unwraps to; just peel
annotated tag until you get something that is not a tag".

When you have a random user-supplied string, you can turn it to a
bare 40-hex object name, and cause it to error out if such an object
does not exist, with:

git rev-parse --verify "$userstring^{}"

for most objects, but this does not yield the tag object name when
$userstring refers to an annotated tag.

Introduce a new suffix, ^{object}, that only makes sure the given
name refers to an existing object.  Then

git rev-parse --verify "$userstring^{object}"

becomes a way to make sure $userstring refers to an existing object.

Signed-off-by: Junio C Hamano 
---
 sha1_name.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index c50630a..85b6e75 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -594,7 +594,7 @@ struct object *peel_to_type(const char *name, int namelen,
while (1) {
if (!o || (!o->parsed && !parse_object(o->sha1)))
return NULL;
-   if (o->type == expected_type)
+   if (expected_type == OBJ_ANY || o->type == expected_type)
return o;
if (o->type == OBJ_TAG)
o = ((struct tag*) o)->tagged;
@@ -645,6 +645,8 @@ static int peel_onion(const char *name, int len, unsigned 
char *sha1)
expected_type = OBJ_TREE;
else if (!strncmp(blob_type, sp, 4) && sp[4] == '}')
expected_type = OBJ_BLOB;
+   else if (!prefixcmp(sp, "object}"))
+   expected_type = OBJ_ANY;
else if (sp[0] == '}')
expected_type = OBJ_NONE;
else if (sp[0] == '/')
@@ -654,6 +656,8 @@ static int peel_onion(const char *name, int len, unsigned 
char *sha1)
 
if (expected_type == OBJ_COMMIT)
lookup_flags = GET_SHA1_COMMITTISH;
+   else if (expected_type == OBJ_TREE)
+   lookup_flags = GET_SHA1_TREEISH;
 
if (get_sha1_1(name, sp - name - 2, outer, lookup_flags))
return -1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-29 Thread Junio C Hamano
Michael Haggerty  writes:

> On 03/30/2013 05:09 AM, Junio C Hamano wrote:
>> So why not verify arguments while making sure of their type early
>> with 'rev-parse --verify "$userinput^{desiredtype}"?
>
> Yes, that's a better solution in almost all cases.  Thanks for the tip.
>
> (It doesn't change my opinion that the documentation for "rev-parse
> --verify" is misleading, but given that you don't appear to want to
> change its behavior I will submit a documentation patch.)

It does not matter what I want.  What matters is that changing the
definition is a _wrong_ thing to do, as --verify is designed to be
usable for objects you may not yet have.

What we may want is another type peeling operator, ^{object}.
makes sure it is an object, that lets you say:

rev-parse --verify 572a535454612a046e7dd7404dcca94d6243c788^{object}

It asks "I have this 40-hex; I want an object out of it", just like
frotz^{tree} is "I have 'frotz'; I want a tree-ish" for any value of
'frotz'.

With that, a use case that it wants to see _any_ object can safely
use 'rev-parse --verify "$userinput^{object}' without an annotated
tag getting in the way.

How does that sound?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-29 Thread Michael Haggerty
On 03/30/2013 05:09 AM, Junio C Hamano wrote:
> So why not verify arguments while making sure of their type early
> with 'rev-parse --verify "$userinput^{desiredtype}"?

Yes, that's a better solution in almost all cases.  Thanks for the tip.

(It doesn't change my opinion that the documentation for "rev-parse
--verify" is misleading, but given that you don't appear to want to
change its behavior I will submit a documentation patch.)

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-29 Thread Junio C Hamano
Junio C Hamano  writes:

>> 3. Verifying arguments at one spot centralizes error-checking at the
>> start of a script and eliminates one reason for random git commands to
>> fail later (when error recovery is perhaps more difficult).
>
> Not necessarily, unless your script is a very narrow corner case
> that is satisfied with "any kind of object goes".

Clarification.  I meant "A single central point to check is a useful
concept, but centrally checking 'does this exist? I do not care what
it is' is not necessarily useful and I suspect it is not useful more
often than it is".  When you care if something exists, you often
want to know what it is and other aspects of that thing at the same
time.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-29 Thread Junio C Hamano
Michael Haggerty  writes:

> 1. An SHA1 is a canonical representation of the argument, useful for
> example as the key in a hash map for for looking for the presence of a
> commit in a rev-list output.
>
> 2. An SHA1 is persistent.  For example, I use them when caching
> benchmark results across versions.  Moreover, they are safe for use in
> filenames.  The persistence also makes scripts more robust against
> simultaneous changes to the repo by other processes, whereas if I use a
> string like "branch^" multiple times, there is no guarantee that it
> always refers to the same commit.

These two are half-irrelevant; they only call for use of the current
"rev-parse --verify" that always gives you 40-hex.  The more
important one is the next one.

> 3. Verifying arguments at one spot centralizes error-checking at the
> start of a script and eliminates one reason for random git commands to
> fail later (when error recovery is perhaps more difficult).

Not necessarily, unless your script is a very narrow corner case
that is satisfied with "any kind of object goes".  When a parameter
has to be commit for some purpose and another parameter can be any
tree-ish, you would want to validate them _with_ type requirement.

If you are using the "object name persistency" to create a cache of
"how many times the word 'gogopuff' appears in the entire tree?",
you want to cache the result keyed with the object name of the tree,
whether your user gives you v1.8.0 (tag), v1.8.0^0 (commit), or
v1.8.0^{tree} (tree), and you want to unwrap the user input down to
tree object name to look up the pre-computed result for the cache to
be any useful.

> 4. Converting once avoids the overhead of repeated conversion from a
> free-form committish into an object name if the argument needs to be
> passed to multiple git commands (though presumably the overhead is
> insignificant in most cases).

True.

So why not verify arguments while making sure of their type early
with 'rev-parse --verify "$userinput^{desiredtype}"?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-29 Thread Michael Haggerty
On 03/28/2013 05:52 PM, Junio C Hamano wrote:
> You could force rev-parse to resolve the input to an existing
> object, with something like this:
> 
>   git rev-parse --verify "$ARG^{}"
> 
> It will unwrap a tag, so the output may end up pointing at a object
> that is different from $ARG in such a case.

Yes, if unwrapping tags is OK then this would work.

> But what is the purpose of turning a random string to a random
> 40-hex in the first place?

In non-trivial scripts, it makes sense to convert user input into a
known and verified quantity (SHA1) once, while processing external
inputs, and not have to think about it afterwards.  Verifying and
converting to pure-SHA1s as soon as possible has several advantages:

1. An SHA1 is a canonical representation of the argument, useful for
example as the key in a hash map for for looking for the presence of a
commit in a rev-list output.

2. An SHA1 is persistent.  For example, I use them when caching
benchmark results across versions.  Moreover, they are safe for use in
filenames.  The persistence also makes scripts more robust against
simultaneous changes to the repo by other processes, whereas if I use a
string like "branch^" multiple times, there is no guarantee that it
always refers to the same commit.

3. Verifying arguments at one spot centralizes error-checking at the
start of a script and eliminates one reason for random git commands to
fail later (when error recovery is perhaps more difficult).

4. Converting once avoids the overhead of repeated conversion from a
free-form committish into an object name if the argument needs to be
passed to multiple git commands (though presumably the overhead is
insignificant in most cases).

Undoubtedly in many cases this practice of
early-verification-and-conversion is unnecessary, or the same benefit
could be had from either verifying or converting without doing both.
But verifying-and-converting is easy, cheap, and means not having to
decide on a case-by-case basis whether it could have been avoided.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote:
>
>> Is there a simple way to verify an object name more strictly and convert
>> it to an SHA1?  I can only think of solutions that require two commands,
>> like
>> 
>> git cat-file -e $ARG && git rev-parse --verify $ARG
>
> Is the rev-parse line doing anything there? If $ARG does not resolve to
> a sha1, then wouldn't cat-file have failed?
>
> -Peff

You could force rev-parse to resolve the input to an existing
object, with something like this:

git rev-parse --verify "$ARG^{}"

It will unwrap a tag, so the output may end up pointing at a object
that is different from $ARG in such a case.

But what is the purpose of turning a random string to a random
40-hex in the first place?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Jeff King
On Thu, Mar 28, 2013 at 04:52:15PM +0100, Michael Haggerty wrote:

> On 03/28/2013 04:38 PM, Jeff King wrote:
> > On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote:
> > 
> >> Is there a simple way to verify an object name more strictly and convert
> >> it to an SHA1?  I can only think of solutions that require two commands,
> >> like
> >>
> >> git cat-file -e $ARG && git rev-parse --verify $ARG
> > 
> > Is the rev-parse line doing anything there? If $ARG does not resolve to
> > a sha1, then wouldn't cat-file have failed?
> 
> It's outputting the SHA1, which cat-file seems incapable of providing in
> a useful way.

Ah, I see; I was looking too much at your example, and not thinking
about how you would want to use it in a script.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Michael Haggerty
On 03/28/2013 04:38 PM, Jeff King wrote:
> On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote:
> 
>> Is there a simple way to verify an object name more strictly and convert
>> it to an SHA1?  I can only think of solutions that require two commands,
>> like
>>
>> git cat-file -e $ARG && git rev-parse --verify $ARG
> 
> Is the rev-parse line doing anything there? If $ARG does not resolve to
> a sha1, then wouldn't cat-file have failed?

It's outputting the SHA1, which cat-file seems incapable of providing in
a useful way.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Jeff King
On Thu, Mar 28, 2013 at 04:34:19PM +0100, Michael Haggerty wrote:

> Is there a simple way to verify an object name more strictly and convert
> it to an SHA1?  I can only think of solutions that require two commands,
> like
> 
> git cat-file -e $ARG && git rev-parse --verify $ARG

Is the rev-parse line doing anything there? If $ARG does not resolve to
a sha1, then wouldn't cat-file have failed?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Michael Haggerty
On 03/28/2013 02:48 PM, Junio C Hamano wrote:
> I think it has always been about "is this well formed and we can turn it
> into a raw 20-byte object name?" and never about"does it exist?"

That's surprising.  The man page says

--verify
The parameter given must be usable as a single, valid object name.
Otherwise barf and abort.

"Valid", to me, implies that the parameter should be the name of an
actual object, and this also seems a more useful concept to me and more
consistent with the command's behavior when passed other arguments.


Is there a simple way to verify an object name more strictly and convert
it to an SHA1?  I can only think of solutions that require two commands,
like

git cat-file -e $ARG && git rev-parse --verify $ARG

I suppose in most contexts where one wants to know whether an object
name is valid, one should also verify that the object has the type that
you expect:

test X$(git cat-file -t $ARG) = Xcommit &&
git rev-parse --verify $ARG

or (allowing tag dereferencing)

git cat-file -e $ARG^{commit} &&
git rev-parse --verify $ARG^{commit}

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Mar 28, 2013 at 02:04:27PM +0100, Michael Haggerty wrote:
>
>> $ git rev-parse --verify 
>> 
>> $ echo $?
>> 0
>>
>> [...]
>>
>> I believe that "git rev-parse --verify" is meant to verify that the
>> argument is an actual object, and that it should reject fictional SHA1s.
>>  (If not then the documentation should be clarified.)  The same problem
>> also exists in 1.8.2 but I haven't checked how much older it is.
>
> I think it is only about verifying the name of the object. I.e., that
> you can resolve the argument to an object. It has always behaved this
> way; I even just tested with git v0.99.

Correct.  It is about "is it well formed and something we can turn
into 20-byte object name?" and nothing more.  It certainly does not
mean "do we have it?", as the function needs to be able to validate
something we do not yet have.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in "git rev-parse --verify"

2013-03-28 Thread Jeff King
On Thu, Mar 28, 2013 at 02:04:27PM +0100, Michael Haggerty wrote:

> $ git rev-parse --verify 
> 
> $ echo $?
> 0
>
> [...]
>
> I believe that "git rev-parse --verify" is meant to verify that the
> argument is an actual object, and that it should reject fictional SHA1s.
>  (If not then the documentation should be clarified.)  The same problem
> also exists in 1.8.2 but I haven't checked how much older it is.

I think it is only about verifying the name of the object. I.e., that
you can resolve the argument to an object. It has always behaved this
way; I even just tested with git v0.99.

I can't think of any reason it would not be helpful to have it _also_
verify that we have the object in question. Most of the time that would
be a no-op, as any ref we resolve should point to a valid object, but it
would mean we could catch repository corruption (i.e., missing objects)
early.

Doing a has_sha1_file() check would be relatively quick. Doing a full
parse_object and checking the sha1 would be more robust, but would mean
that:

  blob=`git rev-parse --verify HEAD:some-large-file`

would become much more expensive (and presumably you are about to access
the object _anyway_, at which point you would notice problems, so it is
redundantly expensive).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug in "git rev-parse --verify"

2013-03-28 Thread Michael Haggerty
On Junio's master, "git rev-parse --verify" accepts *any* 40-digit
hexadecimal number.  For example, pass it 40 "1" characters, and it
accepts the argument:

$ git rev-parse --verify 

$ echo $?
0

Obviously, my repo doesn't have an object with this hash :-) so I think
this argument should be rejected.

If you add or remove a digit (to make the length different than 40), it
is correctly rejected:

$ git rev-parse --verify 111
fatal: Needed a single revision
$ echo $?
128

I believe that "git rev-parse --verify" is meant to verify that the
argument is an actual object, and that it should reject fictional SHA1s.
 (If not then the documentation should be clarified.)  The same problem
also exists in 1.8.2 but I haven't checked how much older it is.

The behavior presumably comes from the following clause in get_sha1_basic():

if (len == 40 && !get_sha1_hex(str, sha1))
return 0;

I won't have time to pursue this.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html