Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-08 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 07.02.2013 19:03:
> Michael J Gruber  writes:
> 
 (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
 ../HEAD:../cache.h:#define GET_SHA1_QUIETLY01
>>>
>>> Yuck.
>>
>> And even more yuck:
>>
>> (cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
>> HEAD:../cache.h:#define GET_SHA1_QUIETLY01
>>
>> Someone does not expect a "rev:" to be in there, it seems ;)
> 
> I think stepping outside of $(cwd) is an afterthought the code does
> not anticipate.
> 

Well, we do resolve relative paths correctly, and there are even some
"chdir" in the code path. It's just that the output label is incorrect.

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Junio C Hamano
Michael J Gruber  writes:

>>> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
>>> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY01
>> 
>> Yuck.
>
> And even more yuck:
>
> (cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
> HEAD:../cache.h:#define GET_SHA1_QUIETLY01
>
> Someone does not expect a "rev:" to be in there, it seems ;)

I think stepping outside of $(cwd) is an afterthought the code does
not anticipate.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Michael J Gruber
Jeff King venit, vidit, dixit 07.02.2013 10:55:
> On Thu, Feb 07, 2013 at 10:47:55AM +0100, Michael J Gruber wrote:
> 
>>> I'd be OK if we had an exterior object_context that could be handled
>>> in the same way. But how do we tell setup_revisions that we are
>>> interested in seeing the object_context from each parsed item, where
>>> does the allocation come from (is it malloc'd by setup_revisions?), and
>>> who is responsible for freeing it when we pop pending objects in
>>> get_revisions and similar?
>>
>> Do we really need all of tree, path and mode in object_context (I mean
>> not just here, but other users), or only the path? I'd try and resurrect
>> the virtual path name objects then, they would be just like "item"
>> storage-wise.
> 
> We need at least mode, since that is how the mode parameter of
> object_array_entry gets set. I do not know off-hand who uses "tree". I
> suspect the intent was to do .gitattributes lookups inside that tree,
> but I do not think we actually do in-tree lookups currently.
> 
>>> I don't think it's as clear cut.
>>>
>>> I wonder, though...what we really care about here is just the pathname.
>>> But if it is a pending object that comes from a blob revision argument,
>>> won't it always be of the form "treeish:path"? Could we not even resolve
>>> the sha1 again, but instead just parse out the ":path" bit?
>>
>> Do we have that, and in what form (e.g. magic expanded etc.)?
> 
> Ah, I should have mentioned that. :) We should have the original rev
> name in the object_array_entry's name field, shouldn't we? It's just a
> matter of re-parsing it.
> 
>> Another thing I noted is that our path mangling at least for grep has
>> some issues:
>>
>> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
>> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY01
> 
> Yuck.

And even more yuck:

(cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
HEAD:../cache.h:#define GET_SHA1_QUIETLY01

Someone does not expect a "rev:" to be in there, it seems ;)

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Jeff King
On Thu, Feb 07, 2013 at 10:47:55AM +0100, Michael J Gruber wrote:

> > I'd be OK if we had an exterior object_context that could be handled
> > in the same way. But how do we tell setup_revisions that we are
> > interested in seeing the object_context from each parsed item, where
> > does the allocation come from (is it malloc'd by setup_revisions?), and
> > who is responsible for freeing it when we pop pending objects in
> > get_revisions and similar?
> 
> Do we really need all of tree, path and mode in object_context (I mean
> not just here, but other users), or only the path? I'd try and resurrect
> the virtual path name objects then, they would be just like "item"
> storage-wise.

We need at least mode, since that is how the mode parameter of
object_array_entry gets set. I do not know off-hand who uses "tree". I
suspect the intent was to do .gitattributes lookups inside that tree,
but I do not think we actually do in-tree lookups currently.

> > I don't think it's as clear cut.
> > 
> > I wonder, though...what we really care about here is just the pathname.
> > But if it is a pending object that comes from a blob revision argument,
> > won't it always be of the form "treeish:path"? Could we not even resolve
> > the sha1 again, but instead just parse out the ":path" bit?
> 
> Do we have that, and in what form (e.g. magic expanded etc.)?

Ah, I should have mentioned that. :) We should have the original rev
name in the object_array_entry's name field, shouldn't we? It's just a
matter of re-parsing it.

> Another thing I noted is that our path mangling at least for grep has
> some issues:
> 
> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY01

Yuck.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Michael J Gruber
Jeff King venit, vidit, dixit 07.02.2013 10:26:
> On Thu, Feb 07, 2013 at 10:05:57AM +0100, Michael J Gruber wrote:
> 
 @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, 
 const char *name, struct obj
objects[nr].item = obj;
objects[nr].name = name;
objects[nr].mode = mode;
 +  objects[nr].context = context;
array->nr = ++nr;
  }
>>>
>>> This seems a little gross. Who is responsible for allocating the
>>> context? Who frees it? It looks like we duplicate it in cmd_grep. Which
>>
>> Well, who is responsible for allocating and freeing name and item? I
>> didn't want to introduce a new member which is a struct when all other
>> complex members are pointers. Wouldn't that be confusing?
> 
> We cheat on those two. "item" is always a pointer to a "struct object",
> which lasts forever and never gets freed. When "name" is set by
> setup_revisions, it comes from the argv list, which is assumed to last
> forever (and when we add pending blobs for a "--objects" traversal, it
> is the empty string (literal).

I see, so they are really different.

> I'd be OK if we had an exterior object_context that could be handled
> in the same way. But how do we tell setup_revisions that we are
> interested in seeing the object_context from each parsed item, where
> does the allocation come from (is it malloc'd by setup_revisions?), and
> who is responsible for freeing it when we pop pending objects in
> get_revisions and similar?

Do we really need all of tree, path and mode in object_context (I mean
not just here, but other users), or only the path? I'd try and resurrect
the virtual path name objects then, they would be just like "item"
storage-wise.

> I don't think it's as clear cut.
> 
> I wonder, though...what we really care about here is just the pathname.
> But if it is a pending object that comes from a blob revision argument,
> won't it always be of the form "treeish:path"? Could we not even resolve
> the sha1 again, but instead just parse out the ":path" bit?

Do we have that, and in what form (e.g. magic expanded etc.)?

> That is sort of like what the repeated call to get_sha1_with_context
> does in your first patch. Except that we do not actually want to lookup
> the sha1, and it is harmful to do so (e.g., if the ref had moved on to a
> new tree that does not have that path, get_sha1 would fail, but we do
> not even care what is in the tree; we only want the parsing side effects
> of get_sha1).
> 
> Hmm.
> 
> -Peff
> 
> PS By the way, while looking at the object_array code (which I have not
>really used much before), I noticed that add_pending_commit_list sets
>the "name" field to the result of sha1_to_hex. Which means that it is
>likely to be completely bogus by the time you read it. I'm not even
>sure where it gets read or if this matters. And obviously it's
>completely unrelated to what we were discussing; just something I
>noticed.

Another thing I noted is that our path mangling at least for grep has
some issues:

(cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
../HEAD:../cache.h:#define GET_SHA1_QUIETLY01

Taking everything right of ":" could still work.

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Jeff King
On Thu, Feb 07, 2013 at 10:05:57AM +0100, Michael J Gruber wrote:

> >> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, 
> >> const char *name, struct obj
> >>objects[nr].item = obj;
> >>objects[nr].name = name;
> >>objects[nr].mode = mode;
> >> +  objects[nr].context = context;
> >>array->nr = ++nr;
> >>  }
> > 
> > This seems a little gross. Who is responsible for allocating the
> > context? Who frees it? It looks like we duplicate it in cmd_grep. Which
> 
> Well, who is responsible for allocating and freeing name and item? I
> didn't want to introduce a new member which is a struct when all other
> complex members are pointers. Wouldn't that be confusing?

We cheat on those two. "item" is always a pointer to a "struct object",
which lasts forever and never gets freed. When "name" is set by
setup_revisions, it comes from the argv list, which is assumed to last
forever (and when we add pending blobs for a "--objects" traversal, it
is the empty string (literal).

I'd be OK if we had an exterior object_context that could be handled
in the same way. But how do we tell setup_revisions that we are
interested in seeing the object_context from each parsed item, where
does the allocation come from (is it malloc'd by setup_revisions?), and
who is responsible for freeing it when we pop pending objects in
get_revisions and similar?

I don't think it's as clear cut.

I wonder, though...what we really care about here is just the pathname.
But if it is a pending object that comes from a blob revision argument,
won't it always be of the form "treeish:path"? Could we not even resolve
the sha1 again, but instead just parse out the ":path" bit?

That is sort of like what the repeated call to get_sha1_with_context
does in your first patch. Except that we do not actually want to lookup
the sha1, and it is harmful to do so (e.g., if the ref had moved on to a
new tree that does not have that path, get_sha1 would fail, but we do
not even care what is in the tree; we only want the parsing side effects
of get_sha1).

Hmm.

-Peff

PS By the way, while looking at the object_array code (which I have not
   really used much before), I noticed that add_pending_commit_list sets
   the "name" field to the result of sha1_to_hex. Which means that it is
   likely to be completely bogus by the time you read it. I'm not even
   sure where it gets read or if this matters. And obviously it's
   completely unrelated to what we were discussing; just something I
   noticed.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-07 Thread Michael J Gruber
Jeff King venit, vidit, dixit 06.02.2013 23:36:
> On Wed, Feb 06, 2013 at 04:08:53PM +0100, Michael J Gruber wrote:
> 
>> -add_object_array(object, arg, &list);
>> +add_object_array_with_context(object, arg, &list, 
>> xmemdupz(&oc, sizeof(struct object_context)));
> 
> If we go this route, this new _with_context variant should be used in
> patch 1, too.
> 
>> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, 
>> const char *name, struct obj
>>  objects[nr].item = obj;
>>  objects[nr].name = name;
>>  objects[nr].mode = mode;
>> +objects[nr].context = context;
>>  array->nr = ++nr;
>>  }
> 
> This seems a little gross. Who is responsible for allocating the
> context? Who frees it? It looks like we duplicate it in cmd_grep. Which

Well, who is responsible for allocating and freeing name and item? I
didn't want to introduce a new member which is a struct when all other
complex members are pointers. Wouldn't that be confusing?

> I think is OK, but it means all of this context infrastructure in
> object.[ch] is just bolted-on junk waiting for somebody to use it wrong
> or get confused.  It does not get set, for example, by the regular
> setup_revisions code path.

Sure, it's NULL when there is no context info, just like in many other
cases.

> It would be nice if we could just always have the context available,
> then setup_revisions could set it up by default (and replace the "mode"
> parameter entirely). But we'd need to do something to avoid the
> PATH_MAX-sized buffer for each entry, as some code paths may have a
> large number of pending objects.

If the information is always available even if we don't need it then it
always takes space. The only way out would be pointing into a pool of
path names rather having a copy in each entry. It's not like I hadn't
talked about providing virtual (blob) objects for path names keyed by
their sha1 before... It's just that I want my grep --textconv now ;)

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path

2013-02-06 Thread Jeff King
On Wed, Feb 06, 2013 at 04:08:53PM +0100, Michael J Gruber wrote:

> - add_object_array(object, arg, &list);
> + add_object_array_with_context(object, arg, &list, 
> xmemdupz(&oc, sizeof(struct object_context)));

If we go this route, this new _with_context variant should be used in
patch 1, too.

> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, 
> const char *name, struct obj
>   objects[nr].item = obj;
>   objects[nr].name = name;
>   objects[nr].mode = mode;
> + objects[nr].context = context;
>   array->nr = ++nr;
>  }

This seems a little gross. Who is responsible for allocating the
context? Who frees it? It looks like we duplicate it in cmd_grep. Which
I think is OK, but it means all of this context infrastructure in
object.[ch] is just bolted-on junk waiting for somebody to use it wrong
or get confused.  It does not get set, for example, by the regular
setup_revisions code path.

It would be nice if we could just always have the context available,
then setup_revisions could set it up by default (and replace the "mode"
parameter entirely). But we'd need to do something to avoid the
PATH_MAX-sized buffer for each entry, as some code paths may have a
large number of pending objects.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html