Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
Junio C Hamano venit, vidit, dixit 07.02.2013 19:03:
> Michael J Gruber writes:
>
>>>> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
>>>> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY 01
>>>
>>> Yuck.
>>
>> And even more yuck:
>>
>> (cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
>> HEAD:../cache.h:#define GET_SHA1_QUIETLY 01
>>
>> Someone does not expect a "rev:" to be in there, it seems ;)
>
> I think stepping outside of $(cwd) is an afterthought the code does
> not anticipate.
>

Well, we do resolve relative paths correctly, and there are even some
"chdir" in the code path. It's just that the output label is incorrect.

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
Michael J Gruber writes:

>>> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
>>> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY 01
>>
>> Yuck.
>
> And even more yuck:
>
> (cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
> HEAD:../cache.h:#define GET_SHA1_QUIETLY 01
>
> Someone does not expect a "rev:" to be in there, it seems ;)

I think stepping outside of $(cwd) is an afterthought the code does
not anticipate.

Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
Jeff King venit, vidit, dixit 07.02.2013 10:55:
> On Thu, Feb 07, 2013 at 10:47:55AM +0100, Michael J Gruber wrote:
>
>>> I'd be OK if we had an exterior object_context that could be handled
>>> in the same way. But how do we tell setup_revisions that we are
>>> interested in seeing the object_context from each parsed item, where
>>> does the allocation come from (is it malloc'd by setup_revisions?), and
>>> who is responsible for freeing it when we pop pending objects in
>>> get_revisions and similar?
>>
>> Do we really need all of tree, path and mode in object_context (I mean
>> not just here, but other users), or only the path? I'd try and resurrect
>> the virtual path name objects then, they would be just like "item"
>> storage-wise.
>
> We need at least mode, since that is how the mode parameter of
> object_array_entry gets set. I do not know off-hand who uses "tree". I
> suspect the intent was to do .gitattributes lookups inside that tree,
> but I do not think we actually do in-tree lookups currently.
>
>>> I don't think it's as clear cut.
>>>
>>> I wonder, though...what we really care about here is just the pathname.
>>> But if it is a pending object that comes from a blob revision argument,
>>> won't it always be of the form "treeish:path"? Could we not even resolve
>>> the sha1 again, but instead just parse out the ":path" bit?
>>
>> Do we have that, and in what form (e.g. magic expanded etc.)?
>
> Ah, I should have mentioned that. :) We should have the original rev
> name in the object_array_entry's name field, shouldn't we? It's just a
> matter of re-parsing it.
>
>> Another thing I noted is that our path mangling at least for grep has
>> some issues:
>>
>> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
>> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY 01
>
> Yuck.

And even more yuck:

(cd t && git grep --full-name GET_SHA1_QUIETLY HEAD:../cache.h)
HEAD:../cache.h:#define GET_SHA1_QUIETLY 01

Someone does not expect a "rev:" to be in there, it seems ;)

Michael

Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
On Thu, Feb 07, 2013 at 10:47:55AM +0100, Michael J Gruber wrote:

> > I'd be OK if we had an exterior object_context that could be handled
> > in the same way. But how do we tell setup_revisions that we are
> > interested in seeing the object_context from each parsed item, where
> > does the allocation come from (is it malloc'd by setup_revisions?), and
> > who is responsible for freeing it when we pop pending objects in
> > get_revisions and similar?
>
> Do we really need all of tree, path and mode in object_context (I mean
> not just here, but other users), or only the path? I'd try and resurrect
> the virtual path name objects then, they would be just like "item"
> storage-wise.

We need at least mode, since that is how the mode parameter of
object_array_entry gets set. I do not know off-hand who uses "tree". I
suspect the intent was to do .gitattributes lookups inside that tree,
but I do not think we actually do in-tree lookups currently.

> > I don't think it's as clear cut.
> >
> > I wonder, though...what we really care about here is just the pathname.
> > But if it is a pending object that comes from a blob revision argument,
> > won't it always be of the form "treeish:path"? Could we not even resolve
> > the sha1 again, but instead just parse out the ":path" bit?
>
> Do we have that, and in what form (e.g. magic expanded etc.)?

Ah, I should have mentioned that. :) We should have the original rev
name in the object_array_entry's name field, shouldn't we? It's just a
matter of re-parsing it.

> Another thing I noted is that our path mangling at least for grep has
> some issues:
>
> (cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
> ../HEAD:../cache.h:#define GET_SHA1_QUIETLY 01

Yuck.

-Peff

Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
Jeff King venit, vidit, dixit 07.02.2013 10:26:
> On Thu, Feb 07, 2013 at 10:05:57AM +0100, Michael J Gruber wrote:
>
>>>> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, const char *name, struct obj
>>>>    objects[nr].item = obj;
>>>>    objects[nr].name = name;
>>>>    objects[nr].mode = mode;
>>>> +  objects[nr].context = context;
>>>>    array->nr = ++nr;
>>>>  }
>>>
>>> This seems a little gross. Who is responsible for allocating the
>>> context? Who frees it? It looks like we duplicate it in cmd_grep. Which
>>
>> Well, who is responsible for allocating and freeing name and item? I
>> didn't want to introduce a new member which is a struct when all other
>> complex members are pointers. Wouldn't that be confusing?
>
> We cheat on those two. "item" is always a pointer to a "struct object",
> which lasts forever and never gets freed. When "name" is set by
> setup_revisions, it comes from the argv list, which is assumed to last
> forever (and when we add pending blobs for a "--objects" traversal, it
> is the empty string (literal).

I see, so they are really different.

> I'd be OK if we had an exterior object_context that could be handled
> in the same way. But how do we tell setup_revisions that we are
> interested in seeing the object_context from each parsed item, where
> does the allocation come from (is it malloc'd by setup_revisions?), and
> who is responsible for freeing it when we pop pending objects in
> get_revisions and similar?

Do we really need all of tree, path and mode in object_context (I mean
not just here, but other users), or only the path? I'd try and resurrect
the virtual path name objects then, they would be just like "item"
storage-wise.

> I don't think it's as clear cut.
>
> I wonder, though...what we really care about here is just the pathname.
> But if it is a pending object that comes from a blob revision argument,
> won't it always be of the form "treeish:path"? Could we not even resolve
> the sha1 again, but instead just parse out the ":path" bit?

Do we have that, and in what form (e.g. magic expanded etc.)?

> That is sort of like what the repeated call to get_sha1_with_context
> does in your first patch. Except that we do not actually want to lookup
> the sha1, and it is harmful to do so (e.g., if the ref had moved on to a
> new tree that does not have that path, get_sha1 would fail, but we do
> not even care what is in the tree; we only want the parsing side effects
> of get_sha1).
>
> Hmm.
>
> -Peff
>
> PS By the way, while looking at the object_array code (which I have not
>    really used much before), I noticed that add_pending_commit_list sets
>    the "name" field to the result of sha1_to_hex. Which means that it is
>    likely to be completely bogus by the time you read it. I'm not even
>    sure where it gets read or if this matters. And obviously it's
>    completely unrelated to what we were discussing; just something I
>    noticed.

Another thing I noted is that our path mangling at least for grep has
some issues:

(cd t && git grep GET_SHA1_QUIETLY HEAD:../cache.h)
../HEAD:../cache.h:#define GET_SHA1_QUIETLY 01

Taking everything right of ":" could still work.

Michael

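A rough sketch of the "take everything right of the colon" idea, assuming
the entry's name field still holds the original "treeish:path" argument.
The helper name path_from_rev_name is made up for illustration and is not
a git function. Splitting at the first colon is also an assumption: a
treeish whose own spelling contains a colon would need real parsing, and
the returned path is not resolved against the cwd.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: return the path portion of a
 * "treeish:path" revision name, or NULL if there is none.
 * The returned pointer aliases `name`; nothing is allocated.
 */
static const char *path_from_rev_name(const char *name)
{
	const char *colon;

	/* A leading colon has no treeish in front of it, so reject it. */
	if (!name || name[0] == ':')
		return NULL;

	/* Assumption: the first colon separates treeish from path. */
	colon = strchr(name, ':');
	if (!colon || !colon[1])
		return NULL;	/* no colon, or nothing after it */

	return colon + 1;
}
```

With "HEAD:../cache.h" this hands back "../cache.h" unchanged, so the
caller would still have to deal with the relative prefix; a plain "HEAD"
gives NULL.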
Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
On Thu, Feb 07, 2013 at 10:05:57AM +0100, Michael J Gruber wrote:

> >> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, const char *name, struct obj
> >>    objects[nr].item = obj;
> >>    objects[nr].name = name;
> >>    objects[nr].mode = mode;
> >> +  objects[nr].context = context;
> >>    array->nr = ++nr;
> >>  }
> >
> > This seems a little gross. Who is responsible for allocating the
> > context? Who frees it? It looks like we duplicate it in cmd_grep. Which
>
> Well, who is responsible for allocating and freeing name and item? I
> didn't want to introduce a new member which is a struct when all other
> complex members are pointers. Wouldn't that be confusing?

We cheat on those two. "item" is always a pointer to a "struct object",
which lasts forever and never gets freed. When "name" is set by
setup_revisions, it comes from the argv list, which is assumed to last
forever (and when we add pending blobs for a "--objects" traversal, it
is the empty string (literal).

I'd be OK if we had an exterior object_context that could be handled
in the same way. But how do we tell setup_revisions that we are
interested in seeing the object_context from each parsed item, where
does the allocation come from (is it malloc'd by setup_revisions?), and
who is responsible for freeing it when we pop pending objects in
get_revisions and similar?

I don't think it's as clear cut.

I wonder, though...what we really care about here is just the pathname.
But if it is a pending object that comes from a blob revision argument,
won't it always be of the form "treeish:path"? Could we not even resolve
the sha1 again, but instead just parse out the ":path" bit?

That is sort of like what the repeated call to get_sha1_with_context
does in your first patch. Except that we do not actually want to lookup
the sha1, and it is harmful to do so (e.g., if the ref had moved on to a
new tree that does not have that path, get_sha1 would fail, but we do
not even care what is in the tree; we only want the parsing side effects
of get_sha1).

Hmm.

-Peff

PS By the way, while looking at the object_array code (which I have not
   really used much before), I noticed that add_pending_commit_list sets
   the "name" field to the result of sha1_to_hex. Which means that it is
   likely to be completely bogus by the time you read it. I'm not even
   sure where it gets read or if this matters. And obviously it's
   completely unrelated to what we were discussing; just something I
   noticed.

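To illustrate the hazard the PS describes, here is a small standalone
sketch. It assumes, as the PS implies, that the hex formatter returns
storage that later calls reuse. hex_static and hex_dup are stand-ins
written for this example, not git's sha1_to_hex.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in formatter: returns a pointer to one shared static buffer. */
static const char *hex_static(const unsigned char *bytes, size_t len)
{
	static char buf[41];
	size_t i;

	if (len > 20)
		len = 20;	/* buffer holds at most 20 bytes as hex */
	for (i = 0; i < len; i++)
		snprintf(buf + 2 * i, 3, "%02x", bytes[i]);
	buf[2 * len] = '\0';
	return buf;
}

/* Return a heap copy that the caller owns and must free. */
static char *hex_dup(const unsigned char *bytes, size_t len)
{
	const char *s = hex_static(bytes, len);
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}
```

A pointer saved from hex_static() shows whatever the most recent call
wrote, whereas the result of hex_dup() stays intact until it is freed.
Storing a copy in the "name" field would raise the same who-frees-it
question as the context pointer, though.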
Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
Jeff King venit, vidit, dixit 06.02.2013 23:36:
> On Wed, Feb 06, 2013 at 04:08:53PM +0100, Michael J Gruber wrote:
>
>> -  add_object_array(object, arg, &list);
>> +  add_object_array_with_context(object, arg, &list, xmemdupz(&oc, sizeof(struct object_context)));
>
> If we go this route, this new _with_context variant should be used in
> patch 1, too.
>
>> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, const char *name, struct obj
>>    objects[nr].item = obj;
>>    objects[nr].name = name;
>>    objects[nr].mode = mode;
>> +  objects[nr].context = context;
>>    array->nr = ++nr;
>>  }
>
> This seems a little gross. Who is responsible for allocating the
> context? Who frees it? It looks like we duplicate it in cmd_grep. Which

Well, who is responsible for allocating and freeing name and item? I
didn't want to introduce a new member which is a struct when all other
complex members are pointers. Wouldn't that be confusing?

> I think is OK, but it means all of this context infrastructure in
> object.[ch] is just bolted-on junk waiting for somebody to use it wrong
> or get confused. It does not get set, for example, by the regular
> setup_revisions code path.

Sure, it's NULL when there is no context info, just like in many other
cases.

> It would be nice if we could just always have the context available,
> then setup_revisions could set it up by default (and replace the "mode"
> parameter entirely). But we'd need to do something to avoid the
> PATH_MAX-sized buffer for each entry, as some code paths may have a
> large number of pending objects.

If the information is always available even if we don't need it then it
always takes space. The only way out would be pointing into a pool of
path names rather than having a copy in each entry.

It's not like I hadn't talked about providing virtual (blob) objects for
path names keyed by their sha1 before... It's just that I want my grep
--textconv now ;)

Michael

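One way to picture the "pool of path names" alternative is a tiny
interning table: each distinct path is stored once and entries keep only
a pointer into it. This is a sketch under assumptions, not code from the
patch series. The names path_pool, path_pool_intern and path_pool_clear
are invented here, and the linear lookup is chosen for brevity, where a
real implementation would likely hash.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool of unique path strings. */
struct path_pool {
	char **paths;
	size_t nr, alloc;
};

/*
 * Return a stable pointer to the pool's copy of `path`, adding it on
 * first use. Returns NULL on allocation failure.
 */
static const char *path_pool_intern(struct path_pool *pool, const char *path)
{
	size_t i, len;
	char *copy;

	/* Linear search: fine for a sketch, too slow for large pools. */
	for (i = 0; i < pool->nr; i++)
		if (!strcmp(pool->paths[i], path))
			return pool->paths[i];

	if (pool->nr == pool->alloc) {
		size_t new_alloc = pool->alloc ? 2 * pool->alloc : 16;
		char **grown = realloc(pool->paths, new_alloc * sizeof(*grown));

		if (!grown)
			return NULL;
		pool->paths = grown;
		pool->alloc = new_alloc;
	}

	len = strlen(path) + 1;
	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, path, len);
	pool->paths[pool->nr++] = copy;
	return copy;
}

/* Free every stored path and reset the pool. */
static void path_pool_clear(struct path_pool *pool)
{
	size_t i;

	for (i = 0; i < pool->nr; i++)
		free(pool->paths[i]);
	free(pool->paths);
	pool->paths = NULL;
	pool->nr = pool->alloc = 0;
}
```

Each string is allocated separately, so growing the pointer table never
moves a path that an entry already points at. An entry then costs one
pointer instead of a PATH_MAX-sized buffer, and the pool has a single,
obvious owner.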
Re: [RFC/PATCH 4/4] grep: obey --textconv for the case rev:path
On Wed, Feb 06, 2013 at 04:08:53PM +0100, Michael J Gruber wrote:

> -  add_object_array(object, arg, &list);
> +  add_object_array_with_context(object, arg, &list, xmemdupz(&oc, sizeof(struct object_context)));

If we go this route, this new _with_context variant should be used in
patch 1, too.

> @@ -265,9 +260,28 @@ void add_object_array_with_mode(struct object *obj, const char *name, struct obj
>    objects[nr].item = obj;
>    objects[nr].name = name;
>    objects[nr].mode = mode;
> +  objects[nr].context = context;
>    array->nr = ++nr;
>  }

This seems a little gross. Who is responsible for allocating the
context? Who frees it? It looks like we duplicate it in cmd_grep. Which
I think is OK, but it means all of this context infrastructure in
object.[ch] is just bolted-on junk waiting for somebody to use it wrong
or get confused. It does not get set, for example, by the regular
setup_revisions code path.

It would be nice if we could just always have the context available,
then setup_revisions could set it up by default (and replace the "mode"
parameter entirely). But we'd need to do something to avoid the
PATH_MAX-sized buffer for each entry, as some code paths may have a
large number of pending objects.

-Peff
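
The ownership question raised here ("who allocates, who frees?") could be
answered by making the array itself own the context. The following is
only a sketch of that convention with simplified stand-in types. None of
the names (ctx_sketch, entry_sketch, entry_array_add, entry_array_clear)
exist in git, and the real object_context has different fields.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an object context; fields are illustrative. */
struct ctx_sketch {
	unsigned mode;
	char path[64];
};

struct entry_sketch {
	const char *name;            /* borrowed: caller keeps it alive (e.g. argv) */
	struct ctx_sketch *context;  /* owned by the array, or NULL */
};

struct entry_array_sketch {
	struct entry_sketch *items;
	size_t nr, alloc;
};

/* Append an entry; the context, if any, is copied so the array owns it. */
static int entry_array_add(struct entry_array_sketch *a, const char *name,
			   const struct ctx_sketch *context)
{
	struct ctx_sketch *copy = NULL;

	if (a->nr == a->alloc) {
		size_t new_alloc = a->alloc ? 2 * a->alloc : 8;
		struct entry_sketch *grown =
			realloc(a->items, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		a->items = grown;
		a->alloc = new_alloc;
	}
	if (context) {
		copy = malloc(sizeof(*copy));
		if (!copy)
			return -1;
		memcpy(copy, context, sizeof(*copy));
	}
	a->items[a->nr].name = name;
	a->items[a->nr].context = copy;
	a->nr++;
	return 0;
}

/* Free every owned context and the array storage; names are not freed. */
static void entry_array_clear(struct entry_array_sketch *a)
{
	size_t i;

	for (i = 0; i < a->nr; i++)
		free(a->items[i].context);
	free(a->items);
	a->items = NULL;
	a->nr = a->alloc = 0;
}
```

Copying inside the add function keeps the duplication out of callers such
as cmd_grep, and a NULL context still means "no context info". It does
not address the per-entry size concern, and code that pops entries off
the array would need its own rule for handing the context over.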