Re: [RFC/PATCH] replace: add --graft option
From: Michael Haggerty > On 05/18/2014 08:29 PM, Christian Couder wrote: >> The usage string for this option is: >> >> git replace [-f] --graft [...] >> >> First we create a new commit that is the same as >> except that its parents are [...] >> >> Then we create a replace ref that replace with >> the commit we just created. >> >> With this new option, it should be straightforward to >> convert grafts to replace refs, with something like: >> >> cat .git/info/grafts | while read line >> do git replace --graft $line; done > > I love the functionality; I think it's a great step towards making > grafts obsolete. > > I haven't worked with Git's object reading/writing code much, but it > surprised me that you are editing the commit object basically as a > string, using hard-coded length constants and stuff. It seems > error-prone, and we already have a commit parser. > > Would it be possible to program this at a higher layer of abstraction > based on the commit object produced by the existing commit parser? > E.g., edit the object it produces, and write the result? Or create a > new commit object out of the parsed commit object and write that? I tried to program this at a higher layer of abstraction first, but it was not easy to properly write the new commit. > It's great that you're working on this! Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
Jeff King writes: > On Mon, May 19, 2014 at 10:25:10AM -0700, Junio C Hamano wrote: > >> The headers up to committer are cast in stone in their ordering, and >> I do not immediately see how loosening it would be beneficial. >> >> Unless you are trying to give users a new way to record exactly the >> same commit in twenty-four (or more) ways with their own object >> names, that is ;-) > > Sorry, I didn't mean to imply that people can do what they want with > that ordering. Implementations that reorder the headers are stupid and > wrong, and should be fixed. Yeah, that was the only thing I meant to say, and what you said in the rest of message makes sense to me. I very much like the approach to parse line-by-line, noticing products from stupid and wrong implementations and warning or erroring out against them, while allowing an option to be lenient to help users who want to fix their repositories contaminated with such objects. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
On Mon, May 19, 2014 at 10:25:10AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > It might make sense to just teach parse_commit_header to be a little > > more thorough; it has to read past those lines anyway to find the author > > and committer lines, so it would not be much more expensive to note > > them. And then of course the code needs to be pulled out of the > > pretty-printer and made generally accessible. > > I notice that you said "might" above. Yeah. I think having a reusable parser is definitely a good thing. I'm not sure if it's worth the massive amount of refactoring that would be required for this particular use case. > > That's more or less what Christian's function is doing, though it > > assumes things like that the parents come immediately after the tree, > > and that they are all in a group. Those are all true for objects created > > by git, but I think we can be a little flexible. > > The headers up to committer are cast in stone in their ordering, and > I do not immediately see how loosening it would be beneficial. > > Unless you are trying to give users a new way to record exactly the > same commit in twenty-four (or more) ways with their own object > names, that is ;-) Sorry, I didn't mean to imply that people can do what they want with that ordering. Implementations that reorder the headers are stupid and wrong, and should be fixed. BUT. I really don't like making these assumptions or doing ad-hoc parsing all through the code. We _do_ see quirky, wrong objects, and want to handle them sanely and consistently. That's hard to do when there are parsers sprinkled throughout the code which handle each case slightly differently. I don't recall seeing this with header ordering, but I know that broken ident lines have caused us headaches in the past, and I'm happy that we've (mostly) settled on the code in split_ident_line to handle this. Things like: + /* find existing parents */ + parent_start = buf.buf; + parent_start += 46; /* "tree " + "hex sha1" + "\n" */ + parent_end = parent_start; scare me. Is buf actually 46 bytes long? If not, we read past the end of the buffer. What if it contains something besides "tree \n" at the beginning? We don't even notice! The version I posted elsewhere in the thread at least operates on a line-by-line basis, and tries to verify its assumptions (and barfs if not). It's still doing its own parsing, but at least it's keeping its assumptions on the object format to a minimum ("drop old parent lines", "add new ones after tree, and barf if there's not exactly one tree line"). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
Jeff King writes: > It might make sense to just teach parse_commit_header to be a little > more thorough; it has to read past those lines anyway to find the author > and committer lines, so it would not be much more expensive to note > them. And then of course the code needs to be pulled out of the > pretty-printer and made generally accessible. I notice that you said "might" above. > That's more or less what Christian's function is doing, though it > assumes things like that the parents come immediately after the tree, > and that they are all in a group. Those are all true for objects created > by git, but I think we can be a little flexible. The headers up to committer are cast in stone in their ordering, and I do not immediately see how loosening it would be beneficial. Unless you are trying to give users a new way to record exactly the same commit in twenty-four (or more) ways with their own object names, that is ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
On Sun, May 18, 2014 at 08:29:38PM +0200, Christian Couder wrote: > +static int create_graft(int argc, const char **argv, int force) > +{ > + unsigned char old[20], new[20]; > + const char *old_ref = argv[0]; > + struct strbuf buf = STRBUF_INIT; > + struct strbuf new_parents = STRBUF_INIT; > + const char *parent_start, *parent_end; > + int i; > + > + if (get_sha1(old_ref, old) < 0) > + die(_("Not a valid object name: '%s'"), old_ref); > + lookup_commit_or_die(old, old_ref); > + if (read_sha1_commit(old, &buf)) > + die(_("Invalid commit: '%s'"), old_ref); Do we want to peel to commits here? That is, should: git replace --graft v1.5.0 v1.4.0 work? On the one hand, I see it as friendly. On the other, it may be a bit surprising when working with something as potentially dangerous a replace refs. If we don't do it automatically, the user can still say "v1.5.0^{commit}" to be explicit. I dunno; maybe I am being overly paranoid. > + /* prepare new parents */ > + for (i = 1; i < argc; i++) { > + unsigned char sha1[20]; > + if (get_sha1(argv[i], sha1) < 0) > + die(_("Not a valid object name: '%s'"), argv[i]); > + lookup_commit_or_die(sha1, argv[i]); > + strbuf_addf(&new_parents, "parent %s\n", sha1_to_hex(sha1)); > + } Either way, I think _this_ peeling is a sane thing to do. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
On Mon, May 19, 2014 at 11:42:07AM +0200, Michael Haggerty wrote: > On 05/18/2014 08:29 PM, Christian Couder wrote: > > The usage string for this option is: > > > > git replace [-f] --graft [...] > > > > First we create a new commit that is the same as > > except that its parents are [...] > > > > Then we create a replace ref that replace with > > the commit we just created. > > > > With this new option, it should be straightforward to > > convert grafts to replace refs, with something like: > > > > cat .git/info/grafts | while read line > > do git replace --graft $line; done > > I love the functionality; I think it's a great step towards making > grafts obsolete. Me too. > I haven't worked with Git's object reading/writing code much, but it > surprised me that you are editing the commit object basically as a > string, using hard-coded length constants and stuff. It seems > error-prone, and we already have a commit parser. I think we have at least two commit parsers already. :) The one in commit.c:parse_commit_buffer tries hard to be fast and only get out enough information for a traversal (so parents, commit timestamp, and tree pointer). The ones in pretty.c that back format_commit_message (parse_commit_header, parse_commit_message) are a bit more flexible, as they record the offsets and lengths of certain key lines. But "parents" is not one of those lines (we rely on the already-parsed copy in the "struct commit" for parents and tree information). It might make sense to just teach parse_commit_header to be a little more thorough; it has to read past those lines anyway to find the author and committer lines, so it would not be much more expensive to note them. And then of course the code needs to be pulled out of the pretty-printer and made generally accessible. While I do like the thought of having a decent reusable commit parser, this particular case is really about trying to tweak one header, without touching anything else. IOW, it is basically: sed '/^parent /d; /^tree /a\ parent ...\ parent ... ' That's more or less what Christian's function is doing, though it assumes things like that the parents come immediately after the tree, and that they are all in a group. Those are all true for objects created by git, but I think we can be a little flexible. It turns out I wrote a proof-of-concept of this in March when we had the original discussion, but forgot to ever send it to the list. My parsing function looked like: /* * Rewrite the commit object found in "buf", removing any existing * parents and adding lines for any parents in the NULL-terminated * array "parents" (whose strings should be 40-char hex sha1s). * * The output is written to the strbuf "out". */ static void rewrite_parents(struct strbuf *out, const char *buf, const char **parents) { int added = 0; while (buf && *buf) { const char *eol = strchrnul(buf, '\n'); const char *next = *eol ? eol + 1 : eol; if (eol == buf + 1) break; if (!starts_with(buf, "parent ")) strbuf_add(out, buf, next - buf); /* * We always put our parent lines right after the tree line, * which matches how git creates commit objects. */ if (starts_with(buf, "tree ")) { const char **p; if (added++) die("commit has two tree lines?"); for (p = parents; *p; p++) strbuf_addf(out, "parent %s\n", *p); } buf = next; } if (!added) die("commit has no tree line?"); strbuf_addstr(out, buf); } Rather than a splice, I did it as I copied into the strbuf, and I treated each line independently (dropping parent lines, no matter where, and then finding the tree line as the "proper" place to add new parent lines). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] replace: add --graft option
On 05/18/2014 08:29 PM, Christian Couder wrote: > The usage string for this option is: > > git replace [-f] --graft [...] > > First we create a new commit that is the same as > except that its parents are [...] > > Then we create a replace ref that replace with > the commit we just created. > > With this new option, it should be straightforward to > convert grafts to replace refs, with something like: > > cat .git/info/grafts | while read line > do git replace --graft $line; done I love the functionality; I think it's a great step towards making grafts obsolete. I haven't worked with Git's object reading/writing code much, but it surprised me that you are editing the commit object basically as a string, using hard-coded length constants and stuff. It seems error-prone, and we already have a commit parser. Would it be possible to program this at a higher layer of abstraction based on the commit object produced by the existing commit parser? E.g., edit the object it produces, and write the result? Or create a new commit object out of the parsed commit object and write that? It's great that you're working on this! Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html