Re: [PATCH] reset cached ident date before creating objects
On Mon, Aug 01, 2016 at 02:54:47PM -0700, Junio C Hamano wrote: > Jeff Kingwrites: > > > I also didn't go looking for other spots which might want similar > > treatment. Junio mentioned that the sequencer code still uses an > > external commit process, so cherry-pick/revert are OK. git-fast-import > > creates a stream of commits, but already deals with this issue in a > > separate way. And I couldn't think of other programs that want to make a > > string of commits. > > Thanks. I wonder if do_commit() may be a more appropriate place to > reset this, but the difference only would matter in "am --resolved", > and a call to do_commit() it has will always be the first commit in > the process, so there would not be anything to clear, which would in > turn mean that it would not make any difference. Yeah, you may be right. I didn't actually find that spot. I was focused on the looping aspect. And seeing how complex the loop was, my thought was to just put it at the top, so we know it happens once per iteration. But I think do_commit() is a reasonble spot, too (as long as it comes before the call to fmt_ident(), of course). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
Jeff Kingwrites: > I also didn't go looking for other spots which might want similar > treatment. Junio mentioned that the sequencer code still uses an > external commit process, so cherry-pick/revert are OK. git-fast-import > creates a stream of commits, but already deals with this issue in a > separate way. And I couldn't think of other programs that want to make a > string of commits. Thanks. I wonder if do_commit() may be a more appropriate place to reset this, but the difference only would matter in "am --resolved", and a call to do_commit() it has will always be the first commit in the process, so there would not be anything to clear, which would in turn mean that it would not make any difference. Will queue. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Mon, Aug 01, 2016 at 02:00:47PM -0400, Jeff King wrote: > So at this point I think I'd lean towards programs which have multiple > logical commit operations explicitly saying "I am starting a new > operation". It may be that we end up attaching more to that in the long > run than just timestamp resets, too (i.e., what other cached data might > there be that used to happen in separate sub-processes?). Here's that patch. I still just put in a bare "reset_ident_date()" in git-am. This _could_ be wrapped in: begin_logical_commit() or something, but it would just be a wrapper around resetting the ident_date. So I'm inclined not to worry about such semantic obfuscation until we actually have a second thing to reset. :) I also didn't go looking for other spots which might want similar treatment. Junio mentioned that the sequencer code still uses an external commit process, so cherry-pick/revert are OK. git-fast-import creates a stream of commits, but already deals with this issue in a separate way. And I couldn't think of other programs that want to make a string of commits. -- >8 -- Subject: [PATCH] am: reset cached ident date for each patch When we compute the date to go in author/committer lines of commits, or tagger lines of tags, we get the current date once and then cache it for the rest of the program. This is a good thing in some cases, like "git commit", because it means we do not racily assign different times to the author/committer fields of a single commit object. But as more programs start to make many commits in a single process (e.g., the recently builtin "git am"), it means that you'll get long strings of commits with identical committer timestamps (whereas before, we invoked "git commit" many times and got true timestamps). This patch addresses it by letting callers reset the cached time, which means they'll get a fresh time on their next call to git_committer_info() or git_author_info(). The first caller to do so is "git am", which resets the time for each patch it applies. It would be nice if we could just do this automatically before filling in the ident fields of commit and tag objects. Unfortunately, it's hard to know where a particular logical operation begins and ends. For instance, if commit_tree_extended() were to call reset_ident_date() before getting the committer/author ident, that doesn't quite work; sometimes the author info is passed in to us as a parameter, and it may or may not have come from a previous call to ident_default_date(). So in those cases, we lose the property that the committer and the author timestamp always match. You could similarly put a date-reset at the end of commit_tree_extended(). That actually works in the current code base, but it's fragile. It makes the assumption that after commit_tree_extended() finishes, the caller has no other operations that would logically want to fall into the same timestamp. So instead we provide the tool to easily do the reset, and let the high-level callers use it to annotate their own logical operations. There's no automated test, because it would be inherently racy (it depends on whether the program takes multiple seconds to run). But you can see the effect with something like: # make a fake 100-patch series top=$(git rev-parse HEAD) bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1) git log --format=email --reverse --first-parent \ --binary -m -p $bottom..$top >patch # now apply it; this presumably takes multiple seconds git checkout --detach $bottom git am Helped-by: Paul TanSigned-off-by: Jeff King --- builtin/am.c | 2 ++ cache.h | 1 + ident.c | 5 + 3 files changed, 8 insertions(+) diff --git a/builtin/am.c b/builtin/am.c index b77bf11..8aa9b5b 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1840,6 +1840,8 @@ static void am_run(struct am_state *state, int resume) const char *mail = am_path(state, msgnum(state)); int apply_status; + reset_ident_date(); + if (!file_exists(mail)) goto next; diff --git a/cache.h b/cache.h index b5f76a4..31e65f9 100644 --- a/cache.h +++ b/cache.h @@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void); extern const char *git_editor(void); extern const char *git_pager(int stdout_is_tty); extern int git_ident_config(const char *, const char *, void *); +extern void reset_ident_date(void); struct ident_split { const char *name_begin; diff --git a/ident.c b/ident.c index 139c528..e20a772 100644 --- a/ident.c +++ b/ident.c @@ -184,6 +184,11 @@ static const char *ident_default_date(void) return git_default_date.buf; } +void reset_ident_date(void) +{ + strbuf_reset(_default_date); +} + static int crud(unsigned char c) { return c <= 32 || -- 2.9.2.670.gf6ce898 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message
Re: [PATCH] reset cached ident date before creating objects
On Mon, Aug 01, 2016 at 10:49:12AM -0700, Junio C Hamano wrote: > Jeff Kingwrites: > > >> So maybe we would have to put reset_ident_date() at the end of the > >> function instead, at least after git_committer_info() is called. > > > > Yes, although "reset and end" still feels a bit weird to me. > > > > I'd almost prefer to just have long-running programs insert resets at > > strategic points. > > Certainly "reset at the end" feels weird but it can be explained as > "for a one-shot thing we use the first time of the default date and > it gives a consistent timestamp; conceptually, things that make > multiple commits are like doing that one-shot thing multiple times > in a row." > > When viewed that way, it is not _too_ bad, I would guess. I think what bothers me is that you could use similar logic for "reset at the beginning". The problem is that we don't know when "the beginning" is. I thought it was inside commit_tree_extended(), but for some operations, it's not, as Paul showed. So when is "the end"? Is it at the end of commit_tree_extended()? I'm not sure. Already we know it depends on whether you consider the ref update part of the same operation. Whether that matters is debatable. Are there other cases (an obvious one would be the human-readable bits printed by git-commit, but it looks like those are done beforehand)? Even if we do check every case, though, it seems like a fragile assumption to be making. So at this point I think I'd lean towards programs which have multiple logical commit operations explicitly saying "I am starting a new operation". It may be that we end up attaching more to that in the long run than just timestamp resets, too (i.e., what other cached data might there be that used to happen in separate sub-processes?). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Mon, Aug 01, 2016 at 02:12:34PM -0400, Jeff King wrote: > Before I switched to "reset at the beginning" in my original patch, I > also noticed this issue, but decided the other way: to only reset after > a successful creation. > > I think in most cases it wouldn't matter anyway, because the caller will > generally abort as soon as this returns failure anyway. But I wondered > about something like: > > author = prepare_author_info(); > if (commit_tree_extended(..., author, ...) < 0) { > /* oops, we failed. Do a thing and try again. */ > possible_fix(); > if (commit_tree_extended(..., author, ...) < 0) > die("giving up"); > } > > In the second call (but only the second call!) the committer and author > can diverge. To be clear, I checked all of the callers and nobody actually does this. Every caller proceeds straight to a die() except the one in notes_cache_write(), which silently ignores error (which is the correct thing to do). This is more "it seems like a fragile pattern to me". -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Mon, Aug 01, 2016 at 10:58:47AM -0700, Junio C Hamano wrote: > diff --git a/commit.c b/commit.c > index b02f3c4..db24013 100644 > --- a/commit.c > +++ b/commit.c > @@ -1543,7 +1543,6 @@ int commit_tree_extended(const char *msg, size_t > msg_len, > } > > /* Person/date information */ > - reset_ident_date(); > if (!author) > author = git_author_info(IDENT_STRICT); > strbuf_addf(, "author %s\n", author); > @@ -1564,11 +1563,21 @@ int commit_tree_extended(const char *msg, size_t > msg_len, > if (encoding_is_utf8 && !verify_utf8()) > fprintf(stderr, commit_utf8_warn); > > - if (sign_commit && do_sign_commit(, sign_commit)) > - return -1; > + if (sign_commit && do_sign_commit(, sign_commit)) { > + result = -1; > + goto out; > + } > > result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret); > + > +out: > strbuf_release(); > + > + /* > + * Reset the default timestamp for the next call to create tag/commit > + * object, so that they get their own fresh timestamp. > + */ > + reset_ident_date(); > return result; > } Before I switched to "reset at the beginning" in my original patch, I also noticed this issue, but decided the other way: to only reset after a successful creation. I think in most cases it wouldn't matter anyway, because the caller will generally abort as soon as this returns failure anyway. But I wondered about something like: author = prepare_author_info(); if (commit_tree_extended(..., author, ...) < 0) { /* oops, we failed. Do a thing and try again. */ possible_fix(); if (commit_tree_extended(..., author, ...) < 0) die("giving up"); } In the second call (but only the second call!) the committer and author can diverge. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
Junio C Hamanowrites: > Jeff King writes: > >>> So maybe we would have to put reset_ident_date() at the end of the >>> function instead, at least after git_committer_info() is called. >> >> Yes, although "reset and end" still feels a bit weird to me. >> >> I'd almost prefer to just have long-running programs insert resets at >> strategic points. > > Certainly "reset at the end" feels weird but it can be explained as > "for a one-shot thing we use the first time of the default date and > it gives a consistent timestamp; conceptually, things that make > multiple commits are like doing that one-shot thing multiple times > in a row." > > When viewed that way, it is not _too_ bad, I would guess. An interdiff to what we queued previously would look like this: builtin/tag.c | 11 ++- commit.c | 15 --- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/builtin/tag.c b/builtin/tag.c index 5dccae3..e852ded 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -225,7 +225,6 @@ static void create_tag(const unsigned char *object, const char *tag, if (type <= OBJ_NONE) die(_("bad object type.")); - reset_ident_date(); header_len = snprintf(header_buf, sizeof(header_buf), "object %s\n" "type %s\n" @@ -287,6 +286,16 @@ static void create_tag(const unsigned char *object, const char *tag, unlink_or_warn(path); free(path); } + + /* +* Reset the default timestamp for the next call to create tag/commit +* object, so that they get their own fresh timestamp. +* +* NOTE NOTE NOTE! if you are libifying this function later by +* turning exit/die in the above code to return an error, be +* sure to jump here to make this call happen. +*/ + reset_ident_date(); } struct msg_arg { diff --git a/commit.c b/commit.c index b02f3c4..db24013 100644 --- a/commit.c +++ b/commit.c @@ -1543,7 +1543,6 @@ int commit_tree_extended(const char *msg, size_t msg_len, } /* Person/date information */ - reset_ident_date(); if (!author) author = git_author_info(IDENT_STRICT); strbuf_addf(, "author %s\n", author); @@ -1564,11 +1563,21 @@ int commit_tree_extended(const char *msg, size_t msg_len, if (encoding_is_utf8 && !verify_utf8()) fprintf(stderr, commit_utf8_warn); - if (sign_commit && do_sign_commit(, sign_commit)) - return -1; + if (sign_commit && do_sign_commit(, sign_commit)) { + result = -1; + goto out; + } result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret); + +out: strbuf_release(); + + /* +* Reset the default timestamp for the next call to create tag/commit +* object, so that they get their own fresh timestamp. +*/ + reset_ident_date(); return result; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
Jeff Kingwrites: >> So maybe we would have to put reset_ident_date() at the end of the >> function instead, at least after git_committer_info() is called. > > Yes, although "reset and end" still feels a bit weird to me. > > I'd almost prefer to just have long-running programs insert resets at > strategic points. Certainly "reset at the end" feels weird but it can be explained as "for a one-shot thing we use the first time of the default date and it gives a consistent timestamp; conceptually, things that make multiple commits are like doing that one-shot thing multiple times in a row." When viewed that way, it is not _too_ bad, I would guess. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Sat, Jul 30, 2016 at 10:11:56AM +0800, Paul Tan wrote: > > diff --git a/commit.c b/commit.c > > index 71a360d..7ddbffe 100644 > > --- a/commit.c > > +++ b/commit.c > > @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t > > msg_len, > > } > > > > /* Person/date information */ > > + reset_ident_date(); > > if (!author) > > author = git_author_info(IDENT_STRICT); > > strbuf_addf(, "author %s\n", author); > > But since builtin/commit.c constructs its author ident string before > calling the editor and then commit_tree_extended(), this would cause > the resulting commits to have committer timestamps which differ from > their author timestamps. Hrm, yeah. I assumed it would only pass in the author string for things like "-c" or "--amend". But it looks like it unconditionally passes in the author. And it would be slightly difficult to have it pass NULL, because it may actually have _part_ of an author (e.g., "--author" will come up with name and email but not the date), so it has to sometimes combine those bits with things like ident_default_date() itself. I guess one option would be to commit_tree_extended() to take the broken-down author bits and call fmt_ident() itself. That's what all of the callers are doing (that, or just passing NULL). It would make the interface a bit clunkier, but I think the end result would be more flexible. I suppose that would be tricky for git-commit, because in addition to passing the result of fmt_ident() to commit_tree_extended(), it wants to take the pieces and put them in the environment for hooks to see. And if the data is available only inside commit_tree_extended(), we don't have it for the hooks. > So maybe we would have to put reset_ident_date() at the end of the > function instead, at least after git_committer_info() is called. Yes, although "reset and end" still feels a bit weird to me. I'd almost prefer to just have long-running programs insert resets at strategic points. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
Hi Jeff, On Sat, Jul 30, 2016 at 2:05 AM, Jeff Kingwrote: > When we compute the date to put in author/committer lines of > commits, or tagger lines of tags, we get the current date > once and then cache it for the rest of the program. This is > a good thing in some cases, like "git commit", because it > means we do not racily assign different times to the > author/committer fields of a single commit object. So commits created with "git commit" should have the same author and committer timestamps... > diff --git a/commit.c b/commit.c > index 71a360d..7ddbffe 100644 > --- a/commit.c > +++ b/commit.c > @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t > msg_len, > } > > /* Person/date information */ > + reset_ident_date(); > if (!author) > author = git_author_info(IDENT_STRICT); > strbuf_addf(, "author %s\n", author); But since builtin/commit.c constructs its author ident string before calling the editor and then commit_tree_extended(), this would cause the resulting commits to have committer timestamps which differ from their author timestamps. So maybe we would have to put reset_ident_date() at the end of the function instead, at least after git_committer_info() is called. Regards, Paul -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Fri, Jul 29, 2016 at 11:15:39AM -0700, Junio C Hamano wrote: > > It does feel a little backwards to cache by default, and then try to > > catch all the places that want to reset. Another way of thinking about > > it would be to almost _never_ cache, but let a few callsites like (the > > commit object creation) explicitly ask for a stable timestamp between > > the author and committer. > > ... and the reflog? I left that part out. I can't decide if it is a bug or a feature that the reflog retains the same timestamp. I.e., I'm not sure it would be wrong to do: diff --git a/refs/files-backend.c b/refs/files-backend.c index 12290d2..465cfc5 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2886,6 +2886,8 @@ static int log_ref_write_1(const char *refname, const unsigned char *old_sha1, logfd = open(logfile->buf, oflags); if (logfd < 0) return 0; + + reset_ident_date(); result = log_ref_write_fd(logfd, old_sha1, new_sha1, git_committer_info(0), msg); if (result) { on top. If somebody writes a lot of refs without updating any commit objects, should those all have a stable timestamp or not? On the one hand, moving the timestamp reflects reality. On the other, I have done dirty things in the past like "undoing" the results of somebody's mistaken: git clone ... git push --mirror ;# oops, this deletes everything except master! by grouping ref updates according to their reflog mtimes. So I kind of punted by not changing it but also not making any claims either way. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
Jeff Kingwrites: > Linus suggested resetting the timestamp after making the commit, to > clear the way for the next commit. But if we reset any cached value > _before_ making the commit, this has a few advantages: I guess our mails crossed ;-). > It does feel a little backwards to cache by default, and then try to > catch all the places that want to reset. Another way of thinking about > it would be to almost _never_ cache, but let a few callsites like (the > commit object creation) explicitly ask for a stable timestamp between > the author and committer. ... and the reflog? I would say that the approach taken by this version is perfectly sensible, if we don't look at it as a "cache" and instead look at it as a "snapshot" of the clock for the duration of the operation. "reset" is like "now we are starting another operation, so grab a snapshot please". The changes to both tagging and committing look sensible. Thanks. > -- >8 -- > Subject: reset cached ident date before creating objects > > When we compute the date to put in author/committer lines of > commits, or tagger lines of tags, we get the current date > once and then cache it for the rest of the program. This is > a good thing in some cases, like "git commit", because it > means we do not racily assign different times to the > author/committer fields of a single commit object. > > But as more programs start to make many commits in a single > process (e.g., the recently builtin "git am"), it means that > you'll get long strings of commits with identical committer > timestamps (whereas before, we invoked "git commit" many > times and got true timestamps). > > This patch addresses it by letting callers reset the cached > time, which means they'll get a fresh time on their next > call to git_committer_info() or git_author_info(). We do so > automatically before filling in the ident fields of commit > and tag objects. That retains the property that committers > and authors in a single object will match, but means that > separate objects we create should always get their own > fresh timestamps. > > There's no automated test, because it would be inherently > racy (it depends on whether the program takes multiple > seconds to run). But you can see the effect with something > like: > > # make a fake 100-patch series; use --first-parent > # so that we pretend merges are just more patches > top=$(git rev-parse HEAD) > bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1) > git log --format=email --reverse --first-parent \ > --binary -m -p $bottom..$top >patch > > # now apply it; this presumably takes multiple seconds > git checkout --detach $bottom > git am > # now count the number of distinct committer times; > # prior to this patch, there would only be one, but > # now we'd typically see several. > git log --format=%ct $bottom.. | sort -u > > Suggested-by: Linus Torvalds > Signed-off-by: Jeff King > --- > builtin/tag.c | 1 + > cache.h | 1 + > commit.c | 1 + > ident.c | 5 + > 4 files changed, 8 insertions(+) > > diff --git a/builtin/tag.c b/builtin/tag.c > index 50e4ae5..3025e7f 100644 > --- a/builtin/tag.c > +++ b/builtin/tag.c > @@ -225,6 +225,7 @@ static void create_tag(const unsigned char *object, const > char *tag, > if (type <= OBJ_NONE) > die(_("bad object type.")); > > + reset_ident_date(); > header_len = snprintf(header_buf, sizeof(header_buf), > "object %s\n" > "type %s\n" > diff --git a/cache.h b/cache.h > index b5f76a4..31e65f9 100644 > --- a/cache.h > +++ b/cache.h > @@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void); > extern const char *git_editor(void); > extern const char *git_pager(int stdout_is_tty); > extern int git_ident_config(const char *, const char *, void *); > +extern void reset_ident_date(void); > > struct ident_split { > const char *name_begin; > diff --git a/commit.c b/commit.c > index 71a360d..7ddbffe 100644 > --- a/commit.c > +++ b/commit.c > @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t > msg_len, > } > > /* Person/date information */ > + reset_ident_date(); > if (!author) > author = git_author_info(IDENT_STRICT); > strbuf_addf(, "author %s\n", author); > diff --git a/ident.c b/ident.c > index 139c528..e20a772 100644 > --- a/ident.c > +++ b/ident.c > @@ -184,6 +184,11 @@ static const char *ident_default_date(void) > return git_default_date.buf; > } > > +void reset_ident_date(void) > +{ > + strbuf_reset(_default_date); > +} > + > static int crud(unsigned char c) > { > return c <= 32 || -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset cached ident date before creating objects
On Fri, Jul 29, 2016 at 11:05 AM, Jeff Kingwrote: > > Linus suggested resetting the timestamp after making the commit, to > clear the way for the next commit. But if we reset any cached value > _before_ making the commit, this has a few advantages: Looks fine to me. It should trivially fix the git-am issue, and I can't see what it could break. Famous last words. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html