Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On 2/21/2018 6:13 PM, Jeff King wrote:
> On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote:
>> The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loaded and was not freed.
>>
>> Two places use this to inform how to output information about commits.
>>
>> log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line.
>>
>> rev-list uses this method for two reasons:
>>
>>  - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content.
>
> I'm not sure after my earlier digging if there is even a way to trigger this (and if so, it is probably accidental, since those lines were added explicitly for --show-all).
>
> And actually after re-reading the commit message for 3131b7130 again, I think the current behavior is definitely not something that was carefully planned. So I'd propose a commit message like below.

I only submitted my patch to avoid making you do the work of writing the commit message. My messages still don't have quite the right amount of detail (or the correct details, in this case).

Junio: please add

Reported-by: Derrick Stolee

Thanks,
-Stolee
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 02:19:17PM -0500, Derrick Stolee wrote:

> > These behaviors are undocumented, untested, and unlikely to be
> > expected by users or other software attempting to parse this output.
> >
> > Helped-by: Jeff King
> > Signed-off-by: Derrick Stolee
>
> This would be a good time to allow multiple authors, or to just change the
> author, since this is exactly the diff you (Peff) provided in an earlier
> email. The commit message hopefully summarizes our discussion, but I welcome
> edits.

The point is moot if we take the revision I just sent (though in retrospect I really ought to have put in a Reported-by: for you there).

But some communities are settling on Co-authored-by as a trailer for this case. And GitHub has started parsing and showing that along with author information:

  https://github.com/blog/2496-commit-together-with-co-authors

-Peff
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 03:22:02PM -0800, Stefan Beller wrote:

> > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()
> > ---
> >  builtin/rev-list.c | 2 +-
> >  log-tree.c         | 3 ---
> >  2 files changed, 1 insertion(+), 4 deletions(-)
>
> Now if we'd get around to rewriting pretty.c as well, we could make it static,
> giving a stronger reason not to use that function. But it looks a bit
> complicated to me, as I am not familiar with that area of the code.
>
> Thanks for making less use of this suboptimal API,

I'm not sure the API is suboptimal. It's not wrong to ask "do you have a cached copy of this?". It was just being used poorly here. :)

See the discussion in

  https://public-inbox.org/git/20180221184811.gd4...@sigill.intra.peff.net/

-Peff
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
> Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()
> ---
>  builtin/rev-list.c | 2 +-
>  log-tree.c         | 3 ---
>  2 files changed, 1 insertion(+), 4 deletions(-)

Now if we'd get around to rewriting pretty.c as well, we could make it static, giving a stronger reason not to use that function. But it looks a bit complicated to me, as I am not familiar with that area of the code.

Thanks for making less use of this suboptimal API,
Stefan
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote:

> The get_cached_commit_buffer() method provides access to the buffer
> loaded for a struct commit, if it was ever loaded and was not freed.
>
> Two places use this to inform how to output information about commits.
>
> log-tree.c uses this method to short-circuit the output of commit
> information when the buffer is not cached. However, this leads to
> incorrect output in 'git log --oneline' where the short-OID is written
> but then the rest of the commit information is dropped and the next
> commit is written on the same line.
>
> rev-list uses this method for two reasons:
>
>  - First, if the revision walk visits a commit twice, the buffer was
>    freed by rev-list in the first write. The output then does not
>    match the format expectations, since the OID is written without the
>    rest of the content.

I'm not sure after my earlier digging if there is even a way to trigger this (and if so, it is probably accidental, since those lines were added explicitly for --show-all).

And actually after re-reading the commit message for 3131b7130 again, I think the current behavior is definitely not something that was carefully planned. So I'd propose a commit message like below.

-- >8 --
Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()

The "--show-all" revision option shows UNINTERESTING commits. Some of these commits may be unparsed when we try to show them (since we may or may not need to walk their parents to fulfill the request).

Commit 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09) resolved this by just skipping pretty-printing for commits without their object contents cached, saying:

    Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry.

That was the easy fix to avoid the pretty-printer segfaulting, but:

  1. It doesn't work for all formats. E.g., --oneline prints the oid for each such commit but not a trailing newline, leading to jumbled output.

  2. It only affects some commits, depending on whether we happened to parse them or not (so if they were at the tip of an UNINTERESTING starting point, or if we happened to traverse over them, you'd see more data).

  3. It unnecessarily ties the decision to show the verbose header to whether the commit buffer was cached. That makes it harder to change the logic around caching (e.g., if we could traverse without actually loading the full commit objects).

These days it's safe to feed such a commit to the pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily load missing commit buffers, 2013-01-26), we'll load it on demand in such a case. So let's just always show the verbose headers.

This does change the behavior of plumbing, but:

  a. The --show-all option was explicitly introduced as a debugging aid, and was never documented (and has rarely even been mentioned on the list by git devs).

  b. Avoiding the commits was already not deterministic due to (2) above. So the caller might have seen full headers for these commits anyway, and would need to be prepared for it.

Signed-off-by: Jeff King
---
 builtin/rev-list.c | 2 +-
 log-tree.c         | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9e11..d320b6f1e3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
+	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d6d1..22b2fb6c58 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!get_cached_commit_buffer(commit, NULL))
-		return;
-
 	if (opt->show_notes) {
 		int raw;
 		struct strbuf notebuf = STRBUF_INIT;
-- 
2.16.2.555.g885a024879
Re: [PATCH] commit: drop uses of get_cached_commit_buffer()
On 2/21/2018 2:17 PM, Derrick Stolee wrote:
> The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loaded and was not freed.
>
> Two places use this to inform how to output information about commits.
>
> log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line.
>
> rev-list uses this method for two reasons:
>
>  - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content.
>
>  - Second, if the revision walk visits a commit that was marked UNINTERESTING, the walk may never load a buffer and hence rev-list will not output the verbose information.
>
> These behaviors are undocumented, untested, and unlikely to be expected by users or other software attempting to parse this output.
>
> Helped-by: Jeff King
> Signed-off-by: Derrick Stolee

This would be a good time to allow multiple authors, or to just change the author, since this is exactly the diff you (Peff) provided in an earlier email. The commit message hopefully summarizes our discussion, but I welcome edits.
[PATCH] commit: drop uses of get_cached_commit_buffer()
The get_cached_commit_buffer() method provides access to the buffer loaded for a struct commit, if it was ever loaded and was not freed.

Two places use this to inform how to output information about commits.

log-tree.c uses this method to short-circuit the output of commit information when the buffer is not cached. However, this leads to incorrect output in 'git log --oneline' where the short-OID is written but then the rest of the commit information is dropped and the next commit is written on the same line.

rev-list uses this method for two reasons:

 - First, if the revision walk visits a commit twice, the buffer was freed by rev-list in the first write. The output then does not match the format expectations, since the OID is written without the rest of the content.

 - Second, if the revision walk visits a commit that was marked UNINTERESTING, the walk may never load a buffer and hence rev-list will not output the verbose information.

These behaviors are undocumented, untested, and unlikely to be expected by users or other software attempting to parse this output.

Helped-by: Jeff King
Signed-off-by: Derrick Stolee
---
 builtin/rev-list.c | 2 +-
 log-tree.c         | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9..d320b6f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
+	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d..22b2fb6 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!get_cached_commit_buffer(commit, NULL))
-		return;
-
 	if (opt->show_notes) {
 		int raw;
 		struct strbuf notebuf = STRBUF_INIT;
-- 
2.7.4