Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
On Fri, Sep 09, 2016 at 02:58:25PM +0200, Johannes Schindelin wrote:

> > Yes, I agree that this is the opposite direction of libification. And I
> > agree that the current message is not very helpful.
> >
> > But I am not sure that returning the error up the stack will actually
> > help somebody move forward. The reason these are all die() calls in the
> > rest of the diff code is that they are generally indicative of
> > unrecoverable repository corruption. So any advice does not really
> > depend on what operation you are performing; it is always "stop what you
> > are doing immediately, run fsck, and try to get the broken objects from
> > somebody else".
> >
> > So IMHO, on balance this is not hurting anything.
>
> Well, you make such a situation even worse than it already is.
>
> It would be one thing to change the code to actually say "stop what you
> are doing immediately, run `git fsck` and try to get the broken objects
> from somewhere else", *before* saying how to proceed after that.
>
> But that is not what your patch does.
>
> What your patch does is to remove *even the possibility* of saying how to
> proceed after getting the repository corruption fixed. And instead of
> saying how the corruption could be fixed, it outputs a terse "cannot read
> files to diff".
>
> I do not think that is a wise direction.

First, do not blame me for the terse "cannot read files to diff". That
is the current message. And my patch does not make changing that message
any more difficult. You are welcome to change it in its error() form.
You are welcome to change it in the resulting die(). The quality of that
message is totally orthogonal to what the patch is doing.

The _only_ thing it is losing is the ability for the caller to then
additionally say "once you have finished uncorrupting the repository,
you can resume your operation with ...". My point is that this is not
useful advice. No callers give it, and I don't foresee other callers
giving it.

My argument above was basically that it is such an exceptional condition
it is not worth worrying about.

-Peff
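
For readers following the disagreement, here is a minimal sketch of the pattern being debated: a low-level helper that reports an error and returns -1, so that its caller can append operation-specific advice. Every name here (report_error_sketch, patch_id_sketch, rebase_step_sketch) and the hint text are hypothetical stand-ins, not git's actual functions or messages.

```c
#include <stdio.h>

/* Hypothetical stand-in for an error()-style helper: print, return -1. */
static int report_error_sketch(const char *msg)
{
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}

/*
 * Hypothetical low-level routine. `blobs_readable` simulates whether the
 * blobs to be diffed could be read from the object store.
 */
static int patch_id_sketch(int blobs_readable)
{
	if (!blobs_readable)
		return report_error_sketch("unable to read files to diff");
	return 0;
}

/*
 * Hypothetical caller. Because the error is returned rather than fatal,
 * the caller still has a chance to tell the user what to do next.
 */
static int rebase_step_sketch(int blobs_readable)
{
	if (patch_id_sketch(blobs_readable) < 0) {
		fprintf(stderr, "hint: run `git fsck`, repair the repository, "
				"then retry (hypothetical advice)\n");
		return -1;
	}
	return 0;
}
```

If patch_id_sketch() called exit() instead of returning, the hint branch in rebase_step_sketch() could never run. That is the capability the patch removes, and the one argued above to be unused in practice.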
Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
Hi Peff,

On Fri, 9 Sep 2016, Jeff King wrote:

> On Fri, Sep 09, 2016 at 12:28:38PM +0200, Johannes Schindelin wrote:
>
> > I like the simplification, but I *hate* the fact that the calling code has
> > *no way* to inform the user about the proper next steps.
> >
> > You are touching code that is really quite at the bottom of a lot of call
> > chains. For example in the one of `git pull --rebase`. I just spent an
> > insane amount of time trying to make sure that this command will not
> > simply die() somewhere deep in the code, leaving the user puzzled.
> >
> > Please see 3be18b4 (t5520: verify that `pull --rebase` shows the helpful
> > advice when failing, 2016-07-26) for more details.
>
> Yes, I agree that this is the opposite direction of libification. And I
> agree that the current message is not very helpful.
>
> But I am not sure that returning the error up the stack will actually
> help somebody move forward. The reason these are all die() calls in the
> rest of the diff code is that they are generally indicative of
> unrecoverable repository corruption. So any advice does not really
> depend on what operation you are performing; it is always "stop what you
> are doing immediately, run fsck, and try to get the broken objects from
> somebody else".
>
> So IMHO, on balance this is not hurting anything.

Well, you make such a situation even worse than it already is.

It would be one thing to change the code to actually say "stop what you
are doing immediately, run `git fsck` and try to get the broken objects
from somewhere else", *before* saying how to proceed after that.

But that is not what your patch does.

What your patch does is to remove *even the possibility* of saying how to
proceed after getting the repository corruption fixed. And instead of
saying how the corruption could be fixed, it outputs a terse "cannot read
files to diff".

I do not think that is a wise direction.

Ciao,
Dscho
Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
On Fri, Sep 09, 2016 at 12:28:38PM +0200, Johannes Schindelin wrote:

> I like the simplification, but I *hate* the fact that the calling code has
> *no way* to inform the user about the proper next steps.
>
> You are touching code that is really quite at the bottom of a lot of call
> chains. For example in the one of `git pull --rebase`. I just spent an
> insane amount of time trying to make sure that this command will not
> simply die() somewhere deep in the code, leaving the user puzzled.
>
> Please see 3be18b4 (t5520: verify that `pull --rebase` shows the helpful
> advice when failing, 2016-07-26) for more details.

Yes, I agree that this is the opposite direction of libification. And I
agree that the current message is not very helpful.

But I am not sure that returning the error up the stack will actually
help somebody move forward. The reason these are all die() calls in the
rest of the diff code is that they are generally indicative of
unrecoverable repository corruption. So any advice does not really
depend on what operation you are performing; it is always "stop what you
are doing immediately, run fsck, and try to get the broken objects from
somebody else".

So IMHO, on balance this is not hurting anything.

> A much better way, in my opinion, would be to introduce a new flag, say,
> skip_merges, and pass that to the diff_flush_patch_id() function. You
> could also consider consolidating that flag with the diff_header_only flag
> into a "flags" argument via something like

diff_flush_patch_id() doesn't care about merges; that's too late. The
change has to happen in commit_patch_id(). And the problem is not one of
passing in "skip merges" (we _always_ want to skip merges). It is rather
distinguishing the reason that commit_patch_id() told us it did not fill
in the sha1: because it was an error, or because the patch id is
undefined (one triggers a die(), the other a silent continue).

I think I laid out that path already in the cover letter of the
original.

If the consensus is that this is too ugly, I can implement that
approach.

-Peff
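
The alternative described above, a return value that separates "this commit has no patch-id" from "a real error occurred", could look roughly like the following. This is only a sketch under stated assumptions: the enum, the struct, and both functions are hypothetical names invented for illustration, the "corrupt" field simulates an unreadable object, and the error case returns -1 where the real caller would die().

```c
/* Hypothetical three-way result; not git's actual return convention. */
enum patch_id_status_sketch {
	PATCH_ID_ERROR_SKETCH = -1,	/* real failure, e.g. unreadable blob */
	PATCH_ID_OK_SKETCH = 0,		/* id_out was filled in */
	PATCH_ID_UNDEFINED_SKETCH = 1	/* no patch-id exists, e.g. a merge */
};

/* Hypothetical, heavily simplified commit. */
struct commit_sketch {
	int nr_parents;	/* more than one parent means a merge */
	int corrupt;	/* simulates a blob that cannot be read */
};

static enum patch_id_status_sketch
commit_patch_id_sketch(const struct commit_sketch *c, unsigned *id_out)
{
	if (c->nr_parents > 1)
		return PATCH_ID_UNDEFINED_SKETCH;
	if (c->corrupt)
		return PATCH_ID_ERROR_SKETCH;
	*id_out = 0x1234u;	/* placeholder, not a real hash */
	return PATCH_ID_OK_SKETCH;
}

/*
 * Returns how many commits received a patch-id, or -1 on the first real
 * error. Undefined patch-ids are skipped silently.
 */
static int collect_patch_ids_sketch(const struct commit_sketch *commits, int n)
{
	int i, have = 0;

	for (i = 0; i < n; i++) {
		unsigned id;

		switch (commit_patch_id_sketch(&commits[i], &id)) {
		case PATCH_ID_OK_SKETCH:
			have++;
			break;
		case PATCH_ID_UNDEFINED_SKETCH:
			continue;	/* the "silent continue" case */
		case PATCH_ID_ERROR_SKETCH:
			return -1;	/* where the caller would die() */
		}
	}
	return have;
}
```

The point of the third state is that the caller no longer has to guess why the id was not filled in, so it can skip merges without also swallowing corruption errors.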
Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
Hi Peff,

On Wed, 7 Sep 2016, Jeff King wrote:

> All of our errors come from diff_get_patch_id(), which has
> exactly three error conditions. The first is an internal
> assertion, which should be a die("BUG") in the first place.
>
> The other two are caused by an inability to two diff blobs,
> which is an indication of a serious problem (probably
> repository corruption). All the rest of the diff subsystem
> dies immediately on these conditions. By passing up the
> error, in theory we can keep going even if patch-id is
> unable to function. But in practice this means we may
> generate subtly wrong results (e.g., by failing to correlate
> two commits). Let's just die(), as we're better off making
> it clear to the user that their repository is not
> functional.
>
> As a result, we can simplify the calling code.

I like the simplification, but I *hate* the fact that the calling code has
*no way* to inform the user about the proper next steps.

You are touching code that is really quite at the bottom of a lot of call
chains. For example in the one of `git pull --rebase`. I just spent an
insane amount of time trying to make sure that this command will not
simply die() somewhere deep in the code, leaving the user puzzled.

Please see 3be18b4 (t5520: verify that `pull --rebase` shows the helpful
advice when failing, 2016-07-26) for more details.

A much better way, in my opinion, would be to introduce a new flag, say,
skip_merges, and pass that to the diff_flush_patch_id() function. You
could also consider consolidating that flag with the diff_header_only flag
into a "flags" argument via something like

	enum diff_flush_patch_id {
		DIFF_HEADER_ONLY = 1,
		SKIP_MERGES = 2
	}

But it is definitely not a good idea to reintroduce the bad practice of
die()ing deep down in library code. I know, you want proper exception
handling. We cannot have that. We use C. But die() is not a solution: it
introduces new problems.

Mind you: I agree that there are serious problems in the cases you
illustrated. But none of those problems give us license to leave the user
utterly puzzled by not even telling them what is going on: spouting
internals such as "unable to read files to diff" is *most definitely* not
helping users who simply want to run a `git pull --rebase`.

Ciao,
Dscho
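
The "flags" argument suggested above is the usual C bit-flag idiom. A minimal sketch follows, reusing the two enumerator names from the message; the enum tag and the two helper functions are hypothetical and exist only to show how a callee would test each bit.

```c
/* Enumerator names taken from the suggestion above; the tag is hypothetical. */
enum patch_id_flags_sketch {
	DIFF_HEADER_ONLY = 1,	/* bit 0 */
	SKIP_MERGES = 2		/* bit 1 */
};

/* Hypothetical helpers: each tests one bit of a combined flags word. */
static int wants_header_only_sketch(unsigned flags)
{
	return (flags & DIFF_HEADER_ONLY) != 0;
}

static int wants_skip_merges_sketch(unsigned flags)
{
	return (flags & SKIP_MERGES) != 0;
}
```

A caller would combine options with bitwise OR, for example passing DIFF_HEADER_ONLY | SKIP_MERGES, which replaces two separate int parameters with one.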
Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
On Thu, Sep 08, 2016 at 01:51:05AM +0100, Ramsay Jones wrote:

> On 07/09/16 23:04, Jeff King wrote:
> > All of our errors come from diff_get_patch_id(), which has
> > exactly three error conditions. The first is an internal
> > assertion, which should be a die("BUG") in the first place.
> >
> > The other two are caused by an inability to two diff blobs,
>                                                ^
> Huh? ... to diff two blobs?

Sorry. English my getting worse be to seems.

Will fix in a re-roll.

-Peff
Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result
On 07/09/16 23:04, Jeff King wrote:
> All of our errors come from diff_get_patch_id(), which has
> exactly three error conditions. The first is an internal
> assertion, which should be a die("BUG") in the first place.
>
> The other two are caused by an inability to two diff blobs,
                                              ^
Huh? ... to diff two blobs?

ATB,
Ramsay Jones
[PATCH 2/3] diff_flush_patch_id: stop returning error result
All of our errors come from diff_get_patch_id(), which has
exactly three error conditions. The first is an internal
assertion, which should be a die("BUG") in the first place.

The other two are caused by an inability to two diff blobs,
which is an indication of a serious problem (probably
repository corruption). All the rest of the diff subsystem
dies immediately on these conditions. By passing up the
error, in theory we can keep going even if patch-id is
unable to function. But in practice this means we may
generate subtly wrong results (e.g., by failing to correlate
two commits). Let's just die(), as we're better off making
it clear to the user that their repository is not
functional.

As a result, we can simplify the calling code.

Signed-off-by: Jeff King
---
This is a prerequisite for patch 3, since it means that
commit_patch_id() stops returning "real" errors. But obviously if this
is distasteful (and it does feel a little weird to convert error() to
die(), even though the rest of the diff code-base behaves this way), we
can teach commit_patch_id() to distinguish between "this has no
patch-id" and "a real error occurred" in its return value.

 diff.c      | 18 ++++++++----------
 diff.h      |  2 +-
 patch-ids.c |  3 ++-
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/diff.c b/diff.c
index 534c12e..d0594f6 100644
--- a/diff.c
+++ b/diff.c
@@ -4462,7 +4462,7 @@ static void patch_id_consume(void *priv, char *line, unsigned long len)
 }
 
 /* returns 0 upon success, and writes result into sha1 */
-static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
+static void diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
@@ -4484,7 +4484,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		if (p->status == 0)
-			return error("internal diff status error");
+			die("BUG: diff status unset while computing patch_id");
 		if (p->status == DIFF_STATUS_UNKNOWN)
 			continue;
 		if (diff_unmodified_pair(p))
@@ -4536,7 +4536,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 
 		if (fill_mmfile(&mf1, p->one) < 0 ||
 		    fill_mmfile(&mf2, p->two) < 0)
-			return error("unable to read files to diff");
+			die("unable to read files to diff");
 
 		if (diff_filespec_is_binary(p->one) ||
 		    diff_filespec_is_binary(p->two)) {
@@ -4552,27 +4552,25 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1,
 		xecfg.flags = 0;
 		if (xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
 				  &xpp, &xecfg))
-			return error("unable to generate patch-id diff for %s",
-				     p->one->path);
+			die("unable to generate patch-id diff for %s",
+			    p->one->path);
 	}
 
 	git_SHA1_Final(sha1, &ctx);
-	return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
+void diff_flush_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	int i;
-	int result = diff_get_patch_id(options, sha1, diff_header_only);
+
+	diff_get_patch_id(options, sha1, diff_header_only);
 
 	for (i = 0; i < q->nr; i++)
 		diff_free_filepair(q->queue[i]);
 
 	free(q->queue);
 	DIFF_QUEUE_CLEAR(q);
-
-	return result;
 }
 
 static int is_summary_empty(const struct diff_queue_struct *q)
diff --git a/diff.h b/diff.h
index 7883729..f4dcfe1 100644
--- a/diff.h
+++ b/diff.h
@@ -342,7 +342,7 @@ extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int run_diff_index(struct rev_info *revs, int cached);
 
 extern int do_diff_cache(const unsigned char *, struct diff_options *);
-extern int diff_flush_patch_id(struct diff_options *, unsigned char *, int);
+extern void diff_flush_patch_id(struct diff_options *, unsigned char *, int);
 
 extern int diff_result_code(struct diff_options *, int);
 
diff --git a/patch-ids.c b/patch-ids.c
index 77e4663..0e95220 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -13,7 +13,8 @@ int commit_patch_id(struct commit *commit, struct diff_options *options,
 	else
 		diff_root_tree_sha1(commit->object.oid.hash, "", options);
 	diffcore_std(options);
-	return diff_flush_patch_id(options, sha1, diff_header_only);
+	diff_flush_patch_id(options, sha1, diff_header_only);
+