Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On Wed, Jan 10, 2018 at 12:22:10PM -0800, Junio C Hamano wrote: > Jeff Kingwrites: > > > To be clear, which approach are we talking about? I think there are > > three options: > > > > 1. The user tells us not to bother computing real ahead/behind values. > > We always say "same" or "not the same". > > > > 2. The user tells us not to bother computing ahead/behind values > > with more effort than N. After traversing N commits without getting > > an answer, we say "same" or "not the same". But we may sometimes > > give a real answer if we found it within N. > > > > 3. The user tells us not to spend more effort than N. After traversing > > N commits we try to make some partial statement based on > > generations (or commit timestamps as a proxy for them). > > > > I agree that (3) is probably not going to be useful enough in the > > general case to merit the implementation effort and confusion. But is > > there anything wrong with (2)? > > I agree (3) would not be all that interesting. Offhand I do not see > a problem with (2). I think with "real" in your "sometimes give a > real answer" you meant to say that we limit our answers to just one > three ("same", "not the same", "ahead/behind by exactly N/M") and I > think it is a good choice that is easy to explain. Yes, exactly. That's a better way of saying it. -Peff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
Jeff Kingwrites: > To be clear, which approach are we talking about? I think there are > three options: > > 1. The user tells us not to bother computing real ahead/behind values. > We always say "same" or "not the same". > > 2. The user tells us not to bother computing ahead/behind values > with more effort than N. After traversing N commits without getting > an answer, we say "same" or "not the same". But we may sometimes > give a real answer if we found it within N. > > 3. The user tells us not to spend more effort than N. After traversing > N commits we try to make some partial statement based on > generations (or commit timestamps as a proxy for them). > > I agree that (3) is probably not going to be useful enough in the > general case to merit the implementation effort and confusion. But is > there anything wrong with (2)? I agree (3) would not be all that interesting. Offhand I do not see a problem with (2). I think with "real" in your "sometimes give a real answer" you meant to say that we limit our answers to just one three ("same", "not the same", "ahead/behind by exactly N/M") and I think it is a good choice that is easy to explain. We might be able to say things other than these three, namely, "ahead by no more than N, behind by no more than M", but I do not know if that is useful or merely more confusing than it's worth.
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On Tue, Jan 09, 2018 at 09:29:31AM -0500, Derrick Stolee wrote: > > > But even still, finding small answers quickly and accurately and punting > > > to "really far, I didn't bother to compute it" on the big ones would be > > > an improvement over always punting. > > Indeed. The longer I think about it, the more I like the "100+ commits > > apart" idea. > > > > Again, I strongly suggest we drop this approach because it will be more pain > than it is worth. To be clear, which approach are we talking about? I think there are three options: 1. The user tells us not to bother computing real ahead/behind values. We always say "same" or "not the same". 2. The user tells us not to bother computing ahead/behind values with more effort than N. After traversing N commits without getting an answer, we say "same" or "not the same". But we may sometimes give a real answer if we found it within N. 3. The user tells us not to spend more effort than N. After traversing N commits we try to make some partial statement based on generations (or commit timestamps as a proxy for them). I agree that (3) is probably not going to be useful enough in the general case to merit the implementation effort and confusion. But is there anything wrong with (2)? -Peff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On Tue, Jan 09, 2018 at 02:15:47PM +0100, Johannes Schindelin wrote: > > I like this direction a lot. I had hoped we could say "100+ commits > > ahead", > > How about "100+ commits apart" instead? Yeah, that is probably more accurate for the general case. > > but I don't think we can do so accurately without generation numbers. > > Even with generation numbers, it is not possible to say whether two given > commits reflect diverging branches or have an ancestor-descendant > relationship (or in graph speak: whether they are comparable). > > It could potentially make it possible to cut off the commit traversal, but > I do not even see how that would be possible. > > The only thing you could say for sure is that two different commits with > the same generation number are for sure "uncomparable", i.e. reflect > diverging branches. I think sometimes we can say more. E.g., imagine we have two commits, A and B, with generation numbers N and N+1000. We walk back 100 commits deep from B and end up at a commit around N+900. We know that there are at least 100 commits in B that are not in A (the 100 we walked, which we know cannot be ancestors of A because of their generation numbers). That's true even if there is no ancestry relationship between the two commits at all. But we cannot say in that case how many (if any) commits are in A but not B. So we can say "100+ commits ahead, unknown behind" (or if the two generation numbers are reversed, we can say "unknown ahead, 100+ commits behind). In the more general case, we could actually walk _past_ generation N, and end up at N-25 or similar. There we can't say anything intelligent about the commits with generations <= N. But we could say something like "there are 75 commits in B that cannot be in A" (note that it's probably _not_ actually 75; it's however many we walked until we crossed N). So that was what I was getting at in the earlier mail. We can sometimes give a partial answer. But I think giving that partial answer is likely to be more confusing than useful to a user, because there are a lot of caveats about how much we know for a given situation. And I think from what you wrote below that you probably agree with that (that we can make a few claims, but not enough to really be useful). -Peff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
Hi Stolee, On Tue, 9 Jan 2018, Derrick Stolee wrote: > On 1/9/2018 8:15 AM, Johannes Schindelin wrote: > > > > On Tue, 9 Jan 2018, Jeff King wrote: > > > > > But I don't think you can approximate both ahead and behind together > > > without finding the actual merge base. > > > > > > But even still, finding small answers quickly and accurately and > > > punting to "really far, I didn't bother to compute it" on the big > > > ones would be an improvement over always punting. > > > > Indeed. The longer I think about it, the more I like the "100+ commits > > apart" idea. > > Again, I strongly suggest we drop this approach because it will be more pain > than it is worth. So what you are saying is if there is a commit graph with *heavy* clock skew, you might overestimate how many commits apart the tips are. I say that this is striking the balance between correctness and usability on the *wrong* side. Sure, it might be wrong if your commit graph suffers heavily from clock skew. In most cases, you still get a pretty darn useful hint where you're at. The alternative would be *not* to show any useful hint in most cases, i.e. when you did not find all merge bases within commits. I would really hate it if Git spent so much time and did not even give me a hint. Totally unsatisfying user experience. Ciao, Johannes
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/9/2018 9:29 AM, Derrick Stolee wrote: On 1/9/2018 8:15 AM, Johannes Schindelin wrote: Hi Peff, On Tue, 9 Jan 2018, Jeff King wrote: On Mon, Jan 08, 2018 at 03:04:20PM -0500, Jeff Hostetler wrote: I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). After a off-line conversation with the others I'm going to look into a version that is limited to n commits rather than be completely on or off. I think if you are for example less than 100 a/b then those numbers have value; if you are past n, then they have much less value. I'd rather do it by a fixed limit than by time to ensure that output is predictable on graph shape and not on system load. I like this direction a lot. I had hoped we could say "100+ commits ahead", How about "100+ commits apart" instead? Unfortunately, we can _never_ guarantee more than 1 commit ahead/behind unless we walk to the merge base (or have generation numbers). For example, suppose the 101st commit in each history has a parent that in the recent history of the other commit. (There must be merge commits to make this work without creating cycles, but the ahead/behind counts could be much lower than the number of walked commits.) but I don't think we can do so accurately without generation numbers. Even with generation numbers, it is not possible to say whether two given commits reflect diverging branches or have an ancestor-descendant relationship (or in graph speak: whether they are comparable). If you walk commits using a priority queue where the priority is the generation number, then you can know that you have walked all reachable commits with generation greater than X, so you know among those commits which are comparable or not. For this to work accurately, you must walk from both tips to a generation lower than each. It does not help the case where one branch is 100,000+ commits ahead, where most of those commits have higher generation number than the behind commit. It could potentially make it possible to cut off the commit traversal, but I do not even see how that would be possible. The only thing you could say for sure is that two different commits with the same generation number are for sure "uncomparable", i.e. reflect diverging branches. E.g., the case I mentioned at the bottom of this mail: https://public-inbox.org/git/20171224143318.gc23...@sigill.intra.peff.net/ I haven't thought too hard on it, but I suspect with generation numbers you could bound it and at least say "100+ ahead" or "100+ behind". If you have walked 100 commits and still have not found a merge base, there is no telling whether one start point is the ancestor of the other. All you can say is that there are more than 100 commits in the "difference". You would not even be able to say that the *shortest* path between those two start points is longer than 100 commits, you can construct pathological DAGs pretty easily. Even if you had generation numbers, and one commit's generation number was, say, 17, and the other one's was 17,171, you could not necessarily assume that the 17 one is the ancestor of the 17,171 one, all you can say that it is not possible the other way round. This is why we cannot _always_ use generation numbers, but they do help in some cases. But I don't think you can approximate both ahead and behind together without finding the actual merge base. But even still, finding small answers quickly and accurately and punting to "really far, I didn't bother to compute it" on the big ones would be an improvement over always punting. Indeed. The longer I think about it, the more I like the "100+ commits apart" idea. Again, I strongly suggest we drop this approach because it will be more pain than it is worth. Good comments all. Thanks! I will leave the patch series as it is with the existing on/off setting and call it quits. Thanks, Jeff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/9/2018 8:15 AM, Johannes Schindelin wrote: Hi Peff, On Tue, 9 Jan 2018, Jeff King wrote: On Mon, Jan 08, 2018 at 03:04:20PM -0500, Jeff Hostetler wrote: I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). After a off-line conversation with the others I'm going to look into a version that is limited to n commits rather than be completely on or off. I think if you are for example less than 100 a/b then those numbers have value; if you are past n, then they have much less value. I'd rather do it by a fixed limit than by time to ensure that output is predictable on graph shape and not on system load. I like this direction a lot. I had hoped we could say "100+ commits ahead", How about "100+ commits apart" instead? Unfortunately, we can _never_ guarantee more than 1 commit ahead/behind unless we walk to the merge base (or have generation numbers). For example, suppose the 101st commit in each history has a parent that in the recent history of the other commit. (There must be merge commits to make this work without creating cycles, but the ahead/behind counts could be much lower than the number of walked commits.) but I don't think we can do so accurately without generation numbers. Even with generation numbers, it is not possible to say whether two given commits reflect diverging branches or have an ancestor-descendant relationship (or in graph speak: whether they are comparable). If you walk commits using a priority queue where the priority is the generation number, then you can know that you have walked all reachable commits with generation greater than X, so you know among those commits which are comparable or not. For this to work accurately, you must walk from both tips to a generation lower than each. It does not help the case where one branch is 100,000+ commits ahead, where most of those commits have higher generation number than the behind commit. It could potentially make it possible to cut off the commit traversal, but I do not even see how that would be possible. The only thing you could say for sure is that two different commits with the same generation number are for sure "uncomparable", i.e. reflect diverging branches. E.g., the case I mentioned at the bottom of this mail: https://public-inbox.org/git/20171224143318.gc23...@sigill.intra.peff.net/ I haven't thought too hard on it, but I suspect with generation numbers you could bound it and at least say "100+ ahead" or "100+ behind". If you have walked 100 commits and still have not found a merge base, there is no telling whether one start point is the ancestor of the other. All you can say is that there are more than 100 commits in the "difference". You would not even be able to say that the *shortest* path between those two start points is longer than 100 commits, you can construct pathological DAGs pretty easily. Even if you had generation numbers, and one commit's generation number was, say, 17, and the other one's was 17,171, you could not necessarily assume that the 17 one is the ancestor of the 17,171 one, all you can say that it is not possible the other way round. This is why we cannot _always_ use generation numbers, but they do help in some cases. But I don't think you can approximate both ahead and behind together without finding the actual merge base. But even still, finding small answers quickly and accurately and punting to "really far, I didn't bother to compute it" on the big ones would be an improvement over always punting. Indeed. The longer I think about it, the more I like the "100+ commits apart" idea. Again, I strongly suggest we drop this approach because it will be more pain than it is worth. Thanks, -Stolee
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
Hi Peff, On Tue, 9 Jan 2018, Jeff King wrote: > On Mon, Jan 08, 2018 at 03:04:20PM -0500, Jeff Hostetler wrote: > > > > I was thinking about something similar to the logic we use today > > > about whether to start reporting progress on other long commands. > > > That would mean you could still get the ahead/behind values if you > > > aren't that far behind but would only get "different" if that > > > calculation gets too expensive (which implies the actual value isn't > > > going to be all that helpful anyway). > > > > After a off-line conversation with the others I'm going to look into > > a version that is limited to n commits rather than be completely on or > > off. I think if you are for example less than 100 a/b then those numbers > > have value; if you are past n, then they have much less value. > > > > I'd rather do it by a fixed limit than by time to ensure that output > > is predictable on graph shape and not on system load. > > I like this direction a lot. I had hoped we could say "100+ commits > ahead", How about "100+ commits apart" instead? > but I don't think we can do so accurately without generation numbers. Even with generation numbers, it is not possible to say whether two given commits reflect diverging branches or have an ancestor-descendant relationship (or in graph speak: whether they are comparable). It could potentially make it possible to cut off the commit traversal, but I do not even see how that would be possible. The only thing you could say for sure is that two different commits with the same generation number are for sure "uncomparable", i.e. reflect diverging branches. > E.g., the case I mentioned at the bottom of this mail: > > https://public-inbox.org/git/20171224143318.gc23...@sigill.intra.peff.net/ > > I haven't thought too hard on it, but I suspect with generation numbers > you could bound it and at least say "100+ ahead" or "100+ behind". If you have walked 100 commits and still have not found a merge base, there is no telling whether one start point is the ancestor of the other. All you can say is that there are more than 100 commits in the "difference". You would not even be able to say that the *shortest* path between those two start points is longer than 100 commits, you can construct pathological DAGs pretty easily. Even if you had generation numbers, and one commit's generation number was, say, 17, and the other one's was 17,171, you could not necessarily assume that the 17 one is the ancestor of the 17,171 one, all you can say that it is not possible the other way round. > But I don't think you can approximate both ahead and behind together > without finding the actual merge base. > > But even still, finding small answers quickly and accurately and punting > to "really far, I didn't bother to compute it" on the big ones would be > an improvement over always punting. Indeed. The longer I think about it, the more I like the "100+ commits apart" idea. Ciao, Dscho
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On Mon, Jan 08, 2018 at 03:04:20PM -0500, Jeff Hostetler wrote: > > I was thinking about something similar to the logic we use today > > about whether to start reporting progress on other long commands. > > That would mean you could still get the ahead/behind values if you > > aren't that far behind but would only get "different" if that > > calculation gets too expensive (which implies the actual value isn't > > going to be all that helpful anyway). > > After a off-line conversation with the others I'm going to look into > a version that is limited to n commits rather than be completely on or > off. I think if you are for example less than 100 a/b then those numbers > have value; if you are past n, then they have much less value. > > I'd rather do it by a fixed limit than by time to ensure that output > is predictable on graph shape and not on system load. I like this direction a lot. I had hoped we could say "100+ commits ahead", but I don't think we can do so accurately without generation numbers. E.g., the case I mentioned at the bottom of this mail: https://public-inbox.org/git/20171224143318.gc23...@sigill.intra.peff.net/ I haven't thought too hard on it, but I suspect with generation numbers you could bound it and at least say "100+ ahead" or "100+ behind". But I don't think you can approximate both ahead and behind together without finding the actual merge base. But even still, finding small answers quickly and accurately and punting to "really far, I didn't bother to compute it" on the big ones would be an improvement over always punting. -Peff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/8/2018 2:49 PM, Ben Peart wrote: On 1/8/2018 10:48 AM, Jeff Hostetler wrote: From: Jeff HostetlerThis is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. While I like the simplicity of just turning this off completely, I do wonder if we could come up with a better user experience. For example, could we somehow limit the time spent computing the before/after and if it exceeds that limit, drop back to saying they are "different" rather than computing the exact number of commits before/after. I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). After a off-line conversation with the others I'm going to look into a version that is limited to n commits rather than be completely on or off. I think if you are for example less than 100 a/b then those numbers have value; if you are past n, then they have much less value. I'd rather do it by a fixed limit than by time to ensure that output is predictable on graph shape and not on system load. Jeff
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/8/2018 10:48 AM, Jeff Hostetler wrote: From: Jeff HostetlerThis is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. While I like the simplicity of just turning this off completely, I do wonder if we could come up with a better user experience. For example, could we somehow limit the time spent computing the before/after and if it exceeds that limit, drop back to saying they are "different" rather than computing the exact number of commits before/after. I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). This version does not address "git branch -vv", but that requires passing data across the for-each-ref barrier and that seemed beyond the scope of this task. Jeff Hostetler (4): stat_tracking_info: return +1 when branches not equal status: add --[no-]ahead-behind to status and commit for V2 format. status: update short status to respect --no-ahead-behind status: support --no-ahead-behind in long format Documentation/config.txt | 6 Documentation/git-status.txt | 5 +++ builtin/checkout.c | 2 +- builtin/commit.c | 18 ++- ref-filter.c | 8 ++--- remote.c | 50 -- remote.h | 12 ++-- t/t6040-tracking-info.sh | 73 t/t7064-wtstatus-pv2.sh | 69 + wt-status.c | 41 + wt-status.h | 2 ++ 11 files changed, 250 insertions(+), 36 deletions(-)
[PATCH v4 0/4] Add --no-ahead-behind to status
From: Jeff HostetlerThis is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. This version does not address "git branch -vv", but that requires passing data across the for-each-ref barrier and that seemed beyond the scope of this task. Jeff Hostetler (4): stat_tracking_info: return +1 when branches not equal status: add --[no-]ahead-behind to status and commit for V2 format. status: update short status to respect --no-ahead-behind status: support --no-ahead-behind in long format Documentation/config.txt | 6 Documentation/git-status.txt | 5 +++ builtin/checkout.c | 2 +- builtin/commit.c | 18 ++- ref-filter.c | 8 ++--- remote.c | 50 -- remote.h | 12 ++-- t/t6040-tracking-info.sh | 73 t/t7064-wtstatus-pv2.sh | 69 + wt-status.c | 41 + wt-status.h | 2 ++ 11 files changed, 250 insertions(+), 36 deletions(-) -- 2.9.3