Re: history damage in linux.git
Hi Linus, On Thu, 21 Apr 2016, Linus Torvalds wrote: > On Thu, Apr 21, 2016 at 11:05 AM, Jeff King wrote: > > > > Sadly, neither git's internal version-sorting nor GNU's "sort -V" > > knows that "v1.0-rc1" comes before "v1.0", so I had to rely on > > "--sort=taggerdate". > > I'm not seeing the "sadly". > > I think "--sort=taggerdate" is pretty much the only sane sort there is > for tags, unless you do a true and full topological one (ie sort based > on by how many commits that tag encompasses, but also by how each tag > contains another tag). Turns out it is pretty easy to implement this in name-rev. Expect the patch in your inbox in a minute. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Stefan Beller writes: > Combining Junios and Linus idea: > > * We want to have the minimal history, i.e. that tag with the fewest > cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13 > because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha > `git log --oneline v3.13 |wc -l` (414530). > The difference is 213. > > tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65) > > tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54) > > tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag > has a higher weight by 213. > > v4.6-rc1 has even more weight (588477). > > So I guess what I propose is to take the weight of a tag into account > via `git log --oneline |wc -l` as that gives the tag which encloses > least history? > > We also do not want to have "a lot of side traversals", so we could > punish each additional addendum by a heuristic. These are essentially shooting for what Linus meant "something optimal", contrasting his "improved heuristics" version, and it is good that you are thinking about these issues. It may be a bit tricky to think the "optimum description" in terms of "how many commits are behind these tags", though. One thing that commonly happens is: (1) A fix X is developed on an ancient codebase, e.g. on top of v1.0.0 release. (2) X is first merged to the current 'master' and becomes part of the next release v2.0.0. (3) X is later merged to the maintenance track and included in v1.0.1. There are a few questions you would want to ask "describe --contains X" and depending on the reason why you are asking the question, the desired answer may be different: - In which release did the fix X first appear? (answer: v2.0.0) - What is the oldest release with the fix X?(answer: v1.0.1) I happen to be a maintainer who prefers to have a tidy containment relationships among his releases, and after the above three steps, there will be this: (4) Later v1.0.1 is merged back to 'master' branch and a tag v2.0.1 on the maintenance track for v2.0.x series would contain it. But not all projects are run that way. Often older maintenance tracks will never merge back to the current development track (i.e. fixes are only ported-back). If the v1.0.x codebase had an unrelated problem in the code that no longer exists in v2.0.x codebase, the maintenance track may have quite a many commits that exist only there and not in 'master', and when (3) happens, the "weight", i.e. the commits behind the tag, of v1.0.1 may be greater than the weight of v2.0.0 in such a project. In other words, an algorithm that uses "how many commits are behind the tag" as one of the deciding factor to name X would choose between v1.0.1 and v2.0.0 depending on what other things that have nothing to do with X happend on these parallel development tracks, which may not be desirable. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 12:27 PM, Junio C Hamano wrote: > Linus Torvalds writes: > >> But this patch is small and simple, and has some excuses for its >> behavior. What do people think? > > I like it that you call it "excuse" not "rationale", as I couldn't > form a logical connection between your "4 (2) letters" and "1 > (100)" at all ;-) Think of the distance number as a "order of magnitude in complexity", and it actually makes a certain amount of sense. It's not the same as the length of the string, but the "log()" of the distance number really does give a kind of complexity value. Think of it this way: if things are entirely linear (all just first parenthood), there will be just a single simple number, and the relationship between the simple distance number (that just increments by one for each parent traversed) and the length of the string that describes it will really be "log10(distance)". That's literally how many characters you need to describe the linear distance number. So a simple linear distance of 'n' commits will need on the order of 'log10(n)' digits to describe it (ie a number around a thousand will need around three digits). The "100" and "1" are just extending that notion of distance to the more complex cases., and expresses their complexity in the same logarithmic units. The same way you need four digits to express a _linear_ distance of 1, you need four characters to express that "~n^p" case of "merge parent p, n generations back". And if you don't have the generation thing, you only need two characters to express parent #'p': "^p". So two characters really *are* equivalent to ~100 linear steps, and four characters really *are* equivalent to ~1 linear steps. So it's not _just_ an excuse. There's an actual rationale for picking those numbers, and why they are equivalent in a complexity measure. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Linus Torvalds writes: > But this patch is small and simple, and has some excuses for its > behavior. What do people think? I like it that you call it "excuse" not "rationale", as I couldn't form a logical connection between your "4 (2) letters" and "1 (100)" at all ;-) Modulo the usual style issues (e.g. we frown upon patches in attachement that makes it harder to quote and comment), I think this is a strict improvement and is a good measure until somebody does a full "topologically closest" solution. > Linus > > builtin/name-rev.c | 16 ++-- > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/builtin/name-rev.c b/builtin/name-rev.c > index 092e03c3cc9b..0354c8d222e1 100644 > --- a/builtin/name-rev.c > +++ b/builtin/name-rev.c > @@ -16,9 +16,6 @@ typedef struct rev_name { > > static long cutoff = LONG_MAX; > > -/* How many generations are maximally preferred over _one_ merge traversal? > */ > -#define MERGE_TRAVERSAL_WEIGHT 65535 > - > static void name_rev(struct commit *commit, > const char *tip_name, int generation, int distance, > int deref) > @@ -55,19 +52,26 @@ copy_data: > parents; > parents = parents->next, parent_number++) { > if (parent_number > 1) { > + int weight; > size_t len; > char *new_name; > > strip_suffix(tip_name, "^0", &len); > - if (generation > 0) > + > + // The extra merge traversal "weight" depends > + // on how complex the resulting name is. > + if (generation > 0) { > + weight = 1; > new_name = xstrfmt("%.*s~%d^%d", (int)len, > tip_name, > generation, parent_number); > - else > + } else { > + weight = 100; > new_name = xstrfmt("%.*s^%d", (int)len, > tip_name, > parent_number); > + } > > name_rev(parents->item, new_name, 0, > - distance + MERGE_TRAVERSAL_WEIGHT, 0); > + distance + weight, 0); > } else { > name_rev(parents->item, tip_name, generation + 1, > distance + 1, 0); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 11:05 AM, Jeff King wrote: > > I actually think the best name for aed06b9 is probably: > > v3.13-rc1~65^2^2~42 Sounds likely. I don't know how to find that best match without a complete rewrite, though. My recent patch that got v3.13-rc2~32^2^2~47 comes close to that, and the complexity is similar but the numbers are actually smaller, so I guess my heuristic did indeed find a "simpler" name, but yes, the one based on 3.13-rc1 would definitely be the better one. > which I found by picking the oldest tag from "git tag --contains" and > plugging it into "git describe --match". Yeah, so you basically did the "let's figure out minimal inclusion" by hand. > Sadly, neither git's internal > version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before > "v1.0", so I had to rely on "--sort=taggerdate". I'm not seeing the "sadly". I think "--sort=taggerdate" is pretty much the only sane sort there is for tags, unless you do a true and full topological one (ie sort based on by how many commits that tag encompasses, but also by how each tag contains another tag). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:59:52AM -0700, Linus Torvalds wrote: > That said, I do think that a much bigger conceptual change that > actually does full traversal and be much more complicated might be the > only "correct" solution. > > So my patch is just a "improve heuristics" small fixlet rather than > something optimal. Yeah, I'd agree with both points. Unless somebody is planning to work on the bigger change in the near future, your patch is a strict improvement to the heuristic, and is worth applying. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:23:10AM -0700, Linus Torvalds wrote: > > which is technically true, but kind of painful to read. It may be that a > > reasonable weight is somewhere between "1" and "65535", though. > > Based on my tests, the "right" number is somewhere in the 500-1000 > range for this particular case. But it's still a completely made up > number. Yeah, exactly. I think if we're going to tweak the weight heuristic it would be worth choosing a random sample of commits throughout history and seeing how they look with various weights. > > However, I think the more fundamental confusion with git-describe is > > that people expect the shortest distance to be the "first" tag that > > contained the commit, and that is clearly not true in a branchy history. > > Yeah. > > And I don't think people care *too* much, because I'm sure this has > happened before, it's just that before when it happened it wasn't > quite _so_ far off the expected path.. I think about once a year somebody complains to the list that git-describe chose a bad name. I don't know how many confused users it takes to muster one complain to the list, though. ;) > > I actually think most people would be happy with an algorithm more like: > > > > 1. Find the "oldest" tag (either by timestamp, or by version-sorting > > the tags) that contains the commit in question. > > Yes, we might want to base the "distance" at least partly on the age > of the base commits. I had actually meant my (1) and (2) to be part of the same algorithm. That is, to literally* do a two-pass check over the history, where first we find the "best" tag, and then compute the distance from that tag. The first concern trumps the latter completely. * Where "literally" only means that's the conceptual model. We probably could do it in one pass if we're clever, but it would behave as if we made the two passes. Another way to find the "oldest" tag that I didn't mention is to find all containing tags, and then eliminate any that contain another tag (similar to the way we cull duplicate merge bases). That gives you an answer based on the topology, which is more robust than timestamps or tag names. But it doesn't necessarily provide a single answer, so you'd still have to break ties with timestamps or name-sorting. > > 2. Find the "simplest" path from that tag to the commit, where we > > are striving mostly for shortness of explanation, not of path (so > > "~500" is way better than "~20^2~30^2~14", even though the latter > > is technically a shorter path). > > Well, so the three different paths I've seen are: > > - standard git (65536), and 1000: >aed06b9 tags/v4.6-rc1~9^2~792 > > - non-first-parent cost: 500: >aed06b9 tags/v3.13-rc7~9^2~14^2~42 > > - non-first parent cost: 1: >aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42 > > so there clearly are multiple valid answers. > > I would actually claim that the middle one is the best one - but I > claim that based on your algorithm case #1. The last one may be the > shortest actual path, but it's a shorter path to a newer tag that is a > superset of the older tag, so the middle one is actually not just > better based on age, but is a better choice based on "minimal actual > history". Yeah, I'd agree the middle one is the best one, because the other tags contain -rc7. Finding the best tag is much more important than the path distance, because that's the part that humans read and care about. The rest is mostly machine-readable to find the exact commit, so we want any path that's accurate, and not too cumbersome to look at or cut and paste (and obviously shorter path is better, too). I actually think the best name for aed06b9 is probably: v3.13-rc1~65^2^2~42 which I found by picking the oldest tag from "git tag --contains" and plugging it into "git describe --match". Sadly, neither git's internal version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before "v1.0", so I had to rely on "--sort=taggerdate". -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:43 AM, Linus Torvalds wrote: > > In other words, I'm trying to convince people that my patch not only > gives a good result, but that the "weight numbers" I use make some > kind of conceptual sense from a complexity cost angle. Basically, the patch approximates the numerical representation of the distance measure with the complexity of the suffix. It's not a *true* length of the suffix (which does heavily favor first-parent use, kind of like the current code), but I think it's better than the pretty random "+65535" that the current git code has. That number is clearly just completely made up. The new numbers at least have some kind of logic behind them. And the current code obviously does give really bad results. Picking the v4.6-rc1 tag as a base just because it happens to get a lot of first-parent traversals (800!) from one second-parent commit that is close to 4.6-rc1 is just obscene. So the more I look at my patch, the more I go "it's a real improvement on the current situation". That said, I do think that a much bigger conceptual change that actually does full traversal and be much more complicated might be the only "correct" solution. So my patch is just a "improve heuristics" small fixlet rather than something optimal. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:23 AM, Linus Torvalds wrote: > On Thu, Apr 21, 2016 at 10:08 AM, Jeff King wrote: >> >> Right, because it makes the names longer. We give the second-parent >> traversal a heuristic cost. If we drop that cost to "1", like: > > So I dropped it to 500 (removed the two last digits), and it gave a > reasonable answer. With 1000, it gave the same "based on 4.6" answer > as the current 65536 value does. > >> which is technically true, but kind of painful to read. It may be that a >> reasonable weight is somewhere between "1" and "65535", though. > > Based on my tests, the "right" number is somewhere in the 500-1000 > range for this particular case. But it's still a completely made up > number. > >> However, I think the more fundamental confusion with git-describe is >> that people expect the shortest distance to be the "first" tag that >> contained the commit, and that is clearly not true in a branchy history. > > Yeah. > > And I don't think people care *too* much, because I'm sure this has > happened before, it's just that before when it happened it wasn't > quite _so_ far off the expected path.. > >> I actually think most people would be happy with an algorithm more like: >> >> 1. Find the "oldest" tag (either by timestamp, or by version-sorting >> the tags) that contains the commit in question. > > Yes, we might want to base the "distance" at least partly on the age > of the base commits. > >> 2. Find the "simplest" path from that tag to the commit, where we >> are striving mostly for shortness of explanation, not of path (so >> "~500" is way better than "~20^2~30^2~14", even though the latter >> is technically a shorter path). > > Well, so the three different paths I've seen are: > > - standard git (65536), and 1000: >aed06b9 tags/v4.6-rc1~9^2~792 > > - non-first-parent cost: 500: >aed06b9 tags/v3.13-rc7~9^2~14^2~42 > > - non-first parent cost: 1: >aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42 > > so there clearly are multiple valid answers. > > I would actually claim that the middle one is the best one - but I > claim that based on your algorithm case #1. The last one may be the > shortest actual path, but it's a shorter path to a newer tag that is a > superset of the older tag, so the middle one is actually not just > better based on age, but is a better choice based on "minimal actual > history". > >Linus Combining Junios and Linus idea: * We want to have the minimal history, i.e. that tag with the fewest cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13 because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha `git log --oneline v3.13 |wc -l` (414530). The difference is 213. tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65) tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54) tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag has a higher weight by 213. v4.6-rc1 has even more weight (588477). So I guess what I propose is to take the weight of a tag into account via `git log --oneline |wc -l` as that gives the tag which encloses least history? We also do not want to have "a lot of side traversals", so we could punish each additional addendum by a heuristic. > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:23 AM, Junio C Hamano wrote: > > I think avoiding side branches to describe with the weight is a > right thing to do, i.e. if you have this history: > > X---o---o---o---o---v4.6 > \ / > o---o > > you do not want to explain X as "v4.6~^2~2", and instead you want it > as "v4.6~5", even though the former is 4 hops while the latter is 5 > hops (which is longer). Yes. I have a new suggestion: make the "merge weight" depend on how complex the name ends up being. And that algorithm actually gives a completely new and improved path: aed06b9 tags/v3.13-rc2~32^2^2~47 which is still complex, but is actually the best one I've found so far. To compare against the previous ones I looked at: v4.6-rc1~9^2~792<- current git code v3.13-rc2~32^2^2~47 <- new with attached patch v3.13-rc7~9^2~14^2~42 <- merge weight 500 v3.13~5^2~4^2~2^2~1^2~42 <- merge weight 1 that new one is actually the second-most dense, and uses the oldest tag. So it's a good combination of denseness, but still gets the best actual base tag. The logic of the patch is that there are different "complexities" in the naming, and it's not just whether you are following a second parent, it's also if you're doing generational hops. So when you do a first-parent change, the name stays simple (the last number changes), and that means that the "distance" weight is low (so that's the normal "+1" we do for first-parent. But if it's not a first parent, there are two different cases: - generation > 0: we add "~%d^%d", and we approximate that with "four characters", and use a cost that is four orders of magnitude higher (so 1). - generation == 0: we add just "^%d", so generally just two characters. So use a cost that is just two orders of magnitude higher (so 100). In other words, I'm trying to convince people that my patch not only gives a good result, but that the "weight numbers" I use make some kind of conceptual sense from a complexity cost angle. With that, the patch itself is attached. I think it's better than what "git name-rev" does now. Is it optimal? No, I think the *optimal* would use some kind of "does one tag contain the other" logic, and discarding all base names that are just supersets of another base that still reaches the target. But this patch is small and simple, and has some excuses for its behavior. What do people think? Linus builtin/name-rev.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 092e03c3cc9b..0354c8d222e1 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -16,9 +16,6 @@ typedef struct rev_name { static long cutoff = LONG_MAX; -/* How many generations are maximally preferred over _one_ merge traversal? */ -#define MERGE_TRAVERSAL_WEIGHT 65535 - static void name_rev(struct commit *commit, const char *tip_name, int generation, int distance, int deref) @@ -55,19 +52,26 @@ copy_data: parents; parents = parents->next, parent_number++) { if (parent_number > 1) { + int weight; size_t len; char *new_name; strip_suffix(tip_name, "^0", &len); - if (generation > 0) + + // The extra merge traversal "weight" depends + // on how complex the resulting name is. + if (generation > 0) { + weight = 1; new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name, generation, parent_number); - else + } else { + weight = 100; new_name = xstrfmt("%.*s^%d", (int)len, tip_name, parent_number); + } name_rev(parents->item, new_name, 0, - distance + MERGE_TRAVERSAL_WEIGHT, 0); + distance + weight, 0); } else { name_rev(parents->item, tip_name, generation + 1, distance + 1, 0);
Re: history damage in linux.git
Linus Torvalds writes: > On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds > wrote: >> >> This seems to be a git bug. >> >> That commit aed06b9 can also be described as >> >> v3.13-rc7~9^2~14^2~42 >> >> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. > > Hmm. I think I see what's up. The git distance function has a special > hack for preferring first-parent traversal, introduced long long ago > with commit ac076c29ae8d ("name-rev: Fix non-shortest description"). > > Changing that > > #define MERGE_TRAVERSAL_WEIGHT 65535 > > to be a smaller value makes git find the shorter path. > > I do not know what the correct fix is, though. I think avoiding side branches to describe with the weight is a right thing to do, i.e. if you have this history: X---o---o---o---o---v4.6 \ / o---o you do not want to explain X as "v4.6~^2~2", and instead you want it as "v4.6~5", even though the former is 4 hops while the latter is 5 hops (which is longer). But when comparing a name based on v4.6 (which I think the algorithm with the weight heuristics would choose v4.6~5) and another name based on v3.13, I suspect that we compare them with number of hops with the weight heuristics, and that is what gives us a wrong result, isn't it? I think it should instead compare the number of true hops. v3.13-rc7~9^2~14^2~42 = 9 + 1 + 14 + 1 + 42 = 67 hops from v3.13 v4.6-rc1~9^2~792 = 9 + 1 + 792 = 802 hops from v4.6-rc1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 10:08 AM, Jeff King wrote: > > Right, because it makes the names longer. We give the second-parent > traversal a heuristic cost. If we drop that cost to "1", like: So I dropped it to 500 (removed the two last digits), and it gave a reasonable answer. With 1000, it gave the same "based on 4.6" answer as the current 65536 value does. > which is technically true, but kind of painful to read. It may be that a > reasonable weight is somewhere between "1" and "65535", though. Based on my tests, the "right" number is somewhere in the 500-1000 range for this particular case. But it's still a completely made up number. > However, I think the more fundamental confusion with git-describe is > that people expect the shortest distance to be the "first" tag that > contained the commit, and that is clearly not true in a branchy history. Yeah. And I don't think people care *too* much, because I'm sure this has happened before, it's just that before when it happened it wasn't quite _so_ far off the expected path.. > I actually think most people would be happy with an algorithm more like: > > 1. Find the "oldest" tag (either by timestamp, or by version-sorting > the tags) that contains the commit in question. Yes, we might want to base the "distance" at least partly on the age of the base commits. > 2. Find the "simplest" path from that tag to the commit, where we > are striving mostly for shortness of explanation, not of path (so > "~500" is way better than "~20^2~30^2~14", even though the latter > is technically a shorter path). Well, so the three different paths I've seen are: - standard git (65536), and 1000: aed06b9 tags/v4.6-rc1~9^2~792 - non-first-parent cost: 500: aed06b9 tags/v3.13-rc7~9^2~14^2~42 - non-first parent cost: 1: aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42 so there clearly are multiple valid answers. I would actually claim that the middle one is the best one - but I claim that based on your algorithm case #1. The last one may be the shortest actual path, but it's a shorter path to a newer tag that is a superset of the older tag, so the middle one is actually not just better based on age, but is a better choice based on "minimal actual history". Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 09:59:13AM -0700, Junio C Hamano wrote: > Linus Torvalds writes: > > > That commit aed06b9 can also be described as > > > > v3.13-rc7~9^2~14^2~42 > > > > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. > > I seem to recall that name-rev had an unexplained heuristics to > strongly avoid following second parent changes (I see two ^2 in the > path from 3.13-rc7 above). Right, because it makes the names longer. We give the second-parent traversal a heuristic cost. If we drop that cost to "1", like: diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 092e03c..03be8be 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -17,7 +17,7 @@ typedef struct rev_name { static long cutoff = LONG_MAX; /* How many generations are maximally preferred over _one_ merge traversal? */ -#define MERGE_TRAVERSAL_WEIGHT 65535 +#define MERGE_TRAVERSAL_WEIGHT 1 static void name_rev(struct commit *commit, const char *tip_name, int generation, int distance, then this case gives: v3.13~5^2~4^2~2^2~1^2~42 which is technically true, but kind of painful to read. It may be that a reasonable weight is somewhere between "1" and "65535", though. However, I think the more fundamental confusion with git-describe is that people expect the shortest distance to be the "first" tag that contained the commit, and that is clearly not true in a branchy history. I actually think most people would be happy with an algorithm more like: 1. Find the "oldest" tag (either by timestamp, or by version-sorting the tags) that contains the commit in question. 2. Find the "simplest" path from that tag to the commit, where we are striving mostly for shortness of explanation, not of path (so "~500" is way better than "~20^2~30^2~14", even though the latter is technically a shorter path). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds wrote: > > This seems to be a git bug. > > That commit aed06b9 can also be described as > > v3.13-rc7~9^2~14^2~42 > > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. Hmm. I think I see what's up. The git distance function has a special hack for preferring first-parent traversal, introduced long long ago with commit ac076c29ae8d ("name-rev: Fix non-shortest description"). Changing that #define MERGE_TRAVERSAL_WEIGHT 65535 to be a smaller value makes git find the shorter path. I do not know what the correct fix is, though. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Linus Torvalds writes: > That commit aed06b9 can also be described as > > v3.13-rc7~9^2~14^2~42 > > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. I seem to recall that name-rev had an unexplained heuristics to strongly avoid following second parent changes (I see two ^2 in the path from 3.13-rc7 above). > However did git decide to use that further-away name? That looks odd. > > I'm wondering if it's because of the new root commit we have, and that > confuses the distance calculation somehow? That came in somewhat > recently (mid March) with the GPIO pull. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Olaf Hering writes: > On Thu, Apr 21, John Keeping wrote: > >> $ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b > > Thanks for that, I did not know this variant. > > Unless git does not do it for me, I may hackup my script like that to > find the earlierst tag: "git tag" has a --sort key, so you can sort on dates, and "| head -n 1". See also "git for-each-ref" which is essentially a superset of what "git tag" does, and for-each-ref is plumbing, so safer to use in scripts (we have a strong tradition of backward compatibility for plumbing). This relies on the dates recorded in the commits, which may be wrong (typically if someone commited on a machine with an incorrect clock). But hopefully not. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 6:24 AM, Andreas Schwab wrote: > > The branches recently pulled by Linus contain commits with rather old > author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or > 281baf7a702693deaa45c98ef0c5161006b48257. That will probably cause git > describe --contains to take a different path through the history. Nothing in git should look at author dates (except for the obvious code that then shows the date). The committer date is in fact used for the traversal heuristics for some cases, but those are different and recent in the examples you note. This seems to be a git bug. That commit aed06b9 can also be described as v3.13-rc7~9^2~14^2~42 so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way. However did git decide to use that further-away name? That looks odd. I'm wondering if it's because of the new root commit we have, and that confuses the distance calculation somehow? That came in somewhat recently (mid March) with the GPIO pull. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, John Keeping wrote: > $ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b Thanks for that, I did not know this variant. Unless git does not do it for me, I may hackup my script like that to find the earlierst tag: for i in `git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b` do git log --max-count=1 --oneline --pretty=%ct:%h:$i $i | cat done | sort -n | sed 's@^.*:@@;q' Olaf -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Olaf Hering writes: > How can I find out whats going on? Is my git(1) 2.8.1 broken, or did > Linus just pull some junk tree (and does he continue to do so)? The branches recently pulled by Linus contain commits with rather old author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or 281baf7a702693deaa45c98ef0c5161006b48257. That will probably cause git describe --contains to take a different path through the history. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, 2016 at 01:30:04PM +0200, Olaf Hering wrote: > To track the changes in hyperv related files I created some scripts > years ago to automate the process of finding relevant commits in > linux.git. Part of that process is to record the tag when a commit > appeared in mainline. This worked fine, until very recently. > > Suddenly years-old commits are declared as having-just-arrived in > linux.git. Look at this example: > > $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c > 2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard > 62238f3 Input: hyperv-keyboard - register as a wakeup source > c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix > aed06b9 Input: add a driver to support Hyper-V synthetic keyboard > $ git describe --contains aed06b9 > v4.6-rc1~9^2~792 > $ git show aed06b9 | head > commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b > Author: K. Y. Srinivasan > Date: Wed Sep 18 12:50:42 2013 -0700 > > Obviously that and other commits are in the tree since a very long time. > > How can I find out whats going on? Is my git(1) 2.8.1 broken, or did > Linus just pull some junk tree (and does he continue to do so)? I suspect it indicates that an old tree was pulled in such that the path to v4.6-rc1 is shorter than to the older version. The commit is clearly in v3.13-rc1: $ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b v3.13 v3.13-rc1 v3.13-rc2 [snip] The behaviour of describe is a bit clearer if you limit it to v3.*: $ git describe --match='v3.*' --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b v3.13-rc7~9^2~14^2~42 $ git describe --match='v3.13-rc1' --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b v3.13-rc1~65^2^2~42 It seems that the path to v4.6-rc1 is "more direct" than to either of these commits: there is only one second-parent merge transition. >From a quick look, I think the problem is in commit c155c7492c9a ("Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input") which merges a branch that has repeatedly had master merged back into it but does not build on any recent releases. The most recent tag on the first-parent history of that branch is v3.0-rc4. I think it is as simple as git-describe (or git-name-rev which is used in the --contains case) preferring a less branchy path, which has been introduced in v4.6 with the merge commit above. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Olaf Hering writes: > On Thu, Apr 21, Matthieu Moy wrote: > >> My guess is that this commit has been sitting for a long time in a >> repo outside Linus' linux.git, and got merged only recently. > > Thats what it looks like. And thats what I'm complaining about. But in > fact that file is there since v3.13-rc7 (if the tag is really correct in > my patch file), since at least end of 2014. Ah, indeed. It was merged right before 3.13. See "git tag --contains aed06b9cfcab". It's indeed weird that "git describe --contains" gives a named based on tag v4.6-rc1, but it is not really incorrect since tags/v4.6-rc1~9^2~792 is indeed a correct name for aed06b9cfcabf8644ac5f6f108c0b3d01522f88b, and actually uses a tag that follows it. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
On Thu, Apr 21, Matthieu Moy wrote: > My guess is that this commit has been sitting for a long time in a > repo outside Linus' linux.git, and got merged only recently. Thats what it looks like. And thats what I'm complaining about. But in fact that file is there since v3.13-rc7 (if the tag is really correct in my patch file), since at least end of 2014. And after some checking its in linux-3.13-rc1 from 2013-11-22, which indicates that such damage has already happend earlier. At least I have a record from Nov 2014 for the patch file. There is a slim chance that I tweaked that particular tag menually in the patch file. This is a list of changed tags in my patch collection: Date: Wed, 18 Sep 2013 12:50:42 -0700 -Patch-mainline: v3.13-rc7 +Patch-mainline: v4.6-rc1 Subject: Input: add a driver to support Hyper-V synthetic keyboard Git-commit: aed06b9cfcabf8644ac5f6f108c0b3d01522f88b Date: Sun, 12 Jan 2014 11:09:14 -0800 -Patch-mainline: v3.14 +Patch-mainline: v4.6-rc1 Subject: Input: hyperv-keyboard - pass through 0xE1 prefix Git-commit: c3c4d99485ea51cd354ed3cd955a8310703456b6 Date: Wed, 6 Aug 2014 13:33:54 -0700 -Patch-mainline: v3.17-rc1 +Patch-mainline: v4.6-rc1 Subject: Input: hyperv-keyboard - register as a wakeup source Git-commit: 62238f3aadc9bc56da70100e19ec61b9f8d72a5f Date: Mon, 30 Nov 2015 19:22:13 +0300 -Patch-mainline: v4.5-rc1 +Patch-mainline: v4.5-rc2 Subject: drivers/hv: replace enum hv_message_type by u32 Git-commit: 7797dcf63f11b6e1d34822daf2317223d0f4ad46 Date: Mon, 30 Nov 2015 19:22:14 +0300 -Patch-mainline: v4.5-rc1 +Patch-mainline: v4.5-rc2 Subject: drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header Git-commit: 4f39bcfd1c132522380138a323f9af7902766301 Date: Mon, 30 Nov 2015 19:22:15 +0300 -Patch-mainline: v4.5-rc1 +Patch-mainline: v4.5-rc2 Subject: drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header Git-commit: 5b423efe11e822e092e8c911a6bad17eadf718eb Date: Mon, 30 Nov 2015 19:22:16 +0300 -Patch-mainline: v4.5-rc1 +Patch-mainline: v4.5-rc2 Subject: drivers/hv: Move struct hv_timer_message_payload into UAPI Hyper-V x86 header Git-commit: c71acc4c74dddeeede69fdd4f0b1a124f9df Date: Mon, 30 Nov 2015 19:22:21 +0300 -Patch-mainline: v4.5-rc1 +Patch-mainline: v4.5-rc2 Subject: kvm/x86: Hyper-V SynIC timers Git-commit: 1f4b34f825e8cef6f493d06b46605384785b3d16 Date: Tue, 2 Feb 2016 11:45:02 +0800 -Patch-mainline: v4.6-rc1 +Patch-mainline: v4.6-rc2 Subject: x86/cpu: Convert printk(KERN_ ...) to pr_(...) Git-commit: 1b74dde7c47c19a73ea3e9fac95ac27b5d3d50c5 Olaf -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: history damage in linux.git
Olaf Hering writes: > To track the changes in hyperv related files I created some scripts > years ago to automate the process of finding relevant commits in > linux.git. Part of that process is to record the tag when a commit > appeared in mainline. This worked fine, until very recently. > > Suddenly years-old commits are declared as having-just-arrived in > linux.git. Look at this example: > > $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c > 2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard > 62238f3 Input: hyperv-keyboard - register as a wakeup source > c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix > aed06b9 Input: add a driver to support Hyper-V synthetic keyboard > $ git describe --contains aed06b9 > v4.6-rc1~9^2~792 > $ git show aed06b9 | head > commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b > Author: K. Y. Srinivasan > Date: Wed Sep 18 12:50:42 2013 -0700 > > Obviously that and other commits are in the tree since a very long time. I cannot take that conclusion from the excerpts above. What I can conclude is that the commit was made a long time ago (according to https://github.com/torvalds/linux/commit/aed06b9cfcabf8644ac5f6f108c0b3d01522f88b the commit was initially made on Sep 18 2013 by K. Y. Srinivasan, and then applied by Dmitry Torokhov the day after to "some" tree. My guess is that this commit has been sitting for a long time in a repo outside Linus' linux.git, and got merged only recently. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html