Re: history damage in linux.git

2016-04-22 Thread Johannes Schindelin
Hi Linus,

On Thu, 21 Apr 2016, Linus Torvalds wrote:

> On Thu, Apr 21, 2016 at 11:05 AM, Jeff King  wrote:
> >
> > Sadly, neither git's internal version-sorting nor GNU's "sort -V"
> > knows that "v1.0-rc1" comes before "v1.0", so I had to rely on
> > "--sort=taggerdate".
> 
> I'm not seeing the "sadly".
> 
> I think "--sort=taggerdate" is pretty much the only sane sort there is
> for tags, unless you do a true and full topological one (ie sort based
> on by how many commits that tag encompasses, but also by how each tag
> contains another tag).

Turns out it is pretty easy to implement this in name-rev. Expect the
patch in your inbox in a minute.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Junio C Hamano
Stefan Beller  writes:

> Combining Junios and Linus idea:
>
> * We want to have the minimal history, i.e. that tag with the fewest
> cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13
> because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha
> `git log --oneline v3.13 |wc -l` (414530).
> The difference is 213.
>
> tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65)
>
> tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54)
>
> tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag
> has a higher weight by 213.
>
> v4.6-rc1 has even more weight (588477).
>
> So I guess what I propose is to take the weight of a tag into account
> via `git log --oneline  |wc -l` as that gives the tag which encloses
> least history?
>
> We also do not want to have "a lot of side traversals", so we could
> punish each additional addendum by a heuristic.

These are essentially shooting for what Linus meant "something
optimal", contrasting his "improved heuristics" version, and it is
good that you are thinking about these issues.

It may be a bit tricky to think the "optimum description" in terms
of "how many commits are behind these tags", though.  One thing that
commonly happens is:

 (1) A fix X is developed on an ancient codebase, e.g. on top of
 v1.0.0 release.

 (2) X is first merged to the current 'master' and becomes part of
 the next release v2.0.0.

 (3) X is later merged to the maintenance track and included in
 v1.0.1.

There are a few questions you would want to ask "describe --contains
X" and depending on the reason why you are asking the question, the
desired answer may be different:

 - In which release did the fix X first appear?  (answer: v2.0.0)
 - What is the oldest release with the fix X?(answer: v1.0.1)

I happen to be a maintainer who prefers to have a tidy containment
relationships among his releases, and after the above three steps,
there will be this:

 (4) Later v1.0.1 is merged back to 'master' branch and a tag v2.0.1
 on the maintenance track for v2.0.x series would contain it.

But not all projects are run that way.  Often older maintenance
tracks will never merge back to the current development track
(i.e. fixes are only ported-back).

If the v1.0.x codebase had an unrelated problem in the code that no
longer exists in v2.0.x codebase, the maintenance track may have
quite a many commits that exist only there and not in 'master', and
when (3) happens, the "weight", i.e. the commits behind the tag, of
v1.0.1 may be greater than the weight of v2.0.0 in such a project.

In other words, an algorithm that uses "how many commits are behind
the tag" as one of the deciding factor to name X would choose
between v1.0.1 and v2.0.0 depending on what other things that have
nothing to do with X happend on these parallel development tracks,
which may not be desirable.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 12:27 PM, Junio C Hamano  wrote:
> Linus Torvalds  writes:
>
>> But this patch is small and simple, and has some excuses for its
>> behavior. What do people think?
>
> I like it that you call it "excuse" not "rationale", as I couldn't
> form a logical connection between your "4 (2) letters" and "1
> (100)" at all ;-)

Think of the distance number as a "order of magnitude in complexity",
and it actually makes a certain amount of sense.

It's not the same as the length of the string, but the "log()" of the
distance number really does give a kind of complexity value.

Think of it this way: if things are entirely linear (all just first
parenthood), there will be just a single simple number, and the
relationship between the simple distance number (that just increments
by one for each parent traversed) and the length of the string that
describes it will really be "log10(distance)". That's literally how
many characters you need to describe the linear distance number.

So a simple linear distance of 'n' commits will need on the order of
'log10(n)' digits to describe it (ie a number around a thousand will
need around three digits).

The "100" and "1" are just extending that notion of distance to
the more complex cases., and expresses their complexity in the same
logarithmic units. The same way you need four digits to express a
_linear_ distance of 1, you need four characters to express that
"~n^p" case of "merge parent p, n generations back".

And if you don't have the generation thing, you only need two
characters to express parent #'p': "^p".

So two characters really *are* equivalent to ~100 linear steps, and
four characters really *are* equivalent to ~1 linear steps.

So it's not _just_ an excuse. There's an actual rationale for picking
those numbers, and why they are equivalent in a complexity measure.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Junio C Hamano
Linus Torvalds  writes:

> But this patch is small and simple, and has some excuses for its
> behavior. What do people think?

I like it that you call it "excuse" not "rationale", as I couldn't
form a logical connection between your "4 (2) letters" and "1
(100)" at all ;-)

Modulo the usual style issues (e.g. we frown upon patches in
attachement that makes it harder to quote and comment), I think this
is a strict improvement and is a good measure until somebody does a
full "topologically closest" solution.

>  Linus
>
>  builtin/name-rev.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 092e03c3cc9b..0354c8d222e1 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -16,9 +16,6 @@ typedef struct rev_name {
>  
>  static long cutoff = LONG_MAX;
>  
> -/* How many generations are maximally preferred over _one_ merge traversal? 
> */
> -#define MERGE_TRAVERSAL_WEIGHT 65535
> -
>  static void name_rev(struct commit *commit,
>   const char *tip_name, int generation, int distance,
>   int deref)
> @@ -55,19 +52,26 @@ copy_data:
>   parents;
>   parents = parents->next, parent_number++) {
>   if (parent_number > 1) {
> + int weight;
>   size_t len;
>   char *new_name;
>  
>   strip_suffix(tip_name, "^0", &len);
> - if (generation > 0)
> +
> + // The extra merge traversal "weight" depends
> + // on how complex the resulting name is.
> + if (generation > 0) {
> + weight = 1;
>   new_name = xstrfmt("%.*s~%d^%d", (int)len, 
> tip_name,
>  generation, parent_number);
> - else
> + } else {
> + weight = 100;
>   new_name = xstrfmt("%.*s^%d", (int)len, 
> tip_name,
>  parent_number);
> + }
>  
>   name_rev(parents->item, new_name, 0,
> - distance + MERGE_TRAVERSAL_WEIGHT, 0);
> + distance + weight, 0);
>   } else {
>   name_rev(parents->item, tip_name, generation + 1,
>   distance + 1, 0);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 11:05 AM, Jeff King  wrote:
>
> I actually think the best name for aed06b9 is probably:
>
>   v3.13-rc1~65^2^2~42

Sounds likely. I don't know how to find that best match without a
complete rewrite, though.

My recent patch that got

  v3.13-rc2~32^2^2~47

comes close to that, and the complexity is similar but the numbers are
actually smaller, so I guess my heuristic did indeed find a "simpler"
name, but yes, the one based on 3.13-rc1 would definitely be the
better one.

> which I found by picking the oldest tag from "git tag --contains" and
> plugging it into "git describe --match".

Yeah, so you basically did the "let's figure out minimal inclusion" by hand.

> Sadly, neither git's internal
> version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before
> "v1.0", so I had to rely on "--sort=taggerdate".

I'm not seeing the "sadly".

I think "--sort=taggerdate" is pretty much the only sane sort there is
for tags, unless you do a true and full topological one (ie sort based
on by how many commits that tag encompasses, but also by how each tag
contains another tag).

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Jeff King
On Thu, Apr 21, 2016 at 10:59:52AM -0700, Linus Torvalds wrote:

> That said, I do think that a much bigger conceptual change that
> actually does full traversal and be much more complicated might be the
> only "correct" solution.
> 
> So my patch is just a "improve heuristics" small fixlet rather than
> something optimal.

Yeah, I'd agree with both points. Unless somebody is planning to work on
the bigger change in the near future, your patch is a strict improvement
to the heuristic, and is worth applying.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Jeff King
On Thu, Apr 21, 2016 at 10:23:10AM -0700, Linus Torvalds wrote:

> > which is technically true, but kind of painful to read. It may be that a
> > reasonable weight is somewhere between "1" and "65535", though.
> 
> Based on my tests, the "right" number is somewhere in the 500-1000
> range for this particular case. But it's still a completely made up
> number.

Yeah, exactly. I think if we're going to tweak the weight heuristic it
would be worth choosing a random sample of commits throughout history
and seeing how they look with various weights.

> > However, I think the more fundamental confusion with git-describe is
> > that people expect the shortest distance to be the "first" tag that
> > contained the commit, and that is clearly not true in a branchy history.
> 
> Yeah.
> 
> And I don't think people care *too* much, because I'm sure this has
> happened before, it's just that before when it happened it wasn't
> quite _so_ far off the expected path..

I think about once a year somebody complains to the list that
git-describe chose a bad name. I don't know how many confused users it
takes to muster one complain to the list, though. ;)

> > I actually think most people would be happy with an algorithm more like:
> >
> >   1. Find the "oldest" tag (either by timestamp, or by version-sorting
> >  the tags) that contains the commit in question.
> 
> Yes, we might want to base the "distance" at least partly on the age
> of the base commits.

I had actually meant my (1) and (2) to be part of the same algorithm.
That is, to literally* do a two-pass check over the history, where first
we find the "best" tag, and then compute the distance from that tag. The
first concern trumps the latter completely.

* Where "literally" only means that's the conceptual model. We probably
  could do it in one pass if we're clever, but it would behave as if
  we made the two passes.

Another way to find the "oldest" tag that I didn't mention is to find
all containing tags, and then eliminate any that contain another tag
(similar to the way we cull duplicate merge bases). That gives you an
answer based on the topology, which is more robust than timestamps or
tag names. But it doesn't necessarily provide a single answer, so you'd
still have to break ties with timestamps or name-sorting.

> >   2. Find the "simplest" path from that tag to the commit, where we
> >  are striving mostly for shortness of explanation, not of path (so
> >  "~500" is way better than "~20^2~30^2~14", even though the latter
> >  is technically a shorter path).
> 
> Well, so the three different paths I've seen are:
> 
>  - standard git (65536), and 1000:
>aed06b9 tags/v4.6-rc1~9^2~792
> 
>  - non-first-parent cost: 500:
>aed06b9 tags/v3.13-rc7~9^2~14^2~42
> 
>  - non-first parent cost: 1:
>aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42
> 
> so there clearly are multiple valid answers.
> 
> I would actually claim that the middle one is the best one - but I
> claim that based on your algorithm case #1. The last one may be the
> shortest actual path, but it's a shorter path to a newer tag that is a
> superset of the older tag, so the middle one is actually not just
> better based on age, but is a better choice based on "minimal actual
> history".

Yeah, I'd agree the middle one is the best one, because the other tags
contain -rc7. Finding the best tag is much more important than the path
distance, because that's the part that humans read and care about. The
rest is mostly machine-readable to find the exact commit, so we want any
path that's accurate, and not too cumbersome to look at or cut and
paste (and obviously shorter path is better, too).

I actually think the best name for aed06b9 is probably:

  v3.13-rc1~65^2^2~42

which I found by picking the oldest tag from "git tag --contains" and
plugging it into "git describe --match". Sadly, neither git's internal
version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before
"v1.0", so I had to rely on "--sort=taggerdate".

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:43 AM, Linus Torvalds
 wrote:
>
> In other words, I'm trying to convince people that my patch not only
> gives a good result, but that the "weight numbers" I use make some
> kind of conceptual sense from a complexity cost angle.

Basically, the patch approximates the numerical representation of the
distance measure with the complexity of the suffix.

It's not a *true* length of the suffix (which does heavily favor
first-parent use, kind of like the current code), but I think it's
better than the pretty random "+65535" that the current git code has.
That number is clearly just completely made up. The new numbers at
least have some kind of logic behind them.

And the current code obviously does give really bad results. Picking
the v4.6-rc1 tag as a base just because it happens to get a lot of
first-parent traversals (800!) from one second-parent commit that is
close to 4.6-rc1 is just obscene.

So the more I look at my patch, the more I go "it's a real improvement
on the current situation".

That said, I do think that a much bigger conceptual change that
actually does full traversal and be much more complicated might be the
only "correct" solution.

So my patch is just a "improve heuristics" small fixlet rather than
something optimal.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Stefan Beller
On Thu, Apr 21, 2016 at 10:23 AM, Linus Torvalds
 wrote:
> On Thu, Apr 21, 2016 at 10:08 AM, Jeff King  wrote:
>>
>> Right, because it makes the names longer. We give the second-parent
>> traversal a heuristic cost. If we drop that cost to "1", like:
>
> So I dropped it to 500 (removed the two last digits), and it gave a
> reasonable answer. With 1000, it gave the same "based on 4.6" answer
> as the current 65536 value does.
>
>> which is technically true, but kind of painful to read. It may be that a
>> reasonable weight is somewhere between "1" and "65535", though.
>
> Based on my tests, the "right" number is somewhere in the 500-1000
> range for this particular case. But it's still a completely made up
> number.
>
>> However, I think the more fundamental confusion with git-describe is
>> that people expect the shortest distance to be the "first" tag that
>> contained the commit, and that is clearly not true in a branchy history.
>
> Yeah.
>
> And I don't think people care *too* much, because I'm sure this has
> happened before, it's just that before when it happened it wasn't
> quite _so_ far off the expected path..
>
>> I actually think most people would be happy with an algorithm more like:
>>
>>   1. Find the "oldest" tag (either by timestamp, or by version-sorting
>>  the tags) that contains the commit in question.
>
> Yes, we might want to base the "distance" at least partly on the age
> of the base commits.
>
>>   2. Find the "simplest" path from that tag to the commit, where we
>>  are striving mostly for shortness of explanation, not of path (so
>>  "~500" is way better than "~20^2~30^2~14", even though the latter
>>  is technically a shorter path).
>
> Well, so the three different paths I've seen are:
>
>  - standard git (65536), and 1000:
>aed06b9 tags/v4.6-rc1~9^2~792
>
>  - non-first-parent cost: 500:
>aed06b9 tags/v3.13-rc7~9^2~14^2~42
>
>  - non-first parent cost: 1:
>aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42
>
> so there clearly are multiple valid answers.
>
> I would actually claim that the middle one is the best one - but I
> claim that based on your algorithm case #1. The last one may be the
> shortest actual path, but it's a shorter path to a newer tag that is a
> superset of the older tag, so the middle one is actually not just
> better based on age, but is a better choice based on "minimal actual
> history".
>
>Linus

Combining Junios and Linus idea:

* We want to have the minimal history, i.e. that tag with the fewest
cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13
because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha
`git log --oneline v3.13 |wc -l` (414530).
The difference is 213.

tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65)

tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54)

tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag
has a higher weight by 213.

v4.6-rc1 has even more weight (588477).

So I guess what I propose is to take the weight of a tag into account
via `git log --oneline  |wc -l` as that gives the tag which encloses
least history?

We also do not want to have "a lot of side traversals", so we could
punish each additional addendum by a heuristic.


> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:23 AM, Junio C Hamano  wrote:
>
> I think avoiding side branches to describe with the weight is a
> right thing to do, i.e. if you have this history:
>
> X---o---o---o---o---v4.6
>  \ /
>   o---o
>
> you do not want to explain X as "v4.6~^2~2", and instead you want it
> as "v4.6~5", even though the former is 4 hops while the latter is 5
> hops (which is longer).

Yes. I have a new suggestion: make the "merge weight" depend on how
complex the name ends up being.

And that algorithm actually gives a completely new and improved path:

   aed06b9 tags/v3.13-rc2~32^2^2~47

which is still complex, but is actually the best one I've found so far.

To compare against the previous ones I looked at:

   v4.6-rc1~9^2~792<- current git code
   v3.13-rc2~32^2^2~47 <- new with attached patch
   v3.13-rc7~9^2~14^2~42 <- merge weight 500
   v3.13~5^2~4^2~2^2~1^2~42   <- merge weight 1

that new one is actually the second-most dense, and uses the oldest
tag. So it's a good combination of denseness, but still gets the best
actual base tag.

The logic of the patch is that there are different "complexities" in
the naming, and it's not just whether you are following a second
parent, it's also if you're doing generational hops.

So when you do a first-parent change, the name stays simple (the last
number changes), and that means that the "distance" weight is low (so
that's the normal "+1" we do for first-parent.

But if it's not a first parent, there are two different cases:

 - generation > 0: we add "~%d^%d", and we approximate that with "four
characters", and use a cost that is four orders of magnitude higher
(so 1).

 - generation == 0: we add just "^%d", so generally just two
characters. So use a cost that is just two orders of magnitude higher
(so 100).

In other words, I'm trying to convince people that my patch not only
gives a good result, but that the "weight numbers" I use make some
kind of conceptual sense from a complexity cost angle.

With that, the patch itself is attached.

I think it's better than what "git name-rev" does now. Is it optimal?
No, I think the *optimal* would use some kind of "does one tag contain
the other" logic, and discarding all base names that are just
supersets of another base that still reaches the target.

But this patch is small and simple, and has some excuses for its
behavior. What do people think?

 Linus
 builtin/name-rev.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 092e03c3cc9b..0354c8d222e1 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -16,9 +16,6 @@ typedef struct rev_name {
 
 static long cutoff = LONG_MAX;
 
-/* How many generations are maximally preferred over _one_ merge traversal? */
-#define MERGE_TRAVERSAL_WEIGHT 65535
-
 static void name_rev(struct commit *commit,
const char *tip_name, int generation, int distance,
int deref)
@@ -55,19 +52,26 @@ copy_data:
parents;
parents = parents->next, parent_number++) {
if (parent_number > 1) {
+   int weight;
size_t len;
char *new_name;
 
strip_suffix(tip_name, "^0", &len);
-   if (generation > 0)
+
+   // The extra merge traversal "weight" depends
+   // on how complex the resulting name is.
+   if (generation > 0) {
+   weight = 1;
new_name = xstrfmt("%.*s~%d^%d", (int)len, 
tip_name,
   generation, parent_number);
-   else
+   } else {
+   weight = 100;
new_name = xstrfmt("%.*s^%d", (int)len, 
tip_name,
   parent_number);
+   }
 
name_rev(parents->item, new_name, 0,
-   distance + MERGE_TRAVERSAL_WEIGHT, 0);
+   distance + weight, 0);
} else {
name_rev(parents->item, tip_name, generation + 1,
distance + 1, 0);


Re: history damage in linux.git

2016-04-21 Thread Junio C Hamano
Linus Torvalds  writes:

> On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds
>  wrote:
>>
>> This seems to be a git bug.
>>
>> That commit aed06b9 can also be described as
>>
>> v3.13-rc7~9^2~14^2~42
>>
>> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.
>
> Hmm. I think I see what's up. The git distance function has a special
> hack for preferring first-parent traversal, introduced long long ago
> with commit ac076c29ae8d ("name-rev: Fix non-shortest description").
>
> Changing that
>
>   #define MERGE_TRAVERSAL_WEIGHT 65535
>
> to be a smaller value makes git find the shorter path.
>
> I do not know what the correct fix is, though.

I think avoiding side branches to describe with the weight is a
right thing to do, i.e. if you have this history:

X---o---o---o---o---v4.6
 \ /
  o---o

you do not want to explain X as "v4.6~^2~2", and instead you want it
as "v4.6~5", even though the former is 4 hops while the latter is 5
hops (which is longer).

But when comparing a name based on v4.6 (which I think the algorithm
with the weight heuristics would choose v4.6~5) and another name
based on v3.13, I suspect that we compare them with number of hops
with the weight heuristics, and that is what gives us a wrong
result, isn't it?

I think it should instead compare the number of true hops.

v3.13-rc7~9^2~14^2~42 = 9 + 1 + 14 + 1 + 42 = 67 hops from v3.13
v4.6-rc1~9^2~792 = 9 + 1 + 792 = 802 hops from v4.6-rc1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 10:08 AM, Jeff King  wrote:
>
> Right, because it makes the names longer. We give the second-parent
> traversal a heuristic cost. If we drop that cost to "1", like:

So I dropped it to 500 (removed the two last digits), and it gave a
reasonable answer. With 1000, it gave the same "based on 4.6" answer
as the current 65536 value does.

> which is technically true, but kind of painful to read. It may be that a
> reasonable weight is somewhere between "1" and "65535", though.

Based on my tests, the "right" number is somewhere in the 500-1000
range for this particular case. But it's still a completely made up
number.

> However, I think the more fundamental confusion with git-describe is
> that people expect the shortest distance to be the "first" tag that
> contained the commit, and that is clearly not true in a branchy history.

Yeah.

And I don't think people care *too* much, because I'm sure this has
happened before, it's just that before when it happened it wasn't
quite _so_ far off the expected path..

> I actually think most people would be happy with an algorithm more like:
>
>   1. Find the "oldest" tag (either by timestamp, or by version-sorting
>  the tags) that contains the commit in question.

Yes, we might want to base the "distance" at least partly on the age
of the base commits.

>   2. Find the "simplest" path from that tag to the commit, where we
>  are striving mostly for shortness of explanation, not of path (so
>  "~500" is way better than "~20^2~30^2~14", even though the latter
>  is technically a shorter path).

Well, so the three different paths I've seen are:

 - standard git (65536), and 1000:
   aed06b9 tags/v4.6-rc1~9^2~792

 - non-first-parent cost: 500:
   aed06b9 tags/v3.13-rc7~9^2~14^2~42

 - non-first parent cost: 1:
   aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42

so there clearly are multiple valid answers.

I would actually claim that the middle one is the best one - but I
claim that based on your algorithm case #1. The last one may be the
shortest actual path, but it's a shorter path to a newer tag that is a
superset of the older tag, so the middle one is actually not just
better based on age, but is a better choice based on "minimal actual
history".

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Jeff King
On Thu, Apr 21, 2016 at 09:59:13AM -0700, Junio C Hamano wrote:

> Linus Torvalds  writes:
> 
> > That commit aed06b9 can also be described as
> >
> > v3.13-rc7~9^2~14^2~42
> >
> > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.
> 
> I seem to recall that name-rev had an unexplained heuristics to
> strongly avoid following second parent changes (I see two ^2 in the
> path from 3.13-rc7 above).

Right, because it makes the names longer. We give the second-parent
traversal a heuristic cost. If we drop that cost to "1", like:

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 092e03c..03be8be 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -17,7 +17,7 @@ typedef struct rev_name {
 static long cutoff = LONG_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
-#define MERGE_TRAVERSAL_WEIGHT 65535
+#define MERGE_TRAVERSAL_WEIGHT 1
 
 static void name_rev(struct commit *commit,
const char *tip_name, int generation, int distance,


then this case gives:

  v3.13~5^2~4^2~2^2~1^2~42

which is technically true, but kind of painful to read. It may be that a
reasonable weight is somewhere between "1" and "65535", though.

However, I think the more fundamental confusion with git-describe is
that people expect the shortest distance to be the "first" tag that
contained the commit, and that is clearly not true in a branchy history.

I actually think most people would be happy with an algorithm more like:

  1. Find the "oldest" tag (either by timestamp, or by version-sorting
 the tags) that contains the commit in question.

  2. Find the "simplest" path from that tag to the commit, where we
 are striving mostly for shortness of explanation, not of path (so
 "~500" is way better than "~20^2~30^2~14", even though the latter
 is technically a shorter path).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds
 wrote:
>
> This seems to be a git bug.
>
> That commit aed06b9 can also be described as
>
> v3.13-rc7~9^2~14^2~42
>
> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

Hmm. I think I see what's up. The git distance function has a special
hack for preferring first-parent traversal, introduced long long ago
with commit ac076c29ae8d ("name-rev: Fix non-shortest description").

Changing that

  #define MERGE_TRAVERSAL_WEIGHT 65535

to be a smaller value makes git find the shorter path.

I do not know what the correct fix is, though.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Junio C Hamano
Linus Torvalds  writes:

> That commit aed06b9 can also be described as
>
> v3.13-rc7~9^2~14^2~42
>
> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

I seem to recall that name-rev had an unexplained heuristics to
strongly avoid following second parent changes (I see two ^2 in the
path from 3.13-rc7 above).

> However did git decide to use that further-away name? That looks odd.
>
> I'm wondering if it's because of the new root commit we have, and that
> confuses the distance calculation somehow? That came in somewhat
> recently (mid March) with the GPIO pull.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Matthieu Moy
Olaf Hering  writes:

> On Thu, Apr 21, John Keeping wrote:
>
>>  $ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>
> Thanks for that, I did not know this variant.
>
> Unless git does not do it for me, I may hackup my script like that to
> find the earlierst tag:

"git tag" has a --sort key, so you can sort on dates, and "| head -n 1".
See also "git for-each-ref" which is essentially a superset of what "git
tag" does, and for-each-ref is plumbing, so safer to use in scripts (we
have a strong tradition of backward compatibility for plumbing).

This relies on the dates recorded in the commits, which may be wrong
(typically if someone commited on a machine with an incorrect clock).
But hopefully not.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Linus Torvalds
On Thu, Apr 21, 2016 at 6:24 AM, Andreas Schwab  wrote:
>
> The branches recently pulled by Linus contain commits with rather old
> author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or
> 281baf7a702693deaa45c98ef0c5161006b48257.  That will probably cause git
> describe --contains to take a different path through the history.

Nothing in git should look at author dates (except for the obvious
code that then shows the date).

The committer date is in fact used for the traversal heuristics for
some cases, but those are different and recent in the examples you
note.

This seems to be a git bug.

That commit aed06b9 can also be described as

v3.13-rc7~9^2~14^2~42

so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

However did git decide to use that further-away name? That looks odd.

I'm wondering if it's because of the new root commit we have, and that
confuses the distance calculation somehow? That came in somewhat
recently (mid March) with the GPIO pull.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Olaf Hering
On Thu, Apr 21, John Keeping wrote:

>   $ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b

Thanks for that, I did not know this variant.

Unless git does not do it for me, I may hackup my script like that to
find the earlierst tag:

 for i in `git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b`
 do
  git log --max-count=1 --oneline --pretty=%ct:%h:$i $i | cat
 done | sort -n | sed 's@^.*:@@;q'

Olaf
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Andreas Schwab
Olaf Hering  writes:

> How can I find out whats going on? Is my git(1) 2.8.1 broken, or did
> Linus just pull some junk tree (and does he continue to do so)?

The branches recently pulled by Linus contain commits with rather old
author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or
281baf7a702693deaa45c98ef0c5161006b48257.  That will probably cause git
describe --contains to take a different path through the history.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread John Keeping
On Thu, Apr 21, 2016 at 01:30:04PM +0200, Olaf Hering wrote:
> To track the changes in hyperv related files I created some scripts
> years ago to automate the process of finding relevant commits in
> linux.git. Part of that process is to record the tag when a commit
> appeared in mainline. This worked fine, until very recently.
> 
> Suddenly years-old commits are declared as having-just-arrived in
> linux.git. Look at this example:
> 
>   $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c
>   2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
>   62238f3 Input: hyperv-keyboard - register as a wakeup source
>   c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix
>   aed06b9 Input: add a driver to support Hyper-V synthetic keyboard
>   $ git describe --contains aed06b9
>   v4.6-rc1~9^2~792
>   $ git show aed06b9 | head
>   commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>   Author: K. Y. Srinivasan 
>   Date:   Wed Sep 18 12:50:42 2013 -0700
> 
> Obviously that and other commits are in the tree since a very long time.
> 
> How can I find out whats going on? Is my git(1) 2.8.1 broken, or did
> Linus just pull some junk tree (and does he continue to do so)?

I suspect it indicates that an old tree was pulled in such that the path
to v4.6-rc1 is shorter than to the older version.  The commit is clearly
in v3.13-rc1:

$ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
v3.13
v3.13-rc1
v3.13-rc2
[snip]

The behaviour of describe is a bit clearer if you limit it to v3.*:

$ git describe --match='v3.*' --contains 
aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
v3.13-rc7~9^2~14^2~42

$ git describe --match='v3.13-rc1' --contains 
aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
v3.13-rc1~65^2^2~42

It seems that the path to v4.6-rc1 is "more direct" than to either of
these commits: there is only one second-parent merge transition.

>From a quick look, I think the problem is in commit c155c7492c9a ("Merge
branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input")
which merges a branch that has repeatedly had master merged back into it
but does not build on any recent releases.  The most recent tag on the
first-parent history of that branch is v3.0-rc4.

I think it is as simple as git-describe (or git-name-rev which is used
in the --contains case) preferring a less branchy path, which has been
introduced in v4.6 with the merge commit above.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Matthieu Moy
Olaf Hering  writes:

> On Thu, Apr 21, Matthieu Moy wrote:
>
>> My guess is that this commit has been sitting for a long time in a
>> repo outside Linus' linux.git, and got merged only recently.
>
> Thats what it looks like. And thats what I'm complaining about. But in
> fact that file is there since v3.13-rc7 (if the tag is really correct in
> my patch file), since at least end of 2014.

Ah, indeed. It was merged right before 3.13. See "git tag --contains
aed06b9cfcab".

It's indeed weird that "git describe --contains" gives a named based on
tag v4.6-rc1, but it is not really incorrect since tags/v4.6-rc1~9^2~792
is indeed a correct name for aed06b9cfcabf8644ac5f6f108c0b3d01522f88b,
and actually uses a tag that follows it.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Olaf Hering
On Thu, Apr 21, Matthieu Moy wrote:

> My guess is that this commit has been sitting for a long time in a
> repo outside Linus' linux.git, and got merged only recently.

Thats what it looks like. And thats what I'm complaining about. But in
fact that file is there since v3.13-rc7 (if the tag is really correct in
my patch file), since at least end of 2014.



And after some checking its in linux-3.13-rc1 from 2013-11-22, which
indicates that such damage has already happend earlier. At least I have
a record from Nov 2014 for the patch file. There is a slim chance that I
tweaked that particular tag menually in the patch file.

This is a list of changed tags in my patch collection:

 Date: Wed, 18 Sep 2013 12:50:42 -0700
-Patch-mainline: v3.13-rc7
+Patch-mainline: v4.6-rc1
 Subject: Input: add a driver to support Hyper-V synthetic keyboard
 Git-commit: aed06b9cfcabf8644ac5f6f108c0b3d01522f88b

 Date: Sun, 12 Jan 2014 11:09:14 -0800
-Patch-mainline: v3.14
+Patch-mainline: v4.6-rc1
 Subject: Input: hyperv-keyboard - pass through 0xE1 prefix
 Git-commit: c3c4d99485ea51cd354ed3cd955a8310703456b6

 Date: Wed, 6 Aug 2014 13:33:54 -0700
-Patch-mainline: v3.17-rc1
+Patch-mainline: v4.6-rc1
 Subject: Input: hyperv-keyboard - register as a wakeup source
 Git-commit: 62238f3aadc9bc56da70100e19ec61b9f8d72a5f

 Date: Mon, 30 Nov 2015 19:22:13 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: replace enum hv_message_type by u32
 Git-commit: 7797dcf63f11b6e1d34822daf2317223d0f4ad46

 Date: Mon, 30 Nov 2015 19:22:14 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header
 Git-commit: 4f39bcfd1c132522380138a323f9af7902766301

 Date: Mon, 30 Nov 2015 19:22:15 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header
 Git-commit: 5b423efe11e822e092e8c911a6bad17eadf718eb

 Date: Mon, 30 Nov 2015 19:22:16 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move struct hv_timer_message_payload into UAPI Hyper-V 
x86 header
 Git-commit: c71acc4c74dddeeede69fdd4f0b1a124f9df

 Date: Mon, 30 Nov 2015 19:22:21 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: kvm/x86: Hyper-V SynIC timers
 Git-commit: 1f4b34f825e8cef6f493d06b46605384785b3d16

 Date: Tue, 2 Feb 2016 11:45:02 +0800
-Patch-mainline: v4.6-rc1
+Patch-mainline: v4.6-rc2
 Subject: x86/cpu: Convert printk(KERN_ ...) to pr_(...)
 Git-commit: 1b74dde7c47c19a73ea3e9fac95ac27b5d3d50c5

Olaf
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: history damage in linux.git

2016-04-21 Thread Matthieu Moy
Olaf Hering  writes:

> To track the changes in hyperv related files I created some scripts
> years ago to automate the process of finding relevant commits in
> linux.git. Part of that process is to record the tag when a commit
> appeared in mainline. This worked fine, until very recently.
>
> Suddenly years-old commits are declared as having-just-arrived in
> linux.git. Look at this example:
>
>   $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c
>   2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
>   62238f3 Input: hyperv-keyboard - register as a wakeup source
>   c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix
>   aed06b9 Input: add a driver to support Hyper-V synthetic keyboard
>   $ git describe --contains aed06b9
>   v4.6-rc1~9^2~792
>   $ git show aed06b9 | head
>   commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>   Author: K. Y. Srinivasan 
>   Date:   Wed Sep 18 12:50:42 2013 -0700
>
> Obviously that and other commits are in the tree since a very long time.

I cannot take that conclusion from the excerpts above. What I can
conclude is that the commit was made a long time ago (according to
https://github.com/torvalds/linux/commit/aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
the commit was initially made on Sep 18 2013 by K. Y. Srinivasan, and
then applied by Dmitry Torokhov the day after to "some" tree.

My guess is that this commit has been sitting for a long time in a
repo outside Linus' linux.git, and got merged only recently.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html