[PATCH] tag: add -i and --introduced modifier for --contains

2014-04-16 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

Upstream Linux kernel commit c5905afb was introduced on v3.4 but
git describe --contains yields v3.5 while if we use git to look
for the first parent with git describe --first-parent yields
v3.3. The reason for this seems to be that the merge commit that
introduced c5905afb was based on v3.3. At least for --contains
its unclear to me why we get v3.5, the result is not intuitive,
as for --first-parent the issue is that the first parent actually
*is* v3.3. The easiest way to address this it to rely on on the
git tag --contains implmenetation and add a modifier that specifies
you want the tag that first introduced the specified commit.

mcgrof@ergon ~/linux (git::master)$ git tag -i --contains c5905afb
v3.4

mcgrof@ergon ~/linux (git::master)$ git tag --introduced --contains c5905afb
v3.4

Cc: Jiri Slaby jsl...@suse.cz
Cc: Andreas Schwab sch...@suse.de
Cc: Jan Kara j...@suse.cz
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 builtin/tag.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 6c7c6bd..65a939b 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -21,7 +21,7 @@
 static const char * const git_tag_usage[] = {
N_(git tag [-a|-s|-u key-id] [-f] [-m msg|-F file] tagname 
[head]),
N_(git tag -d tagname...),
-   N_(git tag -l [-n[num]] [--contains commit] [--points-at object] 

+   N_(git tag -l [-n[num]] [--contains commit] [ -i | --introduced 
--contains commit ] [--points-at object] 
\n\t\t[pattern...]),
N_(git tag -v tagname...),
NULL
@@ -195,13 +195,18 @@ static int sort_by_version(const void *a_, const void *b_)
 }
 
 static int list_tags(const char **patterns, int lines,
-struct commit_list *with_commit, int sort)
+struct commit_list *with_commit, int sort,
+int introduced)
 {
struct tag_filter filter;
 
filter.patterns = patterns;
filter.lines = lines;
-   filter.sort = sort;
+   if (introduced) {
+   sort = VERCMP_SORT;
+   filter.sort = sort;
+   } else
+   filter.sort = sort;
filter.with_commit = with_commit;
memset(filter.tags, 0, sizeof(filter.tags));
filter.tags.strdup_strings = 1;
@@ -216,8 +221,11 @@ static int list_tags(const char **patterns, int lines,
for (i = filter.tags.nr - 1; i = 0; i--)
printf(%s\n, filter.tags.items[i].string);
else
-   for (i = 0; i  filter.tags.nr; i++)
+   for (i = 0; i  filter.tags.nr; i++) {
printf(%s\n, filter.tags.items[i].string);
+   if (introduced)
+   break;
+   }
string_list_clear(filter.tags, 0);
}
return 0;
@@ -493,6 +501,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
char *cleanup_arg = NULL;
int annotate = 0, force = 0, lines = -1;
int cmdmode = 0, sort = 0;
+   int introduced = 0;
const char *msgfile = NULL, *keyid = NULL;
struct msg_arg msg = { 0, STRBUF_INIT };
struct commit_list *with_commit = NULL;
@@ -511,6 +520,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 N_(tag message), parse_msg_arg),
OPT_FILENAME('F', file, msgfile, N_(read message from 
file)),
OPT_BOOL('s', sign, opt.sign, N_(annotated and GPG-signed 
tag)),
+   OPT_BOOL('i', introduced, introduced, N_(print the first 
tag that introduced the commit)),
OPT_STRING(0, cleanup, cleanup_arg, N_(mode),
N_(how to strip spaces and #comments from message)),
OPT_STRING('u', local-user, keyid, N_(key-id),
@@ -576,7 +586,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
}
if (lines != -1  sort)
die(_(--sort and -n are incompatible));
-   ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit, 
sort);
+   ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit,
+   sort, introduced);
if (column_active(colopts))
stop_column_filter();
return ret;
-- 
1.9.0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tag: add -i and --introduced modifier for --contains

2014-04-16 Thread Luis R. Rodriguez
On Wed, Apr 16, 2014 at 3:02 PM, Junio C Hamano gits...@pobox.com wrote:
 Luis R. Rodriguez mcg...@do-not-panic.com writes:

 From: Luis R. Rodriguez mcg...@suse.com

 Upstream Linux kernel commit c5905afb was introduced on v3.4 but
 git describe --contains yields v3.5

 Actually, describe --contains should yield v3.5-rc1~120^3~76^2,
 not v3.5.

Yes, indeed thanks, sorry I should have been explicit.

 And you are right that the commit is contained in v3.4, so we also
 should be able to describe it as v3.4~479^2~9^2 as well.

That'd be swell :)

 And between v3.4 and v3.5-rc1, the latter is a closer anchor point
 for that commit (v3.5-rc1 only needs about 200 hops to reach the
 commit, while from v3.4 you would need close to 500 hops),

Ah! Thanks for explaining this mysterious puzzle to me. I'm a bit
perplexed why still. Can I trouble you for a little elaboration here?
How could one view from a commit merged on v3.4 possibly yield more
commits to v3.4 than to v3.5 ? Is it because it starts counting on the
merge's parent (v3.3) ?

 hence we
 end up picking the latter as a better answer.

 Now, with the explanation of how/why this happens behind us, I see
 two possible issues with this patch:

  - The reason a human-user rejects v3.5-rc1~120^3~76^2 as the
solution and favor v3.4~479^2~9^2 could be because of the -rc1
part in the answer.  Perhaps we would want an option that affects
which tags are to be used (and which tags are to be excluded) as
anchoring points?

I'd take an rc release as a blessed point too so not sure, and come to
think of it I'm not a bit perplexed why the results for my change did
not yield an rc1 as well.

  - If we are truly interested in finding out the earliest tag that
contains the given commit, shouldn't we be ignoring the tagname
and go with the tag with the oldest timestamp?  After all, there
may be a fix merged to v7.0 first on April 1st, and then on a
later date the same fix may be merged to the maintenance track to
be tagged as v6.9.1 on May 5th,

At least for Linux linux-3.X.y branches (one example linux-3.4.y) on
linux-stable has different commit IDs from patches cherry picked from
Linus' tree, and that patch just referneces the upstream commit from
Linus' tree on the commit log, but nothing more.

 and in such a case, wouldn't you  want to say that the fix first appeared on 
 v7.0 on April 1st,
 instead of on May 5th?

Sure, but I'd expect the folks maintaining v6.9.x would just refer to
the upstream commit ID from v7.0.

  Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tag: add -i and --introduced modifier for --contains

2014-04-18 Thread Luis R. Rodriguez
On Thu, Apr 17, 2014 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Luis R. Rodriguez mcg...@do-not-panic.com writes:

 And between v3.4 and v3.5-rc1, the latter is a closer anchor point
 for that commit (v3.5-rc1 only needs about 200 hops to reach the
 commit, while from v3.4 you would need close to 500 hops),

 Ah! Thanks for explaining this mysterious puzzle to me. I'm a bit
 perplexed why still. Can I trouble you for a little elaboration here?

 Junio gives a great huge example

Phew! Thanks for the elaborate explanation, this makes perfect sense now!

 Now, as to what *SHOULD* happen, I think the above exercise shows us
 a way to define what the desired semantics is, without resorting to
 heuristics (e.g. which tag has older timestamp? or which tag's
 name sorts older under Linux version naming convention?).

I think ultimately this reveals that given that tags *can* be
arbitrary and subjective, and given that clocks can also pretty much
arbitrary 'git describe --contains' can and probably only should do
best effort (TM) and perhaps one thing to help is documenting this
issue well and provide a set of best practices that are supported for
tagging schemes. I can't describe how many libraries I've reviewed
about software versioning schemes and most of them support a huge
array of things, and funny enough the Linux versioning scheme, was not
supported well, for something so simple as versioning sort. This is
ultimately why I had to implement my own sort solution on rel-html. If
we agree on this we could just for example take on the Linux
versioning scheme as an emum and document that well both on code and a
wiki. More on this below.

With regards to timestamps: care must be taken given that we'd be
assuming that clocks are synchronized, this can likely yield incorrect
results on a distributed development environment with different time
zones, and it can also be easily cheated, which is why I was concerned
over using timestamps. Its still certainly something that can be
considered, but I've heard enough rants of a few maintainers about
crazy dates on patches which makes me believe this could actually be
an issue, specially if we speed up development and need higher degree
of resolution.

I know the above example but its perhaps worth mentioning how Linux
does not follow the above development model for merging stable fixes
or changes though, but it does not prevent folks from branching off of
older tags to do development which Linux will then pull. In Ingo's
case the issue then points then I think to another mild issue -- the
commit was developed on a v3.3 based tag, which is why 'git describe
--first-parent c5905afb' yields v3.3-rc1-41-gc5905af and not v3.4,
which *can also* be a bit perplexing if one does not understand the
above example you provided can be used for a development work flow for
code sent out to Linus. That said then, since we don't follow the
model you laid out it still reveals another issue, and I am not yet
sure I still understand why --contains yields a v3.5 tag in that case
since we ensured commits on v3.5 were already piled up on older
releases, or were being introduced newly on its own release. It smells
to me that the commit's first parent (which can be anything) is used
somehow here as a shortcut ?

This doesn't mean we can't use the work flow above for merging changes
from say a v3.4.x onto a v3.5 -- but we don't -- and perhaps as part
of the documentation about a scheme for Linux, we should advise
against such practices. In any case the closest thing I see we can use
upstream on Linux is 'git cherry-pick -x commit-id' but Greg doesn't
seem to use this and instead appends the commit with the respective
commit ID of the upstream gitsum. Both strategies yield different
commit IDs anyway, so neither practice should interrupt the 'git
describe --contains' practice. In the stable branches to find out when
a commit was introduced one would not rely on the commit ID on the
stable branch but instead of the commit ID of the 'upstream
reference'.

 Commit A can be described in terms of both v3.4 and v9.0,

And in the real example case, why *would* c5905afb' be be described in
terms of v3.5 instead of v3.4 ?

 and it may
 be closer to v9.0 than v3.4, and under that definition we pick the
 closest tag, the current describe --contains behaviour may be
 correct, but from the human point of view, it is *WRONG*.

Yeap, if a development work flow does not follow a strict pattern
(maybe a .git/config variable?) perhaps 'git describe --contains'
should spit out a the few tags it does have?

 It is wrong because v9.0 can reach v3.4.  So perhaps the rule should
 be updated to do something like:

 - find candidate tags that can be used to describe --contains
   the commit A, yielding v3.4, v3.5 (not shown), and v9.0;

Sure.


 - among the candidate tags, cull the ones that contain another
   candidate tag, rejecting v3.5 (not shown) and v9.0;

Sounds good to me but that seems

Re: [PATCH] tag: add -i and --introduced modifier for --contains

2014-04-21 Thread Luis R. Rodriguez
On Fri, Apr 18, 2014 at 4:36 PM, Junio C Hamano gits...@pobox.com wrote:
 Luis R. Rodriguez mcg...@do-not-panic.com writes:

 I think ultimately this reveals that given that tags *can* be
 arbitrary and subjective,...

 Yes; see the part at the bottom.

 Commit A can be described in terms of both v3.4 and v9.0,

 And in the real example case, why *would* c5905afb' be be described in
 terms of v3.5 instead of v3.4 ?

 I am not interested in graphing that particular history between v3.4
 and v3.5 myself.  If you are interested, I already gave you enough
 information on how to figure that out.

I was alluding to another possible issue here, my concern was that the
commit's parent (which is not really the point at which it was merged,
but rather where the topic got forked off to be worked on) could be
used for as reference points but clearly its not given the nature of
how name-rev was implemented. I still do see some possible issues with
it's parent on other commands (but I haven't studied the other's
implementation) that reveals some of my original concerns, but its
unclear if they are related. I also found that if we didn't want to
rely on dates or start defining naming convention we may want to
reconsider the name_rev() recursive implementation. I'll illustrate a
few results that might help to show my concerns for both other
commands perhaps using the parent erroneously, and a possible
alternative implementation for name_rev() or at the very least
contains.

[0] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.5| grep
^commit | wc -l
24878
[1] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.4| grep
^commit | wc -l
13106
[2] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.3| grep
^commit | wc -l
1360

Now that I revised name_rev.c I see the recursive nature of name_rev()
works top down from each tag down to each v* tag object and for each
actual commit pegs a name on it. How we rule out each tag under this
implementation is not that obvious to me, specially when results like
[0] and [1] reveal v3.4 should be 'shorter' in light of number of
commits. I see now how we don't update a commit's name if other
crucial information such as the ones discussed on this thread might be
important for the user, and I can see how this can help but an
alternative approach, which is what I expected to see implemented at
least for 'git describe --contains', would have been to see how many
commits are present from the commit's *merged* upstream parent (not
the actual parent as in c5905afb's commit case its v3.3 which is not
where it got merged). Getting the smallest number of commits under
this logic and stopping when we don't find any commits should yield us
the base tag under which the commit was merged, without any heuristics
on dates. This however applies to Linux though given that we don't
merge commits on stable branches but rather create new commits and
reference the upstream sha1sum, a practice which also solves the
problem Jeff pointed out.

The results for command [2] above however a bit surprising, I'd take a
look but I should go back to look at other stuff, figured I'd at least
bring it up now as it seems relevant.

 - find candidate tags that can be used to describe --contains
   the commit A, yielding v3.4, v3.5 (not shown), and v9.0;

 - among the candidate tags, cull the ones that contain another
   candidate tag, rejecting v3.5 (not shown) and v9.0;

 - among the surviving tags, pick the closest.

 Hmm?

 Sounds good to me!

 Not so fast ;-)

 My other message to Peff in response to his another example has an
 updated position on this.  Reject candidates that can reach other
 candidates is universally correct, but after that point, there are
 at least three but probably more options that suit preference of
 different people and project to break ties:

  - Your case that started this thread may want to favor v3.4 if only
because that v3.4 _sounds_ smaller than v4.0 (in Peff's example),
even when v3.4 and v4.0 do not have ancestry relationship.

  - The closest we have had is a heuristic to produce a result that
is textually shorter.

  - And as I alluded to, which one has the earliest timestamp?, is
another valid question to ask.

The first one above can be subjective if and only if the Linux
upstream model of dealing with stable branches is not followed. In
other words I think its a non issue if you create new commits on the
stable branches instead of merge stuff onto them. This however is
technical practice and I guess not everyone follows.

 And there may be more to appear.  A new command line option (and
 possibly a new configuration) to choose from these three (and more
 heuristics that will be added later) would be necessary.

Yeah this is rather complex, the resolutions to the issue in the ways
you've described seem reasonable to me but do wonder if this can be
simplified by reevaluating how the candidates are considered. You'd
know better

[PATCH] git.c: treat RUN_SETUP_GENTLY and RUN_SETUP as mutually exclusive

2014-04-21 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

This saves us a few branches when RUN_SETUP is set up.

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 git.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git.c b/git.c
index 9efd1a3..7780572 100644
--- a/git.c
+++ b/git.c
@@ -290,7 +290,7 @@ static int run_builtin(struct cmd_struct *p, int argc, 
const char **argv)
if (!help) {
if (p-option  RUN_SETUP)
prefix = setup_git_directory();
-   if (p-option  RUN_SETUP_GENTLY) {
+   else if (p-option  RUN_SETUP_GENTLY) {
int nongit_ok;
prefix = setup_git_directory_gently(nongit_ok);
}
-- 
1.9.0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


GNU diff and git diff - difference on myers algorithm?

2015-06-08 Thread Luis R. Rodriguez
Based on a cursory review of the git code I get the impression that
GNU diff and git 'diff' do not share any code for the possible diff
algorithms. I'm in particularly curious more about the default myers
algorithm. I can take time to do a precise code review of the
algorithms used on both GNU diff and git but if someone can already
vet for any differences that'd be appreciated as it would save time.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-06-12 Thread Luis R. Rodriguez
On Tue, Jun 9, 2015 at 1:25 AM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Luis,

 On 2015-06-08 20:34, Luis R. Rodriguez wrote:
 Based on a cursory review of the git code I get the impression that
 GNU diff and git 'diff' do not share any code for the possible diff
 algorithms.

 Indeed, Git's diff machinery is based[*1*] ofn libxdiff[*2*], not on GNU diff.

Great thanks for the confirmation. Interesting point by Linus on that
commit that changed from forking to use GNU diff to libxdiff:

  generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to validate the
   libxdiff output by just comparing it against GNU diff.

Indeed, simple example is using different starting context lines, or
different number of context lines. I do however wonder if under
certain parameters they *need* to be equal, or if this has been
studied exactly before.

 I'm in particularly curious more about the default myers
 algorithm.

 Are you looking for a freely available implementation of the Myers algorithm? 
 Or are you interested in understanding it?

I was trying to determine the above, of possibilities of differences.
Now granted there are the diffs using different output layouts
should differ, and as I note above if you modify the conext preference
you might end up with slightly different diffs, but I am also curious
to know if anyone has done research to see whether or not two hunks
which are slightly different are functionally equivalent. More on the
reasoning behind this below.

 Please note that Myers' algorithm is just one first step in most diff 
 implementations (and that other diff algorithms have become popular, in 
 particular because comparing strings can be accelerated by hashing the text 
 lines first, and those hashes can also be used to identify matching pairs of 
 unique lines, giving rise to yet another huge performance boost for typical 
 uses).

Awesome.

 The reason why Myers' algorithm is not sufficient for diff implementations is 
 that it only optimizes the edit distance, i.e. the amount of added/removed 
 lines, while patches should be readable, too, i.e. prefer *consecutive* edits 
 to disjunct ones.

Indeed, this is along the lines of what I am looking for but with some
other tweaks considered, more on this below.

 Just to mention one post-processing technique that is so useful that I 
 implemented it for Git[*3*]: the patience diff algorithm of Bram Cohen (of 
 BitTorrent fame) finds matching pairs of unique lines -- think of a function 
 from which another function is refactored, for example, intuitively you want 
 the diff to keep the signature of the original function as a context line.

Indeed.

 Disclaimer: While it is true that Gene and I shared an office for one month, 
 and that I am once again working in the same institute as he does, all my 
 knowledge about this algorithm stems from my reading his paper and 
 implementing the algorithm in Java for use in JGit[*3*].

:)

 I can take time to do a precise code review of the
 algorithms used on both GNU diff and git but if someone can already
 vet for any differences that'd be appreciated as it would save time.

 Again, I am curious what your goal is? I am sure I can support your quest 
 better when I understand what the purpose of this code review should be.

OK wells I'm curious about more research / effort when trying to
evaluate a diff with two seprate but adjoining preprocessor directives
and if anyone has implemented an optimizaiton option to let the diff
generator join them.

For example, to let it infer that:

--- a/test.c
+++ b/test.c
@@ -10,8 +10,6 @@ int main(int argc, char *argv[])

 #ifdef FOO
a = 4;
-#endif /* FOO */
-#ifdef FOO
a = 5;
 #endif /* FOO */

is possible.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-07-16 Thread Luis R. Rodriguez
On Fri, Jun 12, 2015 at 11:52 AM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 OK wells I'm curious about more research / effort when trying to
 evaluate a diff with two seprate but adjoining preprocessor directives
 and if anyone has implemented an optimizaiton option to let the diff
 generator join them.

 For example, to let it infer that:

 --- a/test.c
 +++ b/test.c
 @@ -10,8 +10,6 @@ int main(int argc, char *argv[])

  #ifdef FOO
 a = 4;
 -#endif /* FOO */
 -#ifdef FOO
 a = 5;
  #endif /* FOO */

 is possible.

Anyone familiar if any tool exists today that would optimize this? Is
anyone working on it? Would git be a good place for such a thing? I'd
consider it as an option to optimize a diff. This for example is
extremely useful for us working with Coccinelle where we have a tool
writing code for us, while such an optimization might be useful to
Coccinelle it would seem like a rather generic feature, its just not
clear to me where to give such a tool a proper home.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html