[PATCH] tag: add -i and --introduced modifier for --contains
From: Luis R. Rodriguez mcg...@suse.com Upstream Linux kernel commit c5905afb was introduced on v3.4 but git describe --contains yields v3.5 while if we use git to look for the first parent with git describe --first-parent yields v3.3. The reason for this seems to be that the merge commit that introduced c5905afb was based on v3.3. At least for --contains its unclear to me why we get v3.5, the result is not intuitive, as for --first-parent the issue is that the first parent actually *is* v3.3. The easiest way to address this it to rely on on the git tag --contains implmenetation and add a modifier that specifies you want the tag that first introduced the specified commit. mcgrof@ergon ~/linux (git::master)$ git tag -i --contains c5905afb v3.4 mcgrof@ergon ~/linux (git::master)$ git tag --introduced --contains c5905afb v3.4 Cc: Jiri Slaby jsl...@suse.cz Cc: Andreas Schwab sch...@suse.de Cc: Jan Kara j...@suse.cz Signed-off-by: Luis R. Rodriguez mcg...@suse.com --- builtin/tag.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/builtin/tag.c b/builtin/tag.c index 6c7c6bd..65a939b 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -21,7 +21,7 @@ static const char * const git_tag_usage[] = { N_(git tag [-a|-s|-u key-id] [-f] [-m msg|-F file] tagname [head]), N_(git tag -d tagname...), - N_(git tag -l [-n[num]] [--contains commit] [--points-at object] + N_(git tag -l [-n[num]] [--contains commit] [ -i | --introduced --contains commit ] [--points-at object] \n\t\t[pattern...]), N_(git tag -v tagname...), NULL @@ -195,13 +195,18 @@ static int sort_by_version(const void *a_, const void *b_) } static int list_tags(const char **patterns, int lines, -struct commit_list *with_commit, int sort) +struct commit_list *with_commit, int sort, +int introduced) { struct tag_filter filter; filter.patterns = patterns; filter.lines = lines; - filter.sort = sort; + if (introduced) { + sort = VERCMP_SORT; + filter.sort = sort; + } else + filter.sort = sort; filter.with_commit = with_commit; memset(filter.tags, 0, sizeof(filter.tags)); filter.tags.strdup_strings = 1; @@ -216,8 +221,11 @@ static int list_tags(const char **patterns, int lines, for (i = filter.tags.nr - 1; i = 0; i--) printf(%s\n, filter.tags.items[i].string); else - for (i = 0; i filter.tags.nr; i++) + for (i = 0; i filter.tags.nr; i++) { printf(%s\n, filter.tags.items[i].string); + if (introduced) + break; + } string_list_clear(filter.tags, 0); } return 0; @@ -493,6 +501,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix) char *cleanup_arg = NULL; int annotate = 0, force = 0, lines = -1; int cmdmode = 0, sort = 0; + int introduced = 0; const char *msgfile = NULL, *keyid = NULL; struct msg_arg msg = { 0, STRBUF_INIT }; struct commit_list *with_commit = NULL; @@ -511,6 +520,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix) N_(tag message), parse_msg_arg), OPT_FILENAME('F', file, msgfile, N_(read message from file)), OPT_BOOL('s', sign, opt.sign, N_(annotated and GPG-signed tag)), + OPT_BOOL('i', introduced, introduced, N_(print the first tag that introduced the commit)), OPT_STRING(0, cleanup, cleanup_arg, N_(mode), N_(how to strip spaces and #comments from message)), OPT_STRING('u', local-user, keyid, N_(key-id), @@ -576,7 +586,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix) } if (lines != -1 sort) die(_(--sort and -n are incompatible)); - ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit, sort); + ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit, + sort, introduced); if (column_active(colopts)) stop_column_filter(); return ret; -- 1.9.0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tag: add -i and --introduced modifier for --contains
On Wed, Apr 16, 2014 at 3:02 PM, Junio C Hamano gits...@pobox.com wrote: Luis R. Rodriguez mcg...@do-not-panic.com writes: From: Luis R. Rodriguez mcg...@suse.com Upstream Linux kernel commit c5905afb was introduced on v3.4 but git describe --contains yields v3.5 Actually, describe --contains should yield v3.5-rc1~120^3~76^2, not v3.5. Yes, indeed thanks, sorry I should have been explicit. And you are right that the commit is contained in v3.4, so we also should be able to describe it as v3.4~479^2~9^2 as well. That'd be swell :) And between v3.4 and v3.5-rc1, the latter is a closer anchor point for that commit (v3.5-rc1 only needs about 200 hops to reach the commit, while from v3.4 you would need close to 500 hops), Ah! Thanks for explaining this mysterious puzzle to me. I'm a bit perplexed why still. Can I trouble you for a little elaboration here? How could one view from a commit merged on v3.4 possibly yield more commits to v3.4 than to v3.5 ? Is it because it starts counting on the merge's parent (v3.3) ? hence we end up picking the latter as a better answer. Now, with the explanation of how/why this happens behind us, I see two possible issues with this patch: - The reason a human-user rejects v3.5-rc1~120^3~76^2 as the solution and favor v3.4~479^2~9^2 could be because of the -rc1 part in the answer. Perhaps we would want an option that affects which tags are to be used (and which tags are to be excluded) as anchoring points? I'd take an rc release as a blessed point too so not sure, and come to think of it I'm not a bit perplexed why the results for my change did not yield an rc1 as well. - If we are truly interested in finding out the earliest tag that contains the given commit, shouldn't we be ignoring the tagname and go with the tag with the oldest timestamp? After all, there may be a fix merged to v7.0 first on April 1st, and then on a later date the same fix may be merged to the maintenance track to be tagged as v6.9.1 on May 5th, At least for Linux linux-3.X.y branches (one example linux-3.4.y) on linux-stable has different commit IDs from patches cherry picked from Linus' tree, and that patch just referneces the upstream commit from Linus' tree on the commit log, but nothing more. and in such a case, wouldn't you want to say that the fix first appeared on v7.0 on April 1st, instead of on May 5th? Sure, but I'd expect the folks maintaining v6.9.x would just refer to the upstream commit ID from v7.0. Luis -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tag: add -i and --introduced modifier for --contains
On Thu, Apr 17, 2014 at 10:04 AM, Junio C Hamano gits...@pobox.com wrote: Luis R. Rodriguez mcg...@do-not-panic.com writes: And between v3.4 and v3.5-rc1, the latter is a closer anchor point for that commit (v3.5-rc1 only needs about 200 hops to reach the commit, while from v3.4 you would need close to 500 hops), Ah! Thanks for explaining this mysterious puzzle to me. I'm a bit perplexed why still. Can I trouble you for a little elaboration here? Junio gives a great huge example Phew! Thanks for the elaborate explanation, this makes perfect sense now! Now, as to what *SHOULD* happen, I think the above exercise shows us a way to define what the desired semantics is, without resorting to heuristics (e.g. which tag has older timestamp? or which tag's name sorts older under Linux version naming convention?). I think ultimately this reveals that given that tags *can* be arbitrary and subjective, and given that clocks can also pretty much arbitrary 'git describe --contains' can and probably only should do best effort (TM) and perhaps one thing to help is documenting this issue well and provide a set of best practices that are supported for tagging schemes. I can't describe how many libraries I've reviewed about software versioning schemes and most of them support a huge array of things, and funny enough the Linux versioning scheme, was not supported well, for something so simple as versioning sort. This is ultimately why I had to implement my own sort solution on rel-html. If we agree on this we could just for example take on the Linux versioning scheme as an emum and document that well both on code and a wiki. More on this below. With regards to timestamps: care must be taken given that we'd be assuming that clocks are synchronized, this can likely yield incorrect results on a distributed development environment with different time zones, and it can also be easily cheated, which is why I was concerned over using timestamps. Its still certainly something that can be considered, but I've heard enough rants of a few maintainers about crazy dates on patches which makes me believe this could actually be an issue, specially if we speed up development and need higher degree of resolution. I know the above example but its perhaps worth mentioning how Linux does not follow the above development model for merging stable fixes or changes though, but it does not prevent folks from branching off of older tags to do development which Linux will then pull. In Ingo's case the issue then points then I think to another mild issue -- the commit was developed on a v3.3 based tag, which is why 'git describe --first-parent c5905afb' yields v3.3-rc1-41-gc5905af and not v3.4, which *can also* be a bit perplexing if one does not understand the above example you provided can be used for a development work flow for code sent out to Linus. That said then, since we don't follow the model you laid out it still reveals another issue, and I am not yet sure I still understand why --contains yields a v3.5 tag in that case since we ensured commits on v3.5 were already piled up on older releases, or were being introduced newly on its own release. It smells to me that the commit's first parent (which can be anything) is used somehow here as a shortcut ? This doesn't mean we can't use the work flow above for merging changes from say a v3.4.x onto a v3.5 -- but we don't -- and perhaps as part of the documentation about a scheme for Linux, we should advise against such practices. In any case the closest thing I see we can use upstream on Linux is 'git cherry-pick -x commit-id' but Greg doesn't seem to use this and instead appends the commit with the respective commit ID of the upstream gitsum. Both strategies yield different commit IDs anyway, so neither practice should interrupt the 'git describe --contains' practice. In the stable branches to find out when a commit was introduced one would not rely on the commit ID on the stable branch but instead of the commit ID of the 'upstream reference'. Commit A can be described in terms of both v3.4 and v9.0, And in the real example case, why *would* c5905afb' be be described in terms of v3.5 instead of v3.4 ? and it may be closer to v9.0 than v3.4, and under that definition we pick the closest tag, the current describe --contains behaviour may be correct, but from the human point of view, it is *WRONG*. Yeap, if a development work flow does not follow a strict pattern (maybe a .git/config variable?) perhaps 'git describe --contains' should spit out a the few tags it does have? It is wrong because v9.0 can reach v3.4. So perhaps the rule should be updated to do something like: - find candidate tags that can be used to describe --contains the commit A, yielding v3.4, v3.5 (not shown), and v9.0; Sure. - among the candidate tags, cull the ones that contain another candidate tag, rejecting v3.5 (not shown) and v9.0; Sounds good to me but that seems
Re: [PATCH] tag: add -i and --introduced modifier for --contains
On Fri, Apr 18, 2014 at 4:36 PM, Junio C Hamano gits...@pobox.com wrote: Luis R. Rodriguez mcg...@do-not-panic.com writes: I think ultimately this reveals that given that tags *can* be arbitrary and subjective,... Yes; see the part at the bottom. Commit A can be described in terms of both v3.4 and v9.0, And in the real example case, why *would* c5905afb' be be described in terms of v3.5 instead of v3.4 ? I am not interested in graphing that particular history between v3.4 and v3.5 myself. If you are interested, I already gave you enough information on how to figure that out. I was alluding to another possible issue here, my concern was that the commit's parent (which is not really the point at which it was merged, but rather where the topic got forked off to be worked on) could be used for as reference points but clearly its not given the nature of how name-rev was implemented. I still do see some possible issues with it's parent on other commands (but I haven't studied the other's implementation) that reveals some of my original concerns, but its unclear if they are related. I also found that if we didn't want to rely on dates or start defining naming convention we may want to reconsider the name_rev() recursive implementation. I'll illustrate a few results that might help to show my concerns for both other commands perhaps using the parent erroneously, and a possible alternative implementation for name_rev() or at the very least contains. [0] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.5| grep ^commit | wc -l 24878 [1] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.4| grep ^commit | wc -l 13106 [2] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.3| grep ^commit | wc -l 1360 Now that I revised name_rev.c I see the recursive nature of name_rev() works top down from each tag down to each v* tag object and for each actual commit pegs a name on it. How we rule out each tag under this implementation is not that obvious to me, specially when results like [0] and [1] reveal v3.4 should be 'shorter' in light of number of commits. I see now how we don't update a commit's name if other crucial information such as the ones discussed on this thread might be important for the user, and I can see how this can help but an alternative approach, which is what I expected to see implemented at least for 'git describe --contains', would have been to see how many commits are present from the commit's *merged* upstream parent (not the actual parent as in c5905afb's commit case its v3.3 which is not where it got merged). Getting the smallest number of commits under this logic and stopping when we don't find any commits should yield us the base tag under which the commit was merged, without any heuristics on dates. This however applies to Linux though given that we don't merge commits on stable branches but rather create new commits and reference the upstream sha1sum, a practice which also solves the problem Jeff pointed out. The results for command [2] above however a bit surprising, I'd take a look but I should go back to look at other stuff, figured I'd at least bring it up now as it seems relevant. - find candidate tags that can be used to describe --contains the commit A, yielding v3.4, v3.5 (not shown), and v9.0; - among the candidate tags, cull the ones that contain another candidate tag, rejecting v3.5 (not shown) and v9.0; - among the surviving tags, pick the closest. Hmm? Sounds good to me! Not so fast ;-) My other message to Peff in response to his another example has an updated position on this. Reject candidates that can reach other candidates is universally correct, but after that point, there are at least three but probably more options that suit preference of different people and project to break ties: - Your case that started this thread may want to favor v3.4 if only because that v3.4 _sounds_ smaller than v4.0 (in Peff's example), even when v3.4 and v4.0 do not have ancestry relationship. - The closest we have had is a heuristic to produce a result that is textually shorter. - And as I alluded to, which one has the earliest timestamp?, is another valid question to ask. The first one above can be subjective if and only if the Linux upstream model of dealing with stable branches is not followed. In other words I think its a non issue if you create new commits on the stable branches instead of merge stuff onto them. This however is technical practice and I guess not everyone follows. And there may be more to appear. A new command line option (and possibly a new configuration) to choose from these three (and more heuristics that will be added later) would be necessary. Yeah this is rather complex, the resolutions to the issue in the ways you've described seem reasonable to me but do wonder if this can be simplified by reevaluating how the candidates are considered. You'd know better
[PATCH] git.c: treat RUN_SETUP_GENTLY and RUN_SETUP as mutually exclusive
From: Luis R. Rodriguez mcg...@suse.com This saves us a few branches when RUN_SETUP is set up. Signed-off-by: Luis R. Rodriguez mcg...@suse.com --- git.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git.c b/git.c index 9efd1a3..7780572 100644 --- a/git.c +++ b/git.c @@ -290,7 +290,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) if (!help) { if (p-option RUN_SETUP) prefix = setup_git_directory(); - if (p-option RUN_SETUP_GENTLY) { + else if (p-option RUN_SETUP_GENTLY) { int nongit_ok; prefix = setup_git_directory_gently(nongit_ok); } -- 1.9.0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
GNU diff and git diff - difference on myers algorithm?
Based on a cursory review of the git code I get the impression that GNU diff and git 'diff' do not share any code for the possible diff algorithms. I'm in particularly curious more about the default myers algorithm. I can take time to do a precise code review of the algorithms used on both GNU diff and git but if someone can already vet for any differences that'd be appreciated as it would save time. Luis -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GNU diff and git diff - difference on myers algorithm?
On Tue, Jun 9, 2015 at 1:25 AM, Johannes Schindelin johannes.schinde...@gmx.de wrote: Hi Luis, On 2015-06-08 20:34, Luis R. Rodriguez wrote: Based on a cursory review of the git code I get the impression that GNU diff and git 'diff' do not share any code for the possible diff algorithms. Indeed, Git's diff machinery is based[*1*] ofn libxdiff[*2*], not on GNU diff. Great thanks for the confirmation. Interesting point by Linus on that commit that changed from forking to use GNU diff to libxdiff: generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to validate the libxdiff output by just comparing it against GNU diff. Indeed, simple example is using different starting context lines, or different number of context lines. I do however wonder if under certain parameters they *need* to be equal, or if this has been studied exactly before. I'm in particularly curious more about the default myers algorithm. Are you looking for a freely available implementation of the Myers algorithm? Or are you interested in understanding it? I was trying to determine the above, of possibilities of differences. Now granted there are the diffs using different output layouts should differ, and as I note above if you modify the conext preference you might end up with slightly different diffs, but I am also curious to know if anyone has done research to see whether or not two hunks which are slightly different are functionally equivalent. More on the reasoning behind this below. Please note that Myers' algorithm is just one first step in most diff implementations (and that other diff algorithms have become popular, in particular because comparing strings can be accelerated by hashing the text lines first, and those hashes can also be used to identify matching pairs of unique lines, giving rise to yet another huge performance boost for typical uses). Awesome. The reason why Myers' algorithm is not sufficient for diff implementations is that it only optimizes the edit distance, i.e. the amount of added/removed lines, while patches should be readable, too, i.e. prefer *consecutive* edits to disjunct ones. Indeed, this is along the lines of what I am looking for but with some other tweaks considered, more on this below. Just to mention one post-processing technique that is so useful that I implemented it for Git[*3*]: the patience diff algorithm of Bram Cohen (of BitTorrent fame) finds matching pairs of unique lines -- think of a function from which another function is refactored, for example, intuitively you want the diff to keep the signature of the original function as a context line. Indeed. Disclaimer: While it is true that Gene and I shared an office for one month, and that I am once again working in the same institute as he does, all my knowledge about this algorithm stems from my reading his paper and implementing the algorithm in Java for use in JGit[*3*]. :) I can take time to do a precise code review of the algorithms used on both GNU diff and git but if someone can already vet for any differences that'd be appreciated as it would save time. Again, I am curious what your goal is? I am sure I can support your quest better when I understand what the purpose of this code review should be. OK wells I'm curious about more research / effort when trying to evaluate a diff with two seprate but adjoining preprocessor directives and if anyone has implemented an optimizaiton option to let the diff generator join them. For example, to let it infer that: --- a/test.c +++ b/test.c @@ -10,8 +10,6 @@ int main(int argc, char *argv[]) #ifdef FOO a = 4; -#endif /* FOO */ -#ifdef FOO a = 5; #endif /* FOO */ is possible. Luis -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GNU diff and git diff - difference on myers algorithm?
On Fri, Jun 12, 2015 at 11:52 AM, Luis R. Rodriguez mcg...@do-not-panic.com wrote: OK wells I'm curious about more research / effort when trying to evaluate a diff with two seprate but adjoining preprocessor directives and if anyone has implemented an optimizaiton option to let the diff generator join them. For example, to let it infer that: --- a/test.c +++ b/test.c @@ -10,8 +10,6 @@ int main(int argc, char *argv[]) #ifdef FOO a = 4; -#endif /* FOO */ -#ifdef FOO a = 5; #endif /* FOO */ is possible. Anyone familiar if any tool exists today that would optimize this? Is anyone working on it? Would git be a good place for such a thing? I'd consider it as an option to optimize a diff. This for example is extremely useful for us working with Coccinelle where we have a tool writing code for us, while such an optimization might be useful to Coccinelle it would seem like a rather generic feature, its just not clear to me where to give such a tool a proper home. Luis -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html