Re: [RFD] Long term plan with submodule refs?
On Fri, Nov 10, 2017 at 12:01 PM, Stefan Bellerwrote: >> Basically, a workflow where it's easier to have each submodule checked out at master, and we can still keep track of historical relationship of what commit was the submodule at some time ago, but without causing some of these headaches. >>> >>> So essentially a repo or otherwise parallel workflow just with the >>> versioning >>> happening magically behind your back? >> >> Ideally, my developers would like to just have each submodule checked >> out at master. >> >> Ideally, I'd like to be able to checkout an old version of the parent >> project and have it recorded what version of the shared submodule was >> at at the time. > > This sounds as if a "passive superproject" would work best for you, i.e. > each commit in a submodule is bubbled up into the superproject, > making a commit potentially even behind the scenes, such that the > user interaction with the superproject would be none. > > However this approach also sounds careless, as there is no precondition > that e.g. the superproject builds with all the submodules as is; it is a mere > tracking of "at this time we have the submodules arranged as such", > whereas for the versioning aspect, you would want to have commit messages > in the superproject saying *why* you bumped up a specific submodule. > The user may not like to give such an explanation as they already wrote > a commit message for the individual project. > > Also this approach sounds like a local approach, as it is not clear to me, > why you'd want to share the superproject history. > >> Ideally, my developers don't want to have to worry about knowing that >> they shouldn't "git add -a" or "git commit -a" when they have a >> submodule checked out at a different location from the parent projects >> gitlink. >> >> Thanks, >> Jake >> It doesn't need to be totally passive, in that some (or one maintainer) can manage when the submodule pointer is actually updated, but ideally other users don't have to worry about that and can "pretend" to always keep each submodule at master, as they have always done in the past. Thanks, Jake
Re: Unify annotated and non-annotated tags
On 11/11/2017 03:06, Junio C Hamano wrote: > Igor Djordjevicwrites: > >> If you would like to mimic output of "git show-ref", repeating >> commits for each tag pointing to it and showing full tag name as >> well, you could do something like this, for example: >> >> for tag in $(git for-each-ref --format="%(refname)" refs/tags) >> do >> printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag" >> done >> >> >> Hope that helps a bit. > > If you use for-each-ref's --format option, you could do something > like (pardon a long line): > > git for-each-ref > --format='%(if)%(*objectname)%(then)%(*objectname)%(else)%(objectname)%(end) > %(refname)' refs/tags > > without any loop, I would think. ... and I did have a feeling it should be possible in a single Git command :P Thanks.
Re: use of PWD
Jeff Kingwrites: > So totally orthogonal to your bug, I wonder if we ought to be doing: > > diff --git a/sha1_file.c b/sha1_file.c > index 057262d46e..0b76233aa7 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -530,11 +530,11 @@ void prepare_alt_odb(void) > if (alt_odb_tail) > return; > > - alt = getenv(ALTERNATE_DB_ENVIRONMENT); > - if (!alt) alt = ""; > - > alt_odb_tail = _odb_list; > - link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0); > + > + alt = getenv(ALTERNATE_DB_ENVIRONMENT); > + if (alt) > + link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0); > > read_info_alternates(get_object_directory(), 0); > } > > to avoid hitting link_alt_odb_entries() at all when there are no > entries. Sounds sane.
Re: Unify annotated and non-annotated tags
Igor Djordjevicwrites: > If you would like to mimic output of "git show-ref", repeating > commits for each tag pointing to it and showing full tag name as > well, you could do something like this, for example: > > for tag in $(git for-each-ref --format="%(refname)" refs/tags) > do > printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag" > done > > > Hope that helps a bit. If you use for-each-ref's --format option, you could do something like (pardon a long line): git for-each-ref --format='%(if)%(*objectname)%(then)%(*objectname)%(else)%(objectname)%(end) %(refname)' refs/tags without any loop, I would think.
Hello,
-- Good day! What is the best way to reach you for business? I am writing you because I have an opportunity to present to you. I have a business that I would like to discuss with you. Waiting to read from you soon. Thank you
Re: Unify annotated and non-annotated tags
Hi Anatoly, On 10/11/2017 11:58, anatoly techtonik wrote: > It is hard to work with Git tags, because on low level hash > of non-annotated tag is pointing to commit, but hash for > annotated tag is pointing to tag metadata. > > On low level that means that there is no way to get commit > hash from tag in a single step. If tag is annotated, you need > to find and parse ^{} string of show-ref, if not, then look for > string without ^{}. That is not quite true, as you can always dereference any tag (annotated or not) using "^0" notation, see git-rev-parse[1]: "As a special rule, ^0 means the commit itself and is used when is the object name of a tag object that refers to a commit object." > So, why not just make all tags work the same so that every > tag has its own hash and you need to dereference it in the > same way to get commit hash? > > This way I could get all commit hashes with just: > > git show-ref --tags -d | grep "\^{}" > > or abandon ^{} completely and show commit hashes on -d: > > git show-ref --tags --dereference > Depending on what you would _exactly_ like to do, you could get all tagged commit hashes like this: git rev-list --tags --no-walk Note that each commit will be listed only once, even if more tags (annotated or not) point to it. If you would like to mimic output of "git show-ref", repeating commits for each tag pointing to it and showing full tag name as well, you could do something like this, for example: for tag in $(git for-each-ref --format="%(refname)" refs/tags) do printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag" done Hope that helps a bit. Regards, Buga [1] https://git-scm.com/docs/git-rev-parse#git-rev-parse-emltrevgtemegemHEADv1510em
Re: What's cooking in git.git (Nov 2017, #03; Wed, 8)
On Wed, Nov 08, 2017 at 02:50:24PM +0900, Junio C Hamano wrote: > * bc/submitting-patches-in-asciidoc (2017-10-30) 2 commits > - Documentation: convert SubmittingPatches to AsciiDoc > - Documentation: enable compat-mode for Asciidoctor > > The SubmittingPatches document has been converted to produce an > HTML version via AsciiDoc/Asciidoctor. > > Any further comments? Otherwise, will merge to 'next'. I think you had wanted some changes, so let me send out a fixed series. > * bc/hash-algo (2017-10-30) 4 commits > - Switch empty tree and blob lookups to use hash abstraction > - Integrate hash algorithm support with repo setup > - Add structure representing hash algorithm > - setup: expose enumerated repo info > > An infrastructure to define what hash function is used in Git is > introduced, and an effort to plumb that throughout various > codepaths has been started. > > cf. <20171028181239.59458-1-sand...@crustytoothpaste.net> I'll be rerolling this as well. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
[PATCH] t/3512: demonstrate unrelated submodule/file conflict as cherry-pick failure
Signed-off-by: Stefan Beller--- I rewrote your script to integrate with our test suite, potentially acting as a regression test once we solve the Directory/File conflict issue. Thanks, Stefan t/t3512-cherry-pick-submodule.sh | 36 1 file changed, 36 insertions(+) diff --git a/t/t3512-cherry-pick-submodule.sh b/t/t3512-cherry-pick-submodule.sh index 6863b7bb6f..17a6773247 100755 --- a/t/t3512-cherry-pick-submodule.sh +++ b/t/t3512-cherry-pick-submodule.sh @@ -10,4 +10,40 @@ KNOWN_FAILURE_NOFF_MERGE_DOESNT_CREATE_EMPTY_SUBMODULE_DIR=1 KNOWN_FAILURE_NOFF_MERGE_ATTEMPTS_TO_MERGE_REMOVED_SUBMODULE_FILES=1 test_submodule_switch "git cherry-pick" +test_expect_failure 'unrelated submodule/file conflict is ignored' ' + test_create_repo sub && + + touch sub/file && + git -C sub add file && + git -C sub commit -m "add a file in a submodule" && + + test_create_repo a_repo && + ( + cd a_repo && + touch a_file && + git add a_file && + git commit -m "add a file" && + + git branch test && + git checkout test && + + mkdir sub && + touch sub/content && + git add sub/content && + git commit -m "add a regular folder with name sub" && + + echo "123" >a_file && + git add a_file && + git commit -m "modify a file" && + + git checkout master && + + git submodule add ../sub sub && + git submodule update sub && + git commit -m "add a submodule info folder with name sub" && + + git cherry-pick test + ) +' + test_done -- 2.15.0.128.gcadd42da22
Re: [PATCH 2/4] Remove silent clamp of renameLimit
On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote: > Thanks for taking a look! > > On Fri, Nov 10, 2017 at 10:26 AM, Stefan Bellerwrote: > > From a technical perspective, I would think that if > > (num_create <= rename_limit || num_src <= rename_limit) > > holds true, that the double-cast condition would also be always true? > > Could we just remove that last check? > > Not necessarily. For example, if num_create = rename_limit-1 and > num_src = rename_limit+2, then the first condition will be satisfied > but the second won't. If it was && rather than ||, then the second > condition would be superfluous. > > > Or phrased differently, if we can cast to double and extend the check > > here, do we have to adapt code at other places as well? > > Good point, and yes. Perhaps I should have re-ordered my patch series > because I came back to it later and realized that the progress code > was broken due to overflow/wraparound, and a patch 3 fixed that. > > Further, the later patch used uint64_t instead of double. While > double works, perhaps I should change the double here to uint64_t for > consistency? I'm wondering if maybe you want to use size_t. If you end up using an unsigned type, you might be able to leverage unsigned_mult_overflows to avoid having to write this by hand. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: cherry-pick very slow on big repository
On Fri, Nov 10, 2017 at 6:05 AM, Peter Kreftingwrote: > Derrick Stolee: > >> Git is spending time detecting renames, which implies you probably renamed >> a folder or added and deleted a large number of files. This rename detection >> is quadratic (# adds times # deletes). > > Yes, a couple of directories with a lot of template files have been renamed > (and some removed, some added) between the current development branch and > this old maintenance branch. I get the "Performing inexact rename detection" > a lot when merging changes in the other direction. > > However, none of them applies to these particular commits, which only > touches files that are in the exact same location on both branches. I would be very interested to hear how my rename detection performance patches work for you; this kind of usecase was the exact one it was designed to help the most. See https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/
Re: [PATCH 00/30] Add directory rename detection to git
On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakleywrote: > From: "Elijah Newren" >> >> In this patchset, I introduce directory rename detection to >> merge-recursive, >> predominantly so that when files are added to directories on one side of >> history and those directories are renamed on the other side of history, >> the >> files will end up in the proper location after a merge or cherry-pick. >> >> However, this isn't limited to that simplistic case. More interesting >> possibilities exist, such as: >> >> * a file being renamed into a directory which is renamed on the other >>side of history, causing the need for a transitive rename. >> > > How does this cope with the case insensitive case preserving file systems on > Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem > that the series already covers, would the likely changes be reasonably > localised? > > This came up recently on GfW for `git checkout` of a branch where the case > changed ("Test" <-> "test"), but git didn't notice that it needed to rename > the directories on such an file system. > https://github.com/git-for-windows/git/issues/1333 I wasn't aware there were problems with git on case insensitive case preserving filesystems; fixing them wasn't something I had in mind when writing this series. However, the particular bug you mention is actually completely orthogonal to this series; it talks about git-checkout without the -m/--merge option, which doesn't touch any code path I modified in my series, so my series can't really fix or worsen that particular issue. But, if there are further issues with such filesystems that also affect merges/cherry-picks/rebases, then I don't think my series will either help or hurt there either. The recursive merge machinery already has remove_file() and update_file() wrappers that it uses whenever it needs to remove/add/update a file in the working directory and/or index, and I have simply continued using those, so the number of places you'd need to modify to fix issues would remain just as localized as before. Also, I continue to depend on the reading of the index & trees that unpack_trees() does, which I haven't modified, so again it'd be the same number of places that someone would need to fix. (However, the whole design to have unpack_trees() do the initial work and then have recursive merge try to "fix it up" is really starting to strain. I'm starting to think, again, that merge recursive needs a redesign, and have some arguments I wanted to float out there...but I've dumped enough on the list for a day.) It's possible that this series fixes one particular issue -- namely when merging, if the merge-base contained a "Test" directory, one side added a file to that directory, and the other side renamed "Test" to "test", and if the presence of both "Test" and "test" directories in the merge result is problematic, then at least with my fixes you wouldn't end up with both directories and could thus avoid that problem in a narrow set of cases. Sorry that I don't have any better news than that for you. Elijah
[PATCH 2/2] stash: implement builtin stash helper
Start moving stash functions over to builtin c code and call them in the shell script, instead of converting it all at once. Signed-off-by: Joel Teichroeb--- Makefile| 1 + builtin.h | 1 + builtin/stash--helper.c | 516 git-stash.sh| 134 + git.c | 1 + 5 files changed, 526 insertions(+), 127 deletions(-) create mode 100644 builtin/stash--helper.c diff --git a/Makefile b/Makefile index ee9d5eb11..3a9bd4d57 100644 --- a/Makefile +++ b/Makefile @@ -1000,6 +1000,7 @@ BUILTIN_OBJS += builtin/send-pack.o BUILTIN_OBJS += builtin/shortlog.o BUILTIN_OBJS += builtin/show-branch.o BUILTIN_OBJS += builtin/show-ref.o +BUILTIN_OBJS += builtin/stash--helper.o BUILTIN_OBJS += builtin/stripspace.o BUILTIN_OBJS += builtin/submodule--helper.o BUILTIN_OBJS += builtin/symbolic-ref.o diff --git a/builtin.h b/builtin.h index 42378f3aa..a14fd85b0 100644 --- a/builtin.h +++ b/builtin.h @@ -219,6 +219,7 @@ extern int cmd_shortlog(int argc, const char **argv, const char *prefix); extern int cmd_show(int argc, const char **argv, const char *prefix); extern int cmd_show_branch(int argc, const char **argv, const char *prefix); extern int cmd_status(int argc, const char **argv, const char *prefix); +extern int cmd_stash__helper(int argc, const char **argv, const char *prefix); extern int cmd_stripspace(int argc, const char **argv, const char *prefix); extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix); extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix); diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c new file mode 100644 index 0..c8cb667fe --- /dev/null +++ b/builtin/stash--helper.c @@ -0,0 +1,516 @@ +#include "builtin.h" +#include "config.h" +#include "parse-options.h" +#include "refs.h" +#include "lockfile.h" +#include "cache-tree.h" +#include "unpack-trees.h" +#include "merge-recursive.h" +#include "argv-array.h" +#include "run-command.h" + +static const char * const git_stash_usage[] = { + N_("git stash--helper drop [-q|--quiet] []"), + N_("git stash--helper pop [--index] [-q|--quiet] []"), + N_("git stash--helper apply [--index] [-q|--quiet] []"), + N_("git stash--helper branch []"), + N_("git stash--helper clear"), + NULL +}; + +static const char * const git_stash_drop_usage[] = { + N_("git stash--helper drop [-q|--quiet] []"), + NULL +}; + +static const char * const git_stash_pop_usage[] = { + N_("git stash--helper pop [--index] [-q|--quiet] []"), + NULL +}; + +static const char * const git_stash_apply_usage[] = { + N_("git stash--helper apply [--index] [-q|--quiet] []"), + NULL +}; + +static const char * const git_stash_branch_usage[] = { + N_("git stash--helper branch []"), + NULL +}; + +static const char * const git_stash_clear_usage[] = { + N_("git stash--helper clear"), + NULL +}; + +static const char *ref_stash = "refs/stash"; +static int quiet; +static char stash_index_path[PATH_MAX]; + +struct stash_info { + struct object_id w_commit; + struct object_id b_commit; + struct object_id i_commit; + struct object_id u_commit; + struct object_id w_tree; + struct object_id b_tree; + struct object_id i_tree; + struct object_id u_tree; + const char *message; + const char *revision; + int is_stash_ref; + int has_u; + const char *patch; +}; + +static int get_stash_info(struct stash_info *info, const char *commit) +{ + struct strbuf w_commit_rev = STRBUF_INIT; + struct strbuf b_commit_rev = STRBUF_INIT; + struct strbuf w_tree_rev = STRBUF_INIT; + struct strbuf b_tree_rev = STRBUF_INIT; + struct strbuf i_tree_rev = STRBUF_INIT; + struct strbuf u_tree_rev = STRBUF_INIT; + struct strbuf commit_buf = STRBUF_INIT; + struct strbuf symbolic = STRBUF_INIT; + struct strbuf out = STRBUF_INIT; + int ret; + const char *revision = commit; + char *end_of_rev; + struct child_process cp = CHILD_PROCESS_INIT; + info->is_stash_ref = 0; + + if (commit == NULL) { + strbuf_addf(_buf, "%s@{0}", ref_stash); + revision = commit_buf.buf; + } else if (strspn(commit, "0123456789") == strlen(commit)) { + strbuf_addf(_buf, "%s@{%s}", ref_stash, commit); + revision = commit_buf.buf; + } + info->revision = revision; + + strbuf_addf(_commit_rev, "%s", revision); + strbuf_addf(_commit_rev, "%s^1", revision); + strbuf_addf(_tree_rev, "%s:", revision); + strbuf_addf(_tree_rev, "%s^1:", revision); + strbuf_addf(_tree_rev, "%s^2:", revision); + + ret = !get_oid(w_commit_rev.buf, >w_commit) && + !get_oid(b_commit_rev.buf, >b_commit) && +
[PATCH 1/2] merge: close the index lock when not writing the new index
If the merge does not have anything to do, it does not unlock the index, causing any further index operations to fail. Thus, always unlock the index regardless of outcome. Signed-off-by: Joel Teichroeb--- merge-recursive.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 2ca8444c6..225ff3fb5 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -2184,9 +2184,12 @@ int merge_recursive_generic(struct merge_options *o, if (clean < 0) return clean; - if (active_cache_changed && - write_locked_index(_index, , COMMIT_LOCK)) - return err(o, _("Unable to write index.")); + if (active_cache_changed) { + if (write_locked_index(_index, , COMMIT_LOCK)) + return err(o, _("Unable to write index.")); + } else { + rollback_lock_file(); + } return clean ? 0 : 1; } -- 2.15.0
[PATCH 0/2] Convert some stash functionality to a builtin
I've been working on converting all of git stash to be a builtin, however it's hard to get it all working at once with limited time, so I've moved around half of it to a new stash--helper builtin and called these functions from the shell script. Once this is stabalized, it should be easier to convert the rest of the commands one at a time without breaking anything. I've sent most of this code before, but that was targetting a full replacement of stash. The code is overall the same, but with some code review changes and updates for internal api changes. Joel Teichroeb (2): merge: close the index lock when not writing the new index stash: implement builtin stash helper Makefile| 1 + builtin.h | 1 + builtin/stash--helper.c | 516 git-stash.sh| 134 + git.c | 1 + merge-recursive.c | 9 +- 6 files changed, 532 insertions(+), 130 deletions(-) create mode 100644 builtin/stash--helper.c -- 2.15.0
[PATCH] builtin/describe.c: describe a blob
Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck. So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely. When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting. The same blob can be referenced by multiple commits, so how we decide which commit to use? This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob. For source code this is likely not the first commit that introduced the blob, but rather the latest commit that contained the blob. For example: git describe v0.99:Makefile v0.99-5-gab6625e06a:Makefile tells us the latest commit that contained the Makefile as it was in tag v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist", 2005-07-11) touches the Makefile. Let's see how this description turns out, if it is useful in day-to-day use as I have the intuition that we'd rather want to see the *first* commit that this blob was introduced to the repository (which can be achieved easily by giving the `--reverse` flag in the describe_blob rev walk). [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob Signed-off-by: Stefan BellerSigned-off-by: Junio C Hamano --- Documentation/git-describe.txt | 13 +++- builtin/describe.c | 71 ++ t/t6120-describe.sh| 15 + 3 files changed, 92 insertions(+), 7 deletions(-) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index c924c945ba..a25443ca91 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -3,7 +3,7 @@ git-describe(1) NAME -git-describe - Describe a commit using the most recent tag reachable from it +git-describe - Describe a commit or blob using the graph relations SYNOPSIS @@ -11,6 +11,7 @@ SYNOPSIS [verse] 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...] 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] +'git describe' DESCRIPTION --- @@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only shows annotated tags. For more information about creating annotated tags see the -a and -s options to linkgit:git-tag[1]. +If the given object refers to a blob, it will be described +as `:`, such that the blob can be found +at `` in the ``. Note, that the commit is likely +not the commit that introduced the blob, but the one that was found +first; to find the commit that introduced the blob, you need to find +the commit that last touched the path, e.g. +`git log -- `. +As blobs do not point at the commits they are contained in, +describing blobs is slow as we have to walk the whole graph. + OPTIONS --- ...:: diff --git a/builtin/describe.c b/builtin/describe.c index 9e9a5ed5d4..acfd853a30 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -3,6 +3,7 @@ #include "lockfile.h" #include "commit.h" #include "tag.h" +#include "blob.h" #include "refs.h" #include "builtin.h" #include "exec_cmd.h" @@ -11,8 +12,9 @@ #include "hashmap.h" #include "argv-array.h" #include "run-command.h" +#include "revision.h" +#include "list-objects.h" -#define SEEN (1u << 0) #define MAX_TAGS (FLAG_BITS - 1) static const char * const describe_usage[] = { @@ -434,6 +436,56 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) strbuf_addstr(dst, suffix); } +struct process_commit_data { + struct object_id current_commit; + struct object_id looking_for; + struct strbuf *dst; + struct rev_info *revs; +}; + +static void process_commit(struct commit *commit, void *data) +{ + struct process_commit_data *pcd = data; + pcd->current_commit = commit->object.oid; +} + +static void process_object(struct object *obj, const char *path, void *data) +{ + struct process_commit_data *pcd = data; + + if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) { + reset_revision_walk(); + describe_commit(>current_commit, pcd->dst); + strbuf_addf(pcd->dst,
[PATCH 0/1] describe a blob: with better docs
This replaces the last patch of origin/sb/describe-blob. Interdiff is below. I chose to not mention options at all, as currently none are applicable; Check for options and tell the user by die()ing that we don't know about the options for blobs. Thanks, Stefan builtin/describe.c: describe a blob Documentation/git-describe.txt | 13 +++- builtin/describe.c | 71 ++ t/t6120-describe.sh| 15 + 3 files changed, 92 insertions(+), 7 deletions(-) -- 2.15.0.128.gcadd42da22 diff --git c/Documentation/git-describe.txt w/Documentation/git-describe.txt index 79ec0be62a..a25443ca91 100644 --- c/Documentation/git-describe.txt +++ w/Documentation/git-describe.txt @@ -11,7 +11,7 @@ SYNOPSIS [verse] 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...] 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] -'git describe' [] +'git describe' DESCRIPTION --- diff --git c/builtin/describe.c w/builtin/describe.c index cf08bef344..acfd853a30 100644 --- c/builtin/describe.c +++ w/builtin/describe.c @@ -501,9 +501,13 @@ static void describe(const char *arg, int last_one) if (cmit) describe_commit(, ); - else if (lookup_blob()) + else if (lookup_blob()) { + if (all || tags || longformat || first_parent || + patterns.nr || exclude_patterns.nr || + always || dirty || broken) + die(_("options not available for describing blobs")); describe_blob(oid, ); - else + } else die(_("%s is neither a commit nor blob"), arg); puts(sb.buf);
Re: [PATCH 00/30] Add directory rename detection to git
From: "Elijah Newren"[This series is entirely independent of my rename detection limits series. However, I have a separate rename detection performance series that depends on both this series and the rename detection limits series.] In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as: * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename. How does this cope with the case insensitive case preserving file systems on Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem that the series already covers, would the likely changes be reasonably localised? This came up recently on GfW for `git checkout` of a branch where the case changed ("Test" <-> "test"), but git didn't notice that it needed to rename the directories on such an file system. https://github.com/git-for-windows/git/issues/1333 -- Philip
[RFC PATCH 8/9] merge-recursive: Accelerate rename detection
If a file is unmodified on one side of history (no content changes, no name change, and no mode change) and is renamed on the other side, then the correct merge result is to take both the file name and the file contents (and file mode) of the renamed file. merge-recursive detects this rename and gets the correct merge result. Note that if no rename detection is done, then this will appear to the merge machinery as two files: one that was unmodified on one side of history and deleted on the other (thus the merge should delete it), and one which was newly added on one side of history (thus the merge should include it). Thus, even if the rename wasn't detected, we still would have ended up with the correct result. In other words, rename detection is a waste of time for files that were unmodified on the OTHER side of history. We can accelerate rename detection for merges by providing information about the other side of history, which will allow us to remove all such rename sources from the list of candidates we care about. There are two gotchas: 1) Not trying to detect renames for these types of files can result in rename/add conflicts being instead detected as add/add conflicts, and can result in rename/rename(2to1) conflicts being instead detected as either rename/add or add/add conflicts. Luckily for us, these three types of conflicts happen to make the same changes to the index and working tree (what a coincidence...), so this isn't a significant issue; the only annoyance is that the stdout from the merge command will include a "CONFLICT($type)" message for a related conflict type instead of the precise conflict type. 2) If there is a directory rename on one side of history AND all files within the directory are not merely renamed but are modified as well AND none of the original files in the directory are modified on the other side of history AND there are new files added (or moved into) to the original directory on that other side of history, then this change will prevent us from being able to detect that directory rename and placing the new file(s) into the appropriate directory. A subsequent commit will correct this downside. In one particular testcase involving a large repository and some high-level directories having been renamed, this cut the time necessary for a cherry-pick down by a factor of about 8 (from around 4.5 minutes down to around 34 seconds) Signed-off-by: Elijah Newren--- diff.c| 1 + diff.h| 3 +++ diffcore-rename.c | 43 ++- merge-recursive.c | 61 +-- 4 files changed, 101 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index c6597e3231..83723ab26d 100644 --- a/diff.c +++ b/diff.c @@ -4173,6 +4173,7 @@ void diff_setup(struct diff_options *options) } options->color_moved = diff_color_moved_default; + options->ignore_for_renames = NULL; } void diff_setup_done(struct diff_options *options) diff --git a/diff.h b/diff.h index adf7e92eb5..1288f36fd2 100644 --- a/diff.h +++ b/diff.h @@ -196,6 +196,9 @@ struct diff_options { } color_moved; #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA #define COLOR_MOVED_MIN_ALNUM_COUNT 20 + + /* Paths we should ignore for rename purposes */ + struct string_list *ignore_for_renames; }; void diff_emit_submodule_del(struct diff_options *o, const char *line); diff --git a/diffcore-rename.c b/diffcore-rename.c index b15d9d74ef..aa8e0e4d4a 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -437,6 +437,40 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i return count; } +static int handle_rename_ignores(struct diff_options *options) +{ + int detect_rename = options->detect_rename; + struct string_list *ignores = options->ignore_for_renames; + int ignored = 0; + int i, j; + + /* rename_ignores onlhy relevant when we're not detecting copies */ + if (ignores == NULL || detect_rename == DIFF_DETECT_COPY) + return 0; + + for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) { + struct diff_filespec *one = rename_src[j].p->one; + int cmp; + + if (one->rename_used) { + j++; + continue; + } + + cmp = strcmp(ignores->items[i].string, one->path); + if (cmp < 0) + i++; + else if (cmp > 0) + j++; + else { + one->rename_used++; + ignored++; + } + } + + return ignored; +} + void diffcore_rename(struct diff_options *options) { int detect_rename = options->detect_rename; @@ -445,7 +479,7
[RFC PATCH 5/9] merge-recursive: Fix rename/add conflict handling
This makes the rename/add conflict handling make use of the new handle_file_collision() function, which fixes several bugs and improves things for the rename/add case significantly. Previously, rename/add would: * Not leave any higher order stage entries in the index, making it appear as if there were no conflict. * Would place the rename file at the colliding path, and move the added file elsewhere, which combined with the lack of higher order stage entries felt really odd. It's not clear to me why the rename should take precedence over the add; if one should be moved out of the way, they both probably should. * In the recursive case, it would do a two way merge of the added file and the version of the renamed file on the renamed side, completely excluding modifications to the renamed file on the unrenamed side of history. Using the new handle_file_collision() fixes all of these issues, and adds smarts to allow two-way merge OR move colliding files to separate paths depending on the similarity of the colliding files. Signed-off-by: Elijah Newren--- merge-recursive.c | 135 +++--- t/t6036-recursive-corner-cases.sh | 11 ++-- 2 files changed, 104 insertions(+), 42 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 345479ad50..f29cbd1240 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -167,6 +167,7 @@ static int oid_eq(const struct object_id *a, const struct object_id *b) enum rename_type { RENAME_NORMAL = 0, RENAME_DIR, + RENAME_ADD, RENAME_DELETE, RENAME_ONE_FILE_TO_ONE, RENAME_ONE_FILE_TO_TWO, @@ -209,6 +210,7 @@ static inline void setup_rename_conflict_info(enum rename_type rename_type, struct stage_data *src_entry1, struct stage_data *src_entry2) { + int ostage1, ostage2; struct rename_conflict_info *ci = xcalloc(1, sizeof(struct rename_conflict_info)); ci->rename_type = rename_type; ci->pair1 = pair1; @@ -226,18 +228,22 @@ static inline void setup_rename_conflict_info(enum rename_type rename_type, dst_entry2->rename_conflict_info = ci; } - if (rename_type == RENAME_TWO_FILES_TO_ONE) { - /* -* For each rename, there could have been -* modifications on the side of history where that -* file was not renamed. -*/ - int ostage1 = o->branch1 == branch1 ? 3 : 2; - int ostage2 = ostage1 ^ 1; + /* +* For each rename, there could have been +* modifications on the side of history where that +* file was not renamed. +*/ + if (rename_type == RENAME_ADD || + rename_type == RENAME_TWO_FILES_TO_ONE) { + ostage1 = o->branch1 == branch1 ? 3 : 2; ci->ren1_other.path = pair1->one->path; oidcpy(>ren1_other.oid, _entry1->stages[ostage1].oid); ci->ren1_other.mode = src_entry1->stages[ostage1].mode; + } + + if (rename_type == RENAME_TWO_FILES_TO_ONE) { + ostage2 = ostage1 ^ 1; ci->ren2_other.path = pair2->one->path; oidcpy(>ren2_other.oid, _entry2->stages[ostage2].oid); @@ -1104,6 +1110,18 @@ static int merge_file_special_markers(struct merge_options *o, const char *filename2, struct merge_file_info *mfi) { + if (o->branch1 != branch1) { + /* +* It's weird getting a reverse merge with HEAD on the bottom +* and the other branch on the top. Fix that. +*/ + return merge_file_special_markers(o, + one, b, a, + branch2, filename2, + branch1, filename1, + mfi); + } + char *side1 = NULL; char *side2 = NULL; int ret; @@ -1291,6 +1309,21 @@ static int handle_file_collision(struct merge_options *o, struct diff_filespec null, a, b; int minimum_score; + /* +* It's easiest to get the correct things into stage 2 and 3, and +* to make sure that the content merge puts HEAD before the other +* branch if we just ensure that branch1 == o->branch1. So, simply +* flip arguments around if we don't have that. +*/ + if (branch1 != o->branch1) { + return handle_file_collision(o, collide_path, +prev_path2, prev_path1, +branch2, branch1, +b_oid, b_mode, +
[RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts
This makes the rename/rename(2to1) conflicts use the new handle_file_collision() function. Since that function was based originally on the rename/rename(2to1) handling code, the main differences here are in what was added. In particular: * If the two colliding files are similar, instead of being stored at collide_path~HEAD and collide_path~MERGE, the files are two-way merged and recorded at collide_path. * Instead of recording the version of the renamed file that existed on the renamed side in the index (thus ignoring any changes that were made to the file on the side of history without the rename), we do a three-way content merge on the renamed path, then store that at either stage 2 or stage 3. * Note that if either of the three-way content merges done for each rename have conflicts, we do NOT try to estimate the similarity of the resulting two files and just automatically consider them to be dissimilar. This is done to avoid foisting conflicts-of-conflicts on the user. Signed-off-by: Elijah Newren--- Is it too weird to others that I potentially record a merged file with conflict markers at both stage 2 and stage 3 in the index? To me, it seemed less weird than what we previously did, but I am curious what others think of it. merge-recursive.c| 100 +-- t/t6042-merge-rename-corner-cases.sh | 2 +- 2 files changed, 14 insertions(+), 88 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index f29cbd1240..b8108740c4 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -647,26 +647,6 @@ static int update_stages(struct merge_options *opt, const char *path, return 0; } -static int update_stages_for_stage_data(struct merge_options *opt, - const char *path, - const struct stage_data *stage_data) -{ - struct diff_filespec o, a, b; - o.mode = stage_data->stages[1].mode; - oidcpy(, _data->stages[1].oid); - - a.mode = stage_data->stages[2].mode; - oidcpy(, _data->stages[2].oid); - - b.mode = stage_data->stages[3].mode; - oidcpy(, _data->stages[3].oid); - - return update_stages(opt, path, -is_null_sha1(o.oid.hash) ? NULL : , -is_null_sha1(a.oid.hash) ? NULL : , -is_null_sha1(b.oid.hash) ? NULL : ); -} - static void update_entry(struct stage_data *entry, struct diff_filespec *o, struct diff_filespec *a, @@ -1598,7 +1578,6 @@ static int conflict_rename_rename_2to1(struct merge_options *o, char *path = c1->path; /* == c2->path */ struct merge_file_info mfi_c1; struct merge_file_info mfi_c2; - int ret; output(o, 1, _("CONFLICT (rename/rename): " "Rename %s->%s in %s. " @@ -1606,9 +1585,6 @@ static int conflict_rename_rename_2to1(struct merge_options *o, a->path, c1->path, ci->branch1, b->path, c2->path, ci->branch2); - remove_file(o, 1, a->path, o->call_depth || would_lose_untracked(a->path)); - remove_file(o, 1, b->path, o->call_depth || would_lose_untracked(b->path)); - if (merge_file_special_markers(o, a, c1, >ren1_other, o->branch1, c1->path, o->branch2, ci->ren1_other.path, _c1) || @@ -1617,66 +1593,11 @@ static int conflict_rename_rename_2to1(struct merge_options *o, o->branch2, c2->path, _c2)) return -1; - if (o->call_depth) { - /* -* If mfi_c1.clean && mfi_c2.clean, then it might make -* sense to do a two-way merge of those results. But, I -* think in all cases, it makes sense to have the virtual -* merge base just undo the renames; they can be detected -* again later for the non-recursive merge. -*/ - remove_file(o, 0, path, 0); - ret = update_file(o, 0, _c1.oid, mfi_c1.mode, a->path); - if (!ret) - ret = update_file(o, 0, _c2.oid, mfi_c2.mode, - b->path); - } else { - char *new_path1 = unique_path(o, path, ci->branch1); - char *new_path2 = unique_path(o, path, ci->branch2); - output(o, 1, _("Renaming %s to %s and %s to %s instead"), - a->path, new_path1, b->path, new_path2); - if (was_dirty(o, path)) - output(o, 1, _("Refusing to lose dirty file at %s"), - path); - else if (would_lose_untracked(path)) - /* -* Only way we get here is if both renames
[RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts
This makes add/add conflicts use the new handle_file_collision() function. This leaves the handling of the index the same, but modifies how the working tree is handled: instead of always doing a two-way merge of the file contents and recording them at the collision path name, we instead first estimate the similarity of the two files involved. If they are not similar, we instead record the file contents into two separate files for the user to inspect. Several testcases had to be modified to either expect files to be written to different locations, or for the two test colliding files to be modified so that they were similar. Signed-off-by: Elijah Newren--- The sheer number of tests that relied on add/add always resulting in a two-way merge gave me a little pause. I am assuming that the ONLY reason the testsuite did that was just because it made it really easy to write the tests (you wouldn't need to make the two files similar at all), not because it was actually the behavior that was really wanted for truly dissimilar files. Based on that assumption, I just modified the tests for the "new world". But I'd like to hear others' comments on that assumption of mine. merge-recursive.c| 24 +++- t/t2023-checkout-m.sh| 2 +- t/t3418-rebase-continue.sh | 27 +++ t/t3504-cherry-pick-rerere.sh| 19 ++- t/t4200-rerere.sh| 12 ++-- t/t6020-merge-df.sh | 4 ++-- t/t6024-recursive-merge.sh | 35 +-- t/t6025-merge-symlinks.sh| 9 ++--- t/t6031-merge-filemode.sh| 4 ++-- t/t6042-merge-rename-corner-cases.sh | 2 +- t/t6043-merge-rename-directories.sh | 13 - t/t7060-wtstatus.sh | 1 + t/t7064-wtstatus-pv2.sh | 4 ++-- t/t7506-status-submodule.sh | 11 +++ t/t7610-mergetool.sh | 28 ++-- 15 files changed, 127 insertions(+), 68 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index b8108740c4..7bc9a2ac80 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -3029,11 +3029,25 @@ static int process_entry(struct merge_options *o, clean_merge = -1; } } else if (a_oid && b_oid) { - /* Case C: Added in both (check for same permissions) and */ - /* case D: Modified in both, but differently. */ - clean_merge = merge_content(o, path, 0 /* file_in_way */, - o_oid, o_mode, a_oid, a_mode, b_oid, b_mode, - NULL); + if (!o_oid) { + /* Case C: Added in both (check for same permissions) */ + output(o, 1, + _("CONFLICT (add/add): Merge conflict in %s"), + path); + clean_merge = handle_file_collision(o, + path, NULL, NULL, + o->branch1, + o->branch2, + a_oid, a_mode, + b_oid, b_mode, + 0); + } else + /* case D: Modified in both, but differently. */ + clean_merge = merge_content(o, path, 0 /* file_in_way */, + o_oid, o_mode, + a_oid, a_mode, + b_oid, b_mode, + NULL); } else if (!o_oid && !a_oid && !b_oid) { /* * this entry was deleted altogether. a_mode == 0 means diff --git a/t/t2023-checkout-m.sh b/t/t2023-checkout-m.sh index 7e18985134..2f8ea52318 100755 --- a/t/t2023-checkout-m.sh +++ b/t/t2023-checkout-m.sh @@ -27,7 +27,7 @@ clean_branchnames () { } test_expect_success '-m restores 2-way conflicted+resolved file' ' - cp each.txt each.txt.conflicted && + test_must_fail git merge-file -p each.txt~HEAD /dev/null each.txt~master >each.txt.conflicted && echo resolved >each.txt && git add each.txt && git checkout -m -- each.txt && diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh index fcfdd197bd..3523558421 100755 --- a/t/t3418-rebase-continue.sh +++ b/t/t3418-rebase-continue.sh @@ -10,10 +10,17 @@ set_fake_editor test_expect_success 'setup' ' test_commit "commit-new-file-F1" F1 1 && - test_commit "commit-new-file-F2" F2 2 && + printf
[RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible
We have to look at each entry in rename_src a total of rename_dst_nr times. When we're not detecting copies, any exact renames or ignorable rename paths will just be skipped over. While checking that these can be skipped over is a relatively cheap check, it's still a waste of time to do that check more than once, let alone rename_dst_nr times. When rename_src_nr is a few thousand times bigger than the number of relevant sources (such as when cherry-picking a commit that only touched a handful of files, but from a side of history that has different names for some high level directories), this time can add up. First make an initial pass over the rename_src array and move all the relevant entries to the front, so that we can iterate over just those relevant entries. In one particular testcase involving a large repository and some high-level directories having been renamed, this cut the time necessary for a cherry-pick down by a factor of about 2 (from around 34 seconds down to just under 16 seconds) Signed-off-by: Elijah Newren--- diffcore-rename.c | 47 +++ 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index aa8e0e4d4a..f6fc084891 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -437,16 +437,14 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i return count; } -static int handle_rename_ignores(struct diff_options *options) +static void handle_rename_ignores(struct diff_options *options) { - int detect_rename = options->detect_rename; struct string_list *ignores = options->ignore_for_renames; - int ignored = 0; int i, j; /* rename_ignores onlhy relevant when we're not detecting copies */ - if (ignores == NULL || detect_rename == DIFF_DETECT_COPY) - return 0; + if (ignores == NULL) + return; for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) { struct diff_filespec *one = rename_src[j].p->one; @@ -464,11 +462,27 @@ static int handle_rename_ignores(struct diff_options *options) j++; else { one->rename_used++; - ignored++; + i++; + j++; } } +} + +static int remove_renames_from_src() +{ + int j, new_j; + + for (j = 0, new_j = 0; j < rename_src_nr; j++) { + if (rename_src[j].p->one->rename_used) + continue; + + if (new_j < j) + memcpy(_src[new_j], _src[j], + sizeof(struct diff_rename_src)); + new_j++; + } - return ignored; + return new_j; } void diffcore_rename(struct diff_options *options) @@ -479,7 +493,7 @@ void diffcore_rename(struct diff_options *options) struct diff_queue_struct outq; struct diff_score *mx; int i, j, rename_count, skip_unmodified = 0; - int num_create, dst_cnt, num_src, ignore_count; + int num_create, dst_cnt, num_src; struct progress *progress = NULL; if (!minimum_score) @@ -542,18 +556,19 @@ void diffcore_rename(struct diff_options *options) /* * Mark source files as used if they are found in the -* ignore_for_renames list. +* ignore_for_renames list, and clean out files from rename_src +* that we don't need to continue considering. */ - ignore_count = handle_rename_ignores(options); + num_src = rename_src_nr; + if (detect_rename != DIFF_DETECT_COPY) { + handle_rename_ignores(options); + num_src = remove_renames_from_src(); + } /* -* Calculate how many renames are left (but all the source -* files still remain as options for rename/copies!) +* Calculate how many renames are left */ num_create = (rename_dst_nr - rename_count); - num_src = (detect_rename == DIFF_DETECT_COPY ? - rename_src_nr : rename_src_nr - rename_count); - num_src -= ignore_count; /* All done? */ if (!num_create) @@ -588,7 +603,7 @@ void diffcore_rename(struct diff_options *options) for (j = 0; j < NUM_CANDIDATE_PER_DST; j++) m[j].dst = -1; - for (j = 0; j < rename_src_nr; j++) { + for (j = 0; j < num_src; j++) { struct diff_filespec *one = rename_src[j].p->one; struct diff_score this_src; -- 2.15.0.46.g41dca04efb
[RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact
diffcore_rename() had some code to avoid having destination paths that already had an exact rename detected from being re-checked for other renames. Source paths, however, were re-checked because we wanted to allow the possibility of detecting copies. But if copy detection isn't turned on, then this merely amounts to attempting to find a better-than-exact match, which naturally ends up being an expensive no-op. In particular, copy detection is never turned on by the merge recursive machinery. In a large repository (~50k files, about 60% of which was java) that had a number of high level directories involved in renames, this cut the time necessary for a cherry-pick down by about 50% (from around 9 minutes to 4.5 minutes). Signed-off-by: Elijah Newren--- diffcore-rename.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 6ba6157c61..c0517058b0 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -377,11 +377,10 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_create, +static int too_many_rename_candidates(int num_create, int num_src, struct diff_options *options) { int rename_limit = options->rename_limit; - int num_src = rename_src_nr; int i; options->needed_rename_limit = 0; @@ -446,7 +445,7 @@ void diffcore_rename(struct diff_options *options) struct diff_queue_struct outq; struct diff_score *mx; int i, j, rename_count, skip_unmodified = 0; - int num_create, dst_cnt; + int num_create, dst_cnt, num_src; struct progress *progress = NULL; if (!minimum_score) @@ -512,12 +511,14 @@ void diffcore_rename(struct diff_options *options) * files still remain as options for rename/copies!) */ num_create = (rename_dst_nr - rename_count); + num_src = (detect_rename == DIFF_DETECT_COPY ? + rename_src_nr : rename_src_nr - rename_count); /* All done? */ if (!num_create) goto cleanup; - switch (too_many_rename_candidates(num_create, options)) { + switch (too_many_rename_candidates(num_create, num_src, options)) { case 1: goto cleanup; case 2: @@ -531,7 +532,7 @@ void diffcore_rename(struct diff_options *options) if (options->show_rename_progress) { progress = start_delayed_progress( _("Performing inexact rename detection"), - (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); + (uint64_t)num_create * (uint64_t)num_src); } mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx)); @@ -550,6 +551,10 @@ void diffcore_rename(struct diff_options *options) struct diff_filespec *one = rename_src[j].p->one; struct diff_score this_src; + if (one->rename_used && + detect_rename != DIFF_DETECT_COPY) + continue; + if (skip_unmodified && diff_unmodified_pair(rename_src[j].p)) continue; @@ -568,7 +573,7 @@ void diffcore_rename(struct diff_options *options) diff_free_filespec_blob(two); } dst_cnt++; - display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr); + display_progress(progress, (uint64_t)dst_cnt*(uint64_t)num_src); } stop_progress(); -- 2.15.0.46.g41dca04efb
[RFC PATCH 4/9] Add testcases for improved file collision conflict handling
Adds testcases dealing with file collisions for the following types of conflicts: * add/add * rename/add * rename/rename(2to1) These tests include expectations for new, smarter behavior provided by handle_file_collision(). Since that function is not in use yet, the tests are currently expected to fail. Signed-off-by: Elijah Newren--- t/t6036-recursive-corner-cases.sh| 8 +- t/t6042-merge-rename-corner-cases.sh | 210 +++ 2 files changed, 214 insertions(+), 4 deletions(-) diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh index 18aa88b5c0..8485e04b9b 100755 --- a/t/t6036-recursive-corner-cases.sh +++ b/t/t6036-recursive-corner-cases.sh @@ -208,11 +208,11 @@ test_expect_success 'git detects differently handled merges conflict' ' git cat-file -p C:new_a >>merge-me && >empty && test_must_fail git merge-file \ - -L "Temporary merge branch 2" \ - -L "" \ -L "Temporary merge branch 1" \ - merged empty merge-me && - sed -e "s/^\([<=>]\)/\1\1\1/" merged >merged-internal && + -L "" \ + -L "Temporary merge branch 2" \ + merge-me empty merged && + sed -e "s/^\([<=>]\)/\1\1\1/" merge-me >merged-internal && test $(git rev-parse :1:new_a) = $(git hash-object merged-internal) ' diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh index 411550d2b6..d8fe797f0d 100755 --- a/t/t6042-merge-rename-corner-cases.sh +++ b/t/t6042-merge-rename-corner-cases.sh @@ -575,4 +575,214 @@ test_expect_success 'rename/rename/add-dest merge still knows about conflicting test ! -f c ' +test_conflicts_with_adds_and_renames() { + test $1 != 0 && side1=rename || side1=add + test $2 != 0 && side2=rename || side2=add + + # Setup: + # L + # / \ + # master ? + # \ / + # R + # + # Where: + # Both L and R have files named 'three-unrelated' and + # 'three-related' which collide. Each of the colliding files + # could have been involved in a rename, in which case there was a + # file named 'one-[un]related' or 'two-[un]related' that was + # modified on the opposite side of history and renamed into the + # collision on this side of history. + # + # Questions for both three-unrelated and three-related: + # 1) The index should contain both a stage 2 and stage 3 entry + # for the colliding file. Does it? + # 2) When renames are involved, the content merges are clean, so + # the index should reflect the content merges, not merely the + # version of the colliding file from the prior commit. Does + # it? + # + # Questions for three-unrelated: + # 3) There should be files in the worktree named + # 'three-unrelated~HEAD' and 'three-unrelated~R^0' with the + # (content-merged) version of 'three-unrelated' from the + # appropriate side of the merge. Are they present? + # 4) There should be no file named 'three-unrelated' in the + # working tree. That'd make it too likely that users would + # use it instead of carefully looking at both + # three-unrelated~HEAD and three-unrelated~R^0. Is it + # correctly missing? + # + # Questions for three-related: + # 3) There should be a file in the worktree named three-related + # containing the two-way merged contents of the content-merged + # versions of three-related from each of the two colliding + # files. Is it present? + # 4) There should not be any three-related~* files in the working + # tree. + test_expect_success "setup simple $side1/$side2 conflict" ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + # Create a simple file with 10 lines + ten="0 1 2 3 4 5 6 7 8 9" && + for i in $ten + do + echo line $i in a sample file + done >unrelated1_v1 && + # Create a second version of same file with one more line + cat unrelated1_v1 >unrelated1_v2 && + echo another line >>unrelated1_v2 && + + # Create an unrelated simple file with 10 lines + for i in $ten + do + echo line $i in another sample file + done >unrelated2_v1 && + # Create a second version of same file with one more line + cat unrelated2_v1 >unrelated2_v2 && + echo another line >>unrelated2_v2 && + + # Create some related files now +
[RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions
There are three conflict types that represent two (possibly entirely unrelated) files colliding at the same location: * add/add * rename/add * rename/rename(2to1) These three conflict types already share more similarity than might be immediately apparent from their description: (1) the handling of the rename variants already involves removing any entries from the index corresponding to the original file names[*], thus only leaving entries in the index for the colliding path; (2) likewise, any trace of the original file name in the working tree is also removed. So, in all three cases we're left with how to represent two colliding files in both the index and the working copy. [*] Technically, this isn't quite true because rename/rename(2to1) conflicts in the recursive (o->call_depth > 0) case do an "unrename" since about seven years ago. But even in that case, Junio felt compelled to explain that my decision to "unrename" wasn't necessarily the only or right answer -- search for "Comment from Junio" in t6036 for details. My initial motivation for looking at these three conflict types was that if the handling of these three conflict types is the "same" (defined more precisely in a later commit), or is at least the "same" in the limited set of cases where a renamed file is unmodified on the side of history where the file is not renamed, then a significant performance improvement for rename detection during merges is possible. However, while that served as motivation to look at these three types of conflicts, the actual goal of this new function is to try to improve the handling for all three cases, not to merely make them the same. The previous behavior for these conflict types in regards to the working tree (assuming the file collision occurs at 'foo') was: * add/add does a two-way merge of the two files and records it as 'foo'. * rename/rename(2to1) records the two different files into two new uniquely named files (foo~HEAD and foo~$MERGE), while removing 'foo' from the working tree. * rename/add records the two different files into two different locations, recording the add at foo~$SIDE and, oddly, recording the rename at foo (why is the rename more important than the add?) So, the biggest question is whether the two colliding files should be two-way merged and recorded in place, or recorded into separate files. If the files are similar enough, the two-way merge is probably preferable, but if they're not similar, recording as separate files is probably similar. (The same logic that applies for the working directory here would also apply to the recursive (i.e. o->call_depth > 0) case as well.) The code handling the different types of conflicts appear to have been written with different assumptions about whether the colliding files would be similar. But, rather than make an assumption about whether the two files will be similar, why not just check? If we simply call estimate_similarity(), we can two-way merge the files if they are similar, and otherwise record the two files at different locations. Checking similarity in order to optimize worktree handling is the primary improvement for these three conflict types. Further improvements will be discussed in subsequent commits that modify the code to take advantage of this new shared function. Signed-off-by: Elijah Newren--- Besides the use-similarity-to-determine-two-way-vs-record-two-separate choice, I am curious if anyone else has problems with the "unrename" behavior for the recursive case (which I'd guess never actually comes up in practice and only cursorily appears in the testsuite because I put it there years ago). If anyone does, I may have a useful testcase I could bring up for discussion on the list. Given that Junio disagreed with some of my reasoning on the one testcase in the testsuite that cursorily touches this, I'm not sure it's even clear what "correct"/"optimal" behavior for those cases is. diff.h| 4 ++ diffcore-rename.c | 6 +-- merge-recursive.c | 118 ++ 3 files changed, 125 insertions(+), 3 deletions(-) diff --git a/diff.h b/diff.h index aca150ba2e..adf7e92eb5 100644 --- a/diff.h +++ b/diff.h @@ -427,4 +427,8 @@ extern void print_stat_summary(FILE *fp, int files, int insertions, int deletions); extern void setup_diff_pager(struct diff_options *); +extern int estimate_similarity(struct diff_filespec *src, + struct diff_filespec *dst, + int minimum_score); + #endif /* DIFF_H */ diff --git a/diffcore-rename.c b/diffcore-rename.c index c0517058b0..b15d9d74ef 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -127,9 +127,9 @@ struct diff_score { short name_score; }; -static int estimate_similarity(struct diff_filespec *src, - struct diff_filespec *dst, -
[RFC PATCH 0/9] Improve rename detection performance in merge recursive
This series depends on BOTH my rename-limits and directory-detection series (https://public-inbox.org/git/20171110173956.25105-1-new...@gmail.com/ and https://public-inbox.org/git/20171110190550.27059-1-new...@gmail.com/). This series could be modified to be submitted with no dependencies on those series, but it'd end up causing lots of merging conflicts, and having this series depend on the other two seemed most logical to me. This patch series tries to improve performance rename detection performance in merge recursive where possible. In particular, I was guided by a specific usecase of cherry-picking a small change (by which I mean 7 files with small modifications and one new file) in a repo that has thousands of renames, due to some high-level directories having been renamed. Some of the changes should help rename detection performance in general, but the greatest benefit will be found when one side of history only makes a small number of changes. For my usecase, I dropped the time needed for the cherry-pick from over 9 minutes, down to about 16 seconds. RFC because: * I believe the patch with the biggest performance improvement will break directory rename detection in specific, limited cases (which are not yet represented in the testsuite). Should be fixable; I just need to implement the fix. (The fix may reduce the performance improvement slightly). * This series includes changes to conflict handling for conflict types that involve two colliding files. I think the new behavior is strictly better, but this is the kind of thing I really need to make sure others agree with; comments very welcome. (Patches 2--6) * 16 seconds is still annoyingly slow; we should be able to do better. I have one or two ideas. But since others are asking about the performance of small cherry-picks in large repos with lots of renames, I figure it's worth posting what I have so far. Elijah Newren (9): diffcore-rename: No point trying to find a match better than exact merge-recursive: Avoid unnecessary string list lookups merge-recursive: New function for better colliding conflict resolutions Add testcases for improved file collision conflict handling merge-recursive: Fix rename/add conflict handling merge-recursive: Improve handling for rename/rename(2to1) conflicts merge-recursive: Improve handling for add/add conflicts merge-recursive: Accelerate rename detection diffcore-rename: Filter rename_src list when possible diff.c | 1 + diff.h | 7 + diffcore-rename.c| 85 ++- merge-recursive.c| 452 --- t/t2023-checkout-m.sh| 2 +- t/t3418-rebase-continue.sh | 27 ++- t/t3504-cherry-pick-rerere.sh| 19 +- t/t4200-rerere.sh| 12 +- t/t6020-merge-df.sh | 4 +- t/t6024-recursive-merge.sh | 35 +-- t/t6025-merge-symlinks.sh| 9 +- t/t6031-merge-filemode.sh| 4 +- t/t6036-recursive-corner-cases.sh| 19 +- t/t6042-merge-rename-corner-cases.sh | 212 +++- t/t6043-merge-rename-directories.sh | 13 +- t/t7060-wtstatus.sh | 1 + t/t7064-wtstatus-pv2.sh | 4 +- t/t7506-status-submodule.sh | 11 +- t/t7610-mergetool.sh | 28 +-- 19 files changed, 722 insertions(+), 223 deletions(-) -- 2.15.0.46.g41dca04efb
[RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups
Since we're taking entries from active_cache, which is already in sorted order with same-named entries adjacent, we can skip a lookup. Also, we can just use append instead of insert (avoiding the need to find where to put the new item) and still end up with a sorted list. Signed-off-by: Elijah Newren--- Assumed negligible performance change; I didn't even bother measuring it. But I just happened to be looking around at this code while trying to figure out some of the performance and figured it was at least speeding it up a tiny bit. merge-recursive.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 6ef1d52f0a..d54466649e 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -451,22 +451,28 @@ static struct stage_data *insert_stage_data(const char *path, static struct string_list *get_unmerged(void) { struct string_list *unmerged = xcalloc(1, sizeof(struct string_list)); + struct string_list_item *item; + const char *last = NULL; int i; unmerged->strdup_strings = 1; for (i = 0; i < active_nr; i++) { - struct string_list_item *item; struct stage_data *e; const struct cache_entry *ce = active_cache[i]; if (!ce_stage(ce)) continue; - item = string_list_lookup(unmerged, ce->name); - if (!item) { - item = string_list_insert(unmerged, ce->name); + if (last == NULL || strcmp(last, ce->name)) { + /* +* active_cache is in sorted order, so we can just call +* string_list_append instead of string_list_insert and +* still end up with a sorted list. +*/ + item = string_list_append(unmerged, ce->name); item->util = xcalloc(1, sizeof(struct stage_data)); } + last = ce->name; e = item->util; e->stages[ce_stage(ce)].mode = ce->ce_mode; oidcpy(>stages[ce_stage(ce)].oid, >oid); -- 2.15.0.46.g41dca04efb
Re: [PATCH] bisect: mention "view" as an alternative to "visualize"
On Fri, Nov 10, 2017 at 4:13 PM, Robert P. J. Daywrote: > On Fri, 10 Nov 2017, Eric Sunshine wrote: > >> Thanks for the patch. Some comments below... >> >> On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Day >> wrote: >> > Tweak a number of files to mention "view" as an alternative to >> > "visualize": >> >> You probably want to end this sentence with a period, not a colon. > > not sure about that, what's the standard when you're introducing a > code snippet? It wasn't clear that you were including a snippet since it's not common practice to include the diffstat in the commit message on this project... >> > Documentation/git-bisect.txt | 9 - >> > Documentation/user-manual.txt | 3 ++- >> > builtin/bisect--helper.c | 2 +- >> > contrib/completion/git-completion.bash | 2 +- >> > git-bisect.sh | 4 ++-- >> > 5 files changed, 10 insertions(+), 10 deletions(-) >> >> The diffstat belongs below the "---" separator, otherwise this text >> will (undesirably) become part of the commit message proper. > > no, i actually want that as part of the commit message. my standard > is, if there are 5 or more files that get changed, i like to include a > diff --stat in the commit message so people viewing the log can get a > quick idea of how much has changed. if that's not desired here, i can > remove it. The same information is available to anyone interested in it simply by asking for it: git log --stat ... More generally, commit messages should contain the really important information you want to convey to the reader which _isn't_ available some other way (by, for instance, taking advantage of the tool itself -- such as the above --stat example). On this project, the diffstat is never included as part of the commit message, and it's likely that the patch won't be accepted by Junio like that (or he'll just edit it out if he does accept it). Thanks.
Re: [PATCH] bisect: mention "view" as an alternative to "visualize"
On Fri, 10 Nov 2017, Eric Sunshine wrote: > Thanks for the patch. Some comments below... > > On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Day >wrote: > > Tweak a number of files to mention "view" as an alternative to > > "visualize": > > You probably want to end this sentence with a period, not a colon. not sure about that, what's the standard when you're introducing a code snippet? > > Documentation/git-bisect.txt | 9 - > > Documentation/user-manual.txt | 3 ++- > > builtin/bisect--helper.c | 2 +- > > contrib/completion/git-completion.bash | 2 +- > > git-bisect.sh | 4 ++-- > > 5 files changed, 10 insertions(+), 10 deletions(-) > > The diffstat belongs below the "---" separator, otherwise this text > will (undesirably) become part of the commit message proper. no, i actually want that as part of the commit message. my standard is, if there are 5 or more files that get changed, i like to include a diff --stat in the commit message so people viewing the log can get a quick idea of how much has changed. if that's not desired here, i can remove it. > > + git bisect visualize|view > > I think you need parentheses around these terms (see "git bisect > skip", for example): > > git bisect (visualize | view) ah, quite so. > However, in this case, it might be easier for readers if each is > presented on its own line (and subsequent discussion can make it clear > that they are synonyms). > > git bisect visualize > git bisect view > > But, that's a matter of taste... given the precedent already established: git bisect (bad|new|) [] git bisect (good|old|) [...] i'll go with the parentheses and no intervening spaces. i'll let that posting simmer for a bit longer before posting "v2". rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
Re: [PATCH] bisect: mention "view" as an alternative to "visualize"
Thanks for the patch. Some comments below... On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Daywrote: > Tweak a number of files to mention "view" as an alternative to > "visualize": You probably want to end this sentence with a period, not a colon. > Documentation/git-bisect.txt | 9 - > Documentation/user-manual.txt | 3 ++- > builtin/bisect--helper.c | 2 +- > contrib/completion/git-completion.bash | 2 +- > git-bisect.sh | 4 ++-- > 5 files changed, 10 insertions(+), 10 deletions(-) The diffstat belongs below the "---" separator, otherwise this text will (undesirably) become part of the commit message proper. > Signed-off-by: Robert P. J. Day > > --- > > here's hoping i have the right format for this patch ... are there > any parts of this that are inappropriate, such as extending the bash > completion? This is the correct place for your commentary. The diffstat should appear below your commentary. > diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt > @@ -23,7 +23,7 @@ on the subcommand: > git bisect terms [--term-good | --term-bad] > git bisect skip [(|)...] > git bisect reset [] > - git bisect visualize > + git bisect visualize|view I think you need parentheses around these terms (see "git bisect skip", for example): git bisect (visualize | view) However, in this case, it might be easier for readers if each is presented on its own line (and subsequent discussion can make it clear that they are synonyms). git bisect visualize git bisect view But, that's a matter of taste... > @@ -196,15 +196,14 @@ of `git bisect good` and `git bisect bad` to mark > commits. > Bisect visualize > > > -To see the currently remaining suspects in 'gitk', issue the following > -command during the bisection process: > +To see the currently remaining suspects in 'gitk', issue either of the > +following equivalent commands during the bisection process: > > > $ git bisect visualize > +$ git bisect view > > > -`view` may also be used as a synonym for `visualize`. Honestly, I think the original was clearer and placed a bit less cognitive load on the reader. Moreover, for someone scanning the documentation without reading it too deeply, the revised example makes it seem as if it is necessary to invoke both commands rather than one or the other. > diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt > @@ -538,10 +538,11 @@ Note that the version which `git bisect` checks out for > you at each > point is just a suggestion, and you're free to try a different > version if you think it would be a good idea. For example, > occasionally you may land on a commit that broke something unrelated; > -run > +run either of the equivalent commands > > - > $ git bisect visualize > +$ git bisect view > - Same observation as above. This has the potential to confuse someone quickly scanning the documentation into thinking that both commands must be invoked. Merely stating in prose that one is the alias of the other might be better. > which will run gitk and label the commit it chose with a marker that > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index fdd984d34..52f68c922 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -1162,7 +1162,7 @@ _git_bisect () > { > __git_has_doubledash && return > > - local subcommands="start bad good skip reset visualize replay log run" > + local subcommands="start bad good skip reset visualize view replay > log run" People using muscle memory to type "git bisect v" or "...vi" might find it annoying for this to suddenly become ambiguous. Just an observation; no strong opinion on it... > diff --git a/git-bisect.sh b/git-bisect.sh > @@ -20,7 +20,7 @@ git bisect next > find next bisection to test and check it out. > git bisect reset [] > finish bisection search and go back to commit. > -git bisect visualize > +git bisect visualize|view > show bisect status in gitk. Again, this might be easier to read if split over two lines: git bisect visualize git bisect view show bisect status in gitk. in which case it's plenty clear that the commands are synonyms.
[PATCH v1] fsmonitor: simplify determining the git worktree under Windows
I haven't tested the non Windows paths but the patch looks reasonable. This inspired me to get someone more familiar with perl (thanks Johannes) to revisit this code for the Windows side as well. The logic for determining the git worktree when running on Windows is more complex than necessary. It also spawns multiple processes (uname and cygpath) which slows things down. Simplify and speed up the process of finding the git worktree when running on Windows by keeping it in perl and avoiding spawning helper processes. Signed-off-by: Ben PeartSigned-off-by: Johannes Schindelin --- Notes: Base Ref: Web-Diff: https://github.com/benpeart/git/commit/20affe124b Checkout: git fetch https://github.com/benpeart/git fsmonitor_splitindex-v1 && git checkout 20affe124b t/t7519/fsmonitor-watchman | 13 +++-- templates/hooks--fsmonitor-watchman.sample | 13 +++-- 2 files changed, 6 insertions(+), 20 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index 5fe72cefaf..5514edcf68 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -29,17 +29,10 @@ if ($version == 1) { "Falling back to scanning...\n"; } -# Convert unix style paths to escaped Windows style paths when running -# in Windows command prompt - -my $system = `uname -s`; -$system =~ s/[\r\n]+//g; my $git_work_tree; - -if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { - $git_work_tree = `cygpath -aw "\$PWD"`; - $git_work_tree =~ s/[\r\n]+//g; - $git_work_tree =~ s,\\,/,g; +if ($^O =~ 'msys' || $^O =~ 'cygwin') { + $git_work_tree = Win32::GetCwd(); + $git_work_tree =~ tr/\\/\//; } else { require Cwd; $git_work_tree = Cwd::cwd(); diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index ba6d88c5f8..e673bb3980 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -28,17 +28,10 @@ if ($version == 1) { "Falling back to scanning...\n"; } -# Convert unix style paths to escaped Windows style paths when running -# in Windows command prompt - -my $system = `uname -s`; -$system =~ s/[\r\n]+//g; my $git_work_tree; - -if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { - $git_work_tree = `cygpath -aw "\$PWD"`; - $git_work_tree =~ s/[\r\n]+//g; - $git_work_tree =~ s,\\,/,g; +if ($^O =~ 'msys' || $^O =~ 'cygwin') { + $git_work_tree = Win32::GetCwd(); + $git_work_tree =~ tr/\\/\//; } else { require Cwd; $git_work_tree = Cwd::cwd(); base-commit: f9d9e50b62094689773dccc5f9493fa15e30d592 -- 2.15.0.windows.1
Re: [RFC] protocol version 2
On Fri, 20 Oct 2017 10:18:39 -0700 Brandon Williamswrote: > Some of the pain points with the current protocol spec are: After some in-office discussion, I think that the most important pain point is that we have to implement each protocol twice: once for HTTP(S), and once for SSH (and friends) that support bidirectional byte streams. If it weren't for this, I think that what is discussed in this document (e.g. ls-refs, fetch-object) can be less invasively accomplished with v1, specifying "extra parameters" (explained in this e-mail [1]) to merely tweak the output of upload-pack instead of replacing it nearly completely, thus acting more as optimizations than changing the mode of operation entirely. [1] https://public-inbox.org/git/20171010193956.168385-1-jonathanta...@google.com/ > * The server's initial response is the ref advertisement. This > advertisement cannot be omitted and can become an issue due to the > sheer number of refs that can be sent with large repositories. For > example, when contacting the internal equivalent of > `https://android.googlesource.com/`, the server will send > approximately 1 million refs totaling 71MB. This is data that is > sent during each and every fetch and is not scalable. For me, this is not a compelling one, because we can provide a ref whitelist as an "extra parameter" in v1. > * Capabilities were implemented as a hack and are hidden behind a NUL > byte after the first ref sent from the server during the ref > advertisement: > >\0 > > Since they are sent in the context of a pkt-line they are also subject > to the same length limitations (1k bytes with old clients). While we > may not be close to hitting this limitation with capabilities alone, it > has become a problem when trying to abuse capabilities for other > purposes (e.g. > [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)). > > * Various other technical debt (e.g. abusing capabilities to > communicate agent and symref data, service name set using a query > parameter). I think these 2 are the same - I would emphasize the fact that we cannot add more stuff here, rather than the fact that we're putting this behind NUL. > Special Packets > - > > In protocol v2 these special packets will have the following semantics: > > * '' Flush Packet (flush-pkt) - indicates the end of a message > * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list To address the pain point of HTTP(S) being different from the others (mentioned above), I think the packet semantics should be further qualified: - Communications must be divided up into packets terminated by a flush-pkt. Also, each side must be implemented without knowing whether packets-in-progress can or cannot be seen by the other side. - Each request packet must have a corresponding, possibly empty, response packet. - A request packet may be sent even if a response packet corresponding to a previously sent request packet is awaited. (This allows us to retain the existing optimization in fetch-pack wherein, during negotiation, the "have" request-response packet pairs are interleaved.) This will allow us to more easily share code between HTTP(S) and the others. In summary, I think that we need a big motivation to make the jump from v1 to v2, instead of merely making small changes to v1 (and I do think that the proposed new commands, such as "ls-refs" and "fetch-object", can be implemented merely by small changes). And I think that the ability to better share code between HTTP(S) and others provides that motivation.
Re: [RFD] Long term plan with submodule refs?
> >>> Basically, a workflow where it's easier to have each submodule checked >>> out at master, and we can still keep track of historical relationship >>> of what commit was the submodule at some time ago, but without causing >>> some of these headaches. >> >> So essentially a repo or otherwise parallel workflow just with the versioning >> happening magically behind your back? > > Ideally, my developers would like to just have each submodule checked > out at master. > > Ideally, I'd like to be able to checkout an old version of the parent > project and have it recorded what version of the shared submodule was > at at the time. This sounds as if a "passive superproject" would work best for you, i.e. each commit in a submodule is bubbled up into the superproject, making a commit potentially even behind the scenes, such that the user interaction with the superproject would be none. However this approach also sounds careless, as there is no precondition that e.g. the superproject builds with all the submodules as is; it is a mere tracking of "at this time we have the submodules arranged as such", whereas for the versioning aspect, you would want to have commit messages in the superproject saying *why* you bumped up a specific submodule. The user may not like to give such an explanation as they already wrote a commit message for the individual project. Also this approach sounds like a local approach, as it is not clear to me, why you'd want to share the superproject history. > Ideally, my developers don't want to have to worry about knowing that > they shouldn't "git add -a" or "git commit -a" when they have a > submodule checked out at a different location from the parent projects > gitlink. > > Thanks, > Jake >
Re: [PATCH v2 0/9] sequencer: dont't fork git commit
Phillip Woodwrites: > Here's the summary from the previous version > These patches teach the sequencer to create commits without forking > git commit when the commit message does not need to be edited. This > speeds up cherry picking 10 commits by 26% and picking 10 commits with > rebase --continue by 44%. The first few patches move bits of > builtin/commit.c to sequencer.c. The last two patches actually > implement creating commits in sequencer.c. Thanks. The changes since the initial iteration seems quite small and I didn't find much objectionable. Here are some style fixes I needed to add on top to make the output of "diff master HEAD" checkpatch.pl-clean. I think 3/9 and 9/9 are the culprits. diff --git a/sequencer.c b/sequencer.c index 1f65e82696..a989588ee5 100644 --- a/sequencer.c +++ b/sequencer.c @@ -592,7 +592,7 @@ static int read_env_script(struct argv_array *env) return 0; } -static char *get_author(const char* message) +static char *get_author(const char *message) { size_t len; const char *a; @@ -1104,7 +1104,7 @@ static int try_to_commit(struct strbuf *msg, const char *author, } if (update_head_with_reflog(current_head, oid, - getenv("GIT_REFLOG_ACTION"), msg, )){ + getenv("GIT_REFLOG_ACTION"), msg, )) { res = error("%s", err.buf); goto out; } @@ -1121,7 +1121,7 @@ static int try_to_commit(struct strbuf *msg, const char *author, return res; } -static int do_commit(const char *msg_file, const char* author, +static int do_commit(const char *msg_file, const char *author, struct replay_opts *opts, unsigned int flags) { int res = 1; @@ -1521,7 +1521,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit, strbuf_addstr(, oid_to_hex(>object.oid)); strbuf_addstr(, ")\n"); } - if (!is_fixup (command)) + if (!is_fixup(command)) author = get_author(msg.message); } diff --git a/sequencer.h b/sequencer.h index 27f34be400..e0be354301 100644 --- a/sequencer.h +++ b/sequencer.h @@ -72,7 +72,7 @@ int template_untouched(const struct strbuf *sb, const char *template_file, enum commit_msg_cleanup_mode cleanup_mode); int update_head_with_reflog(const struct commit *old_head, const struct object_id *new_head, - const char* action, const struct strbuf *msg, + const char *action, const struct strbuf *msg, struct strbuf *err); void commit_post_rewrite(const struct commit *current_head, const struct object_id *new_head);
[PATCH 21/30] merge-recursive: Add get_directory_renames()
This populates a list of directory renames for us. The list of directory renames is not yet used, but will be in subsequent commits. Signed-off-by: Elijah Newren--- merge-recursive.c | 146 ++ 1 file changed, 146 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index 89a9b32635..b5770d3d7f 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1376,6 +1376,124 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o, return ret; } +static void get_renamed_dir_portion(const char *old_path, const char *new_path, + char **old_dir, char **new_dir) { + *old_dir = NULL; + *new_dir = NULL; + + /* For +*"a/b/c/d/foo.c" -> "a/b/something-else/d/foo.c" +* the "d/foo.c" part is the same, we just want to know that +*"a/b/c" was renamed to "a/b/something-else" +* so, for this example, this function returns "a/b/c" in +* *old_dir and "a/b/something-else" in *new_dir. +* +* Also, if the basename of the file changed, we don't care. We +* want to know which portion of the directory, if any, changed. +*/ + char *end_of_old = strrchr(old_path, '/'); + char *end_of_new = strrchr(new_path, '/'); + if (end_of_old == NULL || end_of_new == NULL) + return; + while (*--end_of_new == *--end_of_old && + end_of_old != old_path && + end_of_new != new_path) + ; // Do nothing; all in the while loop + /* +* We've found the first non-matching character in the directory +* paths. That means the current directory we were comparing +* represents the rename. Move end_of_old and end_of_new back +* to the full directory name. +*/ + if (*end_of_old == '/') + end_of_old++; + if (*end_of_old != '/') + end_of_new++; + end_of_old = strchr(end_of_old, '/'); + end_of_new = strchr(end_of_new, '/'); + + /* +* It may have been the case that old_path and new_path were the same +* directory all along. Don't claim a rename if they're the same. +*/ + int old_len = end_of_old - old_path; + int new_len = end_of_new - new_path; + + if (old_len != new_len || strncmp(old_path, new_path, old_len)) { + *old_dir = strndup(old_path, old_len); + *new_dir = strndup(new_path, new_len); + } +} + +static struct hashmap *get_directory_renames(struct diff_queue_struct *pairs, +struct tree *tree) { + struct hashmap *dir_renames; + struct hashmap_iter iter; + struct dir_rename_entry *entry; + int i; + + dir_renames = malloc(sizeof(struct hashmap)); + dir_rename_init(dir_renames); + for (i = 0; i < pairs->nr; ++i) { + struct string_list_item *item; + int *count; + struct diff_filepair *pair = pairs->queue[i]; + + char *old_dir, *new_dir; + get_renamed_dir_portion(pair->one->path, pair->two->path, + _dir,_dir); + if (!old_dir) + // Directory didn't change at all; ignore this one. + continue; + + entry = dir_rename_find_entry(dir_renames, old_dir); + if (!entry) { + entry = xcalloc(1, sizeof(struct dir_rename_entry)); + hashmap_entry_init(entry, strhash(old_dir)); + hashmap_put(dir_renames, entry); + entry->dir = old_dir; + } else { + free(old_dir); + } + item = string_list_lookup(>possible_new_dirs, new_dir); + if (!item) { + item = string_list_insert(>possible_new_dirs, new_dir); + item->util = xcalloc(1, sizeof(int)); + } else { + free(new_dir); + } + count = item->util; + *count += 1; + } + + hashmap_iter_init(dir_renames, ); + while ((entry = hashmap_iter_next())) { + int max = 0; + int bad_max = 0; + char *best = NULL; + for (i = 0; i < entry->possible_new_dirs.nr; i++) { + int *count = entry->possible_new_dirs.items[i].util; + if (*count == max) + bad_max = max; + else if (*count > max) { + max = *count; + best = entry->possible_new_dirs.items[i].string; + } + } + if (bad_max == max) +
[PATCH 13/30] directory rename detection: tests for handling overwriting untracked files
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 314 1 file changed, 314 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index bb179b16c8..7af8962512 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -2559,4 +2559,318 @@ test_expect_failure '9g-check: Renamed directory that only contained immediate s # side of history for any implicit directory renames. ### +### +# SECTION 10: Handling untracked files +# +# unpack_trees(), upon which the recursive merge algorithm is based, aborts +# the operation if untracked or dirty files would be deleted or overwritten +# by the merge. Unfortunately, unpack_trees() does not understand renames, +# and if it doesn't abort, then it muddies up the working directory before +# we even get to the point of detecting renames, so we need some special +# handling, at least in the case of directory renames. +### + +# Testcase 10a, Overwrite untracked: normal rename/delete +# Commit A: z/{b,c_1} +# Commit B: z/b + untracked z/c + untracked z/d +# Commit C: z/{b,d_1} +# Expected: Aborted Merge + +# ERROR_MSG(untracked working tree files would be overwritten by merge) + +test_expect_success '10a-setup: Overwrite untracked with normal rename/delete' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git rm z/c && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z/c z/d && + test_tick && + git commit -m "C" +' + +test_expect_success '10a-check: Overwrite untracked with normal rename/delete' ' + git checkout B^0 && + echo very >z/c && + echo important >z/d && + + test_must_fail git merge -s recursive C^0 >out 2>err && + test_i18ngrep "The following untracked working tree files would be overwritten by merge" err && + + test 1 -eq $(git ls-files -s | wc -l) && + test 4 -eq $(git ls-files -o | wc -l) && + + test "very" = "$(cat z/c)" && + test "important" = "$(cat z/d)" && + test $(git rev-parse HEAD:z/b) = $(git rev-parse A:z/b) +' + +# Testcase 10b, Overwrite untracked: dir rename + delete +# Commit A: z/{b,c_1} +# Commit B: y/b + untracked y/{c,d,e} +# Commit C: z/{b,d_1,e} +# Expected: Failed Merge; y/b + untracked y/c + untracked y/d on disk + +# z/c_1 -> z/d_1 rename recorded at stage 3 for y/d + +# ERROR_MSG(refusing to lose untracked file at 'y/d') + +test_expect_success '10b-setup: Overwrite untracked with dir rename + delete' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git rm z/c && + git mv z/ y/ && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z/c z/d && + echo e >z/e && + git add z/e && + test_tick && + git commit -m "C" +' + +test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' ' + git checkout B^0 && + echo very >y/c && + echo important >y/d && + echo contents >y/e && + + test_must_fail git merge -s recursive C^0 >out 2>err && + test_i18ngrep "CONFLICT (rename/delete).*Version C^0 of y/d left in tree at y/d~C^0" out && + test_i18ngrep "Error: Refusing to lose untracked file at y/e; writing to y/e~C^0 instead" out && + + test 3 -eq $(git ls-files -s | wc -l) && + test 2 -eq $(git ls-files -u | wc -l) && + test 5 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test "very" = "$(cat y/c)" && + + test "important" = "$(cat y/d)" && + test "important" != "$(git rev-parse :3:y/d)" && + test $(git rev-parse :3:y/d) = $(git rev-parse A:z/c) && + + test "contents" = "$(cat y/e)" && + test "contents" != "$(git rev-parse :3:y/e)" && + test $(git rev-parse :3:y/e) = $(git rev-parse C:z/e) +' + +# Testcase 10c, Overwrite untracked: dir rename/rename(1to2) +# Commit A: z/{a,b}, x/{c,d} +# Commit B: y/{a,b}, w/c, x/d + different untracked y/c +# Commit C: z/{a,b,c}, x/d +# Expected: Failed
[PATCH 19/30] merge-recursive: Split out code for determining diff_filepairs
Create a new function, get_diffpairs() to compute the diff_filepairs between two trees. While these are currently only used in get_renames(), I want them to be available to some new functions. No actual logic changes yet. Signed-off-by: Elijah Newren--- merge-recursive.c | 81 --- 1 file changed, 60 insertions(+), 21 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index f40c70990c..8c9543d85c 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1315,24 +1315,15 @@ static int conflict_rename_rename_2to1(struct merge_options *o, } /* - * Get information of all renames which occurred between 'o_tree' and - * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and - * 'b_tree') to be able to associate the correct cache entries with - * the rename information. 'tree' is always equal to either a_tree or b_tree. + * Get the diff_filepairs changed between o_tree and tree. */ -static struct string_list *get_renames(struct merge_options *o, - struct tree *tree, - struct tree *o_tree, - struct tree *a_tree, - struct tree *b_tree, - struct string_list *entries) +static struct diff_queue_struct *get_diffpairs(struct merge_options *o, + struct tree *o_tree, + struct tree *tree) { - int i; - struct string_list *renames; + struct diff_queue_struct *ret; struct diff_options opts; - renames = xcalloc(1, sizeof(struct string_list)); - diff_setup(); DIFF_OPT_SET(, RECURSIVE); DIFF_OPT_CLR(, RENAME_EMPTY); @@ -1348,10 +1339,43 @@ static struct string_list *get_renames(struct merge_options *o, diffcore_std(); if (opts.needed_rename_limit > o->needed_rename_limit) o->needed_rename_limit = opts.needed_rename_limit; - for (i = 0; i < diff_queued_diff.nr; ++i) { + + ret = malloc(sizeof(struct diff_queue_struct)); + ret->queue = diff_queued_diff.queue; + ret->nr = diff_queued_diff.nr; + // Ignore diff_queued_diff.alloc; we won't be changing the size at all + + opts.output_format = DIFF_FORMAT_NO_OUTPUT; + diff_queued_diff.nr = 0; + diff_queued_diff.queue = NULL; + diff_flush(); + return ret; +} + +/* + * Get information of all renames which occurred in 'pairs', making use of + * any implicit directory renames inferred from the other side of history. + * We need the three trees in the merge ('o_tree', 'a_tree' and 'b_tree') + * to be able to associate the correct cache entries with the rename + * information; tree is always equal to either a_tree or b_tree. + */ +static struct string_list *get_renames(struct merge_options *o, + struct diff_queue_struct *pairs, + struct tree *tree, + struct tree *o_tree, + struct tree *a_tree, + struct tree *b_tree, + struct string_list *entries) +{ + int i; + struct string_list *renames; + + renames = xcalloc(1, sizeof(struct string_list)); + + for (i = 0; i < pairs->nr; ++i) { struct string_list_item *item; struct rename *re; - struct diff_filepair *pair = diff_queued_diff.queue[i]; + struct diff_filepair *pair = pairs->queue[i]; if (pair->status != 'R') { diff_free_filepair(pair); continue; @@ -1375,9 +1399,6 @@ static struct string_list *get_renames(struct merge_options *o, item = string_list_insert(renames, pair->one->path); item->util = re; } - opts.output_format = DIFF_FORMAT_NO_OUTPUT; - diff_queued_diff.nr = 0; - diff_flush(); return renames; } @@ -1649,15 +1670,33 @@ static struct rename_info *handle_renames(struct merge_options *o, int *clean) { struct rename_info *rei = xcalloc(1, sizeof(struct rename_info)); + struct diff_queue_struct *head_pairs, *merge_pairs; *clean = 1; if (!o->detect_rename) return NULL; - rei->head_renames = get_renames(o, head, common, head, merge, entries); - rei->merge_renames = get_renames(o, merge, common, head, merge, entries); + head_pairs = get_diffpairs(o, common, head); + merge_pairs = get_diffpairs(o, common, merge); + + rei->head_renames = get_renames(o, head_pairs, head, +common, head, merge, entries); +
[PATCH 27/30] merge-recursive: Apply necessary modifications for directory renames
This commit hooks together all the directory rename logic by making the necessary changes to the rename struct, it's dst_entry, and the diff_filepair under consideration. Signed-off-by: Elijah Newren--- merge-recursive.c | 195 +++- t/t6043-merge-rename-directories.sh | 50 - 2 files changed, 219 insertions(+), 26 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 2a7258f6bb..838bfd32ec 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -166,6 +166,7 @@ static int oid_eq(const struct object_id *a, const struct object_id *b) enum rename_type { RENAME_NORMAL = 0, + RENAME_DIR, RENAME_DELETE, RENAME_ONE_FILE_TO_ONE, RENAME_ONE_FILE_TO_TWO, @@ -599,6 +600,7 @@ struct rename { */ struct stage_data *src_entry; struct stage_data *dst_entry; + unsigned add_turned_into_rename:1; unsigned processed:1; }; @@ -633,6 +635,26 @@ static int update_stages(struct merge_options *opt, const char *path, return 0; } +static int update_stages_for_stage_data(struct merge_options *opt, + const char *path, + const struct stage_data *stage_data) +{ + struct diff_filespec o, a, b; + o.mode = stage_data->stages[1].mode; + oidcpy(, _data->stages[1].oid); + + a.mode = stage_data->stages[2].mode; + oidcpy(, _data->stages[2].oid); + + b.mode = stage_data->stages[3].mode; + oidcpy(, _data->stages[3].oid); + + return update_stages(opt, path, +is_null_sha1(o.oid.hash) ? NULL : , +is_null_sha1(a.oid.hash) ? NULL : , +is_null_sha1(b.oid.hash) ? NULL : ); +} + static void update_entry(struct stage_data *entry, struct diff_filespec *o, struct diff_filespec *a, @@ -1100,6 +1122,18 @@ static int merge_file_one(struct merge_options *o, return merge_file_1(o, , , , branch1, branch2, mfi); } +static int conflict_rename_dir(struct merge_options *o, + struct diff_filepair *pair, + const char *rename_branch, + const char *other_branch) +{ + const struct diff_filespec *dest = pair->two; + + if (update_file(o, 1, >oid, dest->mode, dest->path)) + return -1; + return 0; +} + static int handle_change_delete(struct merge_options *o, const char *path, const char *old_path, const struct object_id *o_oid, int o_mode, @@ -1369,6 +1403,24 @@ static int conflict_rename_rename_2to1(struct merge_options *o, if (!ret) ret = update_file(o, 0, _c2.oid, mfi_c2.mode, new_path2); + /* +* unpack_trees() actually populates the index for us for +* "normal" rename/rename(2to1) situtations so that the +* correct entries are at the higher stages, which would +* make the call below to update_stages_for_stage_data +* unnecessary. However, if either of the renames came +* from a directory rename, then unpack_trees() will not +* have gotten the right data loaded into the index, so we +* need to do so now. (While it'd be tempting to move this +* call to update_stages_for_stage_data() to +* apply_directory_rename_modifications(), that would break +* our intermediate calls to would_lose_untracked() since +* those rely on the current in-memory index. See also the +* big "NOTE" in update_stages()). +*/ + if (update_stages_for_stage_data(o, path, ci->dst_entry1)) + ret = -1; + free(new_path2); free(new_path1); } @@ -1888,6 +1940,120 @@ static char *check_for_directory_rename(struct merge_options *o, return new_path; } +static void apply_directory_rename_modifications(struct merge_options *o, +struct diff_filepair *pair, +char *new_path, +struct rename *re, +struct tree *tree, +struct tree *o_tree, +struct tree *a_tree, +struct tree *b_tree, +struct string_list *entries, +int *clean) +{ + struct
[PATCH 02/30] merge-recursive: Fix logic ordering issue
merge_trees() did a variety of work, including: * Calling get_unmerged() to get unmerged entries * Calling record_df_conflict_files() with all unmerged entries to do some work to ensure we could handle D/F conflicts correctly * Calling get_renames() to check for renames. An easily overlooked issue is that get_renames() can create more unmerged entries and add them to the list, which have the possibility of being involved in D/F conflicts. So the call to record_df_conflict_files() should really be moved after all the rename detection. I didn't come up with any testcases demonstrating any bugs with the old ordering, but I suspect there were some for both normal renames and for directory renames. Fix the ordering. Signed-off-by: Elijah Newren--- merge-recursive.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1d3f8f0d22..52521faf09 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1981,10 +1981,10 @@ int merge_trees(struct merge_options *o, get_files_dirs(o, merge); entries = get_unmerged(); - record_df_conflict_files(o, entries); re_head = get_renames(o, head, common, head, merge, entries); re_merge = get_renames(o, merge, common, head, merge, entries); clean = process_renames(o, re_head, re_merge); + record_df_conflict_files(o, entries); if (clean < 0) goto cleanup; for (i = entries->nr-1; 0 <= i; i--) { -- 2.15.0.5.g9567be9905
[PATCH 03/30] merge-recursive: Add explanation for src_entry and dst_entry
If I have to walk through the debugger and inspect the values found in here in order to figure out their meaning, despite having known these things inside and out some years back, then they probably need a comment for the casual reader to explain their purpose. Signed-off-by: Elijah Newren--- merge-recursive.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index 52521faf09..3526c8d0b8 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -513,6 +513,28 @@ static void record_df_conflict_files(struct merge_options *o, struct rename { struct diff_filepair *pair; + /* +* Because I keep forgetting every few years what src_entry and +* dst_entry are and have to walk through a debugger and puzzle +* through it to remind myself... +* +* If 'before' is renamed to 'after' then src_entry will contain +* the versions of 'before' from the merge_base, HEAD, and MERGE in +* stages 1, 2, and 3; dst_entry will contain the versions of +* 'after' from the merge_base, HEAD, and MERGE in stages 1, 2, and +* 3. Thus, we have a total of six modes and oids, though some +* will be null. (Stage 0 is ignored; we're interested in handling +* conflicts.) +* +* Since we don't turn on break-rewrites by default, neither +* src_entry nor dst_entry can have all three of their stages have +* non-null oids, meaning at most four of the six will be non-null. +* Also, since this is a rename, both src_entry and dst_entry will +* have at least one non-null oid, meaning at least two will be +* non-null. Of the six oids, a typical rename will have three be +* non-null. Only two implies a rename/delete, and four implies a +* rename/add. +*/ struct stage_data *src_entry; struct stage_data *dst_entry; unsigned processed:1; -- 2.15.0.5.g9567be9905
[PATCH 04/30] directory rename detection: basic testcases
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 391 1 file changed, 391 insertions(+) create mode 100755 t/t6043-merge-rename-directories.sh diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh new file mode 100755 index 00..b737b0a105 --- /dev/null +++ b/t/t6043-merge-rename-directories.sh @@ -0,0 +1,391 @@ +#!/bin/sh + +test_description="recursive merge with directory renames" +# includes checking of many corner cases, with a similar methodology to: +# t6042: corner cases with renames but not criss-cross merges +# t6036: corner cases with both renames and criss-cross merges +# +# The setup for all of them, pictorially, is: +# +# B +# o +# / \ +# A o ? +# \ / +# o +# C +# +# To help make it easier to follow the flow of tests, they have been +# divided into sections and each test will start with a quick explanation +# of what commits A, B, and C contain. +# +# Notation: +#z/{b,c} means files z/b and z/c both exist +#x/d_1 means file x/d exists with content d1. (Purpose of the +# underscore notation is to differentiate different +# files that might be renamed into each other's paths.) + +. ./test-lib.sh + + +### +# SECTION 1: Basic cases we should be able to handle +### + +# Testcase 1a, Basic directory rename. +# Commit A: z/{b,c} +# Commit B: y/{b,c} +# Commit C: z/{b,c,d,e/f} +# Expected: y/{b,c,d,e/f} + +test_expect_success '1a-setup: Simple directory rename detection' ' + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z y && + test_tick && + git commit -m "B" && + + git checkout C && + echo d >z/d && + mkdir z/e && + echo f >z/e/f && + git add z/d z/e/f && + test_tick && + git commit -m "C" +' + +test_expect_failure '1a-check: Simple directory rename detection' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 4 -eq $(git ls-files -s | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:y/d) = $(git rev-parse C:z/d) && + test "$(git hash-object y/d)" = $(git rev-parse C:z/d) && + test $(git rev-parse HEAD:y/e/f) = $(git rev-parse C:z/e/f) && + test_must_fail git rev-parse HEAD:z/d && + test_must_fail git rev-parse HEAD:z/e/f && + test ! -d z/d && + test ! -d z/e/f +' + +# Testcase 1b, Merge a directory with another +# Commit A: z/{b,c}, y/d +# Commit B: z/{b,c,e}, y/d +# Commit C: y/{b,c,d} +# Expected: y/{b,c,d,e} + +test_expect_success '1b-setup: Merge a directory with another' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + mkdir y && + echo d >y/d && + git add z y && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + echo e >z/e && + git add z/e && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z/b y && + git mv z/c y && + rmdir z && + test_tick && + git commit -m "C" +' + +test_expect_failure '1b-check: Merge a directory with another' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 4 -eq $(git ls-files -s | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:y/d) = $(git rev-parse A:y/d) && + test $(git rev-parse HEAD:y/e) = $(git rev-parse B:z/e) && + test_must_fail git rev-parse HEAD:z/e +' + +# Testcase 1c, Transitive renaming +# (Related to testcases 3a and 6d -- when should a transitive rename apply?) +# (Related to testcases 9c and 9d -- can transitivity repeat?) +# Commit A: z/{b,c}, x/d +# Commit B: y/{b,c}, x/d +# Commit C: z/{b,c,d} +# Expected: y/{b,c,d} (because x/d -> z/d -> y/d) + +test_expect_success '1c-setup: Transitive renaming' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + mkdir x && + echo d >x/d && + git add z x && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && +
[PATCH 28/30] merge-recursive: Avoid clobbering untracked files with directory renames
Signed-off-by: Elijah Newren--- merge-recursive.c | 39 +++-- t/t6043-merge-rename-directories.sh | 6 +++--- 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 838bfd32ec..1b3ee5b9fb 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1129,6 +1129,25 @@ static int conflict_rename_dir(struct merge_options *o, { const struct diff_filespec *dest = pair->two; + if (!o->call_depth && would_lose_untracked(dest->path)) { + char *alt_path = unique_path(o, dest->path, rename_branch); + output(o, 1, _("Error: Refusing to lose untracked file at %s; " + "writing to %s instead."), + dest->path, alt_path); + /* +* Write the file in worktree at alt_path, but not in the +* index. Instead, write to dest->path for the index but +* only at the higher appropriate stage. +*/ + if (update_file(o, 0, >oid, dest->mode, alt_path)) + return -1; + free(alt_path); + return update_stages(o, dest->path, NULL, +rename_branch == o->branch1 ? dest : NULL, +rename_branch == o->branch1 ? NULL : dest); + } + + /* Update dest->path both in index and in worktree */ if (update_file(o, 1, >oid, dest->mode, dest->path)) return -1; return 0; @@ -1147,7 +1166,8 @@ static int handle_change_delete(struct merge_options *o, const char *update_path = path; int ret = 0; - if (dir_in_way(path, !o->call_depth, 0)) { + if (dir_in_way(path, !o->call_depth, 0) || + (!o->call_depth && would_lose_untracked(path))) { update_path = alt_path = unique_path(o, path, change_branch); } @@ -1273,6 +1293,10 @@ static int handle_file(struct merge_options *o, dst_name = unique_path(o, rename->path, cur_branch); output(o, 1, _("%s is a directory in %s adding as %s instead"), rename->path, other_branch, dst_name); + } else if (!o->call_depth && would_lose_untracked(rename->path)) { + dst_name = unique_path(o, rename->path, cur_branch); + output(o, 1, _("Refusing to lose untracked file at %s; adding as %s instead"), + rename->path, dst_name); } } if ((ret = update_file(o, 0, >oid, rename->mode, dst_name))) @@ -1398,7 +1422,18 @@ static int conflict_rename_rename_2to1(struct merge_options *o, char *new_path2 = unique_path(o, path, ci->branch2); output(o, 1, _("Renaming %s to %s and %s to %s instead"), a->path, new_path1, b->path, new_path2); - remove_file(o, 0, path, 0); + if (would_lose_untracked(path)) + /* +* Only way we get here is if both renames were from +* a directory rename AND user had an untracked file +* at the location where both files end up after the +* two directory renames. See testcase 10d of t6043. +*/ + output(o, 1, _("Refusing to lose untracked file at " + "%s, even though it's in the way."), + path); + else + remove_file(o, 0, path, 0); ret = update_file(o, 0, _c1.oid, mfi_c1.mode, new_path1); if (!ret) ret = update_file(o, 0, _c2.oid, mfi_c2.mode, diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index e737bad2c5..6db764a1b6 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -2660,7 +2660,7 @@ test_expect_success '10b-setup: Overwrite untracked with dir rename + delete' ' git commit -m "C" ' -test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' ' +test_expect_success '10b-check: Overwrite untracked with dir rename + delete' ' git checkout B^0 && echo very >y/c && echo important >y/d && @@ -2727,7 +2727,7 @@ test_expect_success '10c-setup: Overwrite untracked with dir rename/rename(1to2) git commit -m "C" ' -test_expect_failure '10c-check: Overwrite untracked with dir rename/rename(1to2)' ' +test_expect_success '10c-check: Overwrite untracked with dir rename/rename(1to2)' ' git checkout B^0 && echo important >y/c && @@ -2793,7 +2793,7 @@ test_expect_success '10d-setup: Delete untracked with dir rename/rename(2to1)' ' git commit -m
[PATCH 20/30] merge-recursive: Add a new hashmap for storing directory renames
This just adds dir_rename_entry and the associated functions; code using these will be added in subsequent commits. Signed-off-by: Elijah Newren--- merge-recursive.c | 24 merge-recursive.h | 8 2 files changed, 32 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index 8c9543d85c..89a9b32635 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -49,6 +49,30 @@ static unsigned int path_hash(const char *path) return ignore_case ? strihash(path) : strhash(path); } +static struct dir_rename_entry *dir_rename_find_entry(struct hashmap *hashmap, char *dir) +{ + struct dir_rename_entry key; + + if (dir == NULL) + return NULL; + hashmap_entry_init(, strhash(dir)); + key.dir = dir; + return hashmap_get(hashmap, , NULL); +} + +static int dir_rename_cmp(void *unused_cmp_data, + const struct dir_rename_entry *e1, + const struct dir_rename_entry *e2, + const void *unused_keydata) +{ + return strcmp(e1->dir, e2->dir); +} + +static void dir_rename_init(struct hashmap *map) +{ + hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0); +} + static void flush_output(struct merge_options *o) { if (o->buffer_output < 2 && o->obuf.len) { diff --git a/merge-recursive.h b/merge-recursive.h index 80d69d1401..a024949739 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -29,6 +29,14 @@ struct merge_options { struct string_list df_conflict_file_set; }; +struct dir_rename_entry { + struct hashmap_entry ent; /* must be the first member! */ + char *dir; + unsigned non_unique_new_dir:1; + char *new_dir; + struct string_list possible_new_dirs; +}; + /* merge_trees() but with recursive ancestor consolidation */ int merge_recursive(struct merge_options *o, struct commit *h1, -- 2.15.0.5.g9567be9905
[PATCH 30/30] merge-recursive: Fix remaining directory rename + dirty overwrite cases
Signed-off-by: Elijah Newren--- merge-recursive.c | 26 +++--- t/t6043-merge-rename-directories.sh | 8 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 86ddb89727..6ef1d52f0a 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1296,11 +1296,23 @@ static int handle_file(struct merge_options *o, add = filespec_from_entry(, dst_entry, stage ^ 1); if (add) { + int ren_src_was_dirty = was_dirty(o, rename->path); char *add_name = unique_path(o, rename->path, other_branch); if (update_file(o, 0, >oid, add->mode, add_name)) return -1; - remove_file(o, 0, rename->path, 0); + if (ren_src_was_dirty) { + output(o, 1, _("Refusing to lose dirty file at %s"), + rename->path); + } + /* +* Stupid double negatives in remove_file; it somehow manages +* to repeatedly mess me up. So, just for myself: +*1) update_wd iff !ren_src_was_dirty. +*2) no_wd iff !update_wd +*3) so, no_wd == !!ren_src_was_dirty == ren_src_was_dirty +*/ + remove_file(o, 0, rename->path, ren_src_was_dirty); dst_name = unique_path(o, rename->path, cur_branch); } else { if (dir_in_way(rename->path, !o->call_depth, 0)) { @@ -1436,7 +1448,10 @@ static int conflict_rename_rename_2to1(struct merge_options *o, char *new_path2 = unique_path(o, path, ci->branch2); output(o, 1, _("Renaming %s to %s and %s to %s instead"), a->path, new_path1, b->path, new_path2); - if (would_lose_untracked(path)) + if (was_dirty(o, path)) + output(o, 1, _("Refusing to lose dirty file at %s"), + path); + else if (would_lose_untracked(path)) /* * Only way we get here is if both renames were from * a directory rename AND user had an untracked file @@ -2002,6 +2017,7 @@ static void apply_directory_rename_modifications(struct merge_options *o, { struct string_list_item *item; int stage = (tree == a_tree ? 2 : 3); + int update_wd; /* * In all cases where we can do directory rename detection, @@ -2012,7 +2028,11 @@ static void apply_directory_rename_modifications(struct merge_options *o, * saying the file would have been overwritten), but it might * be dirty, though. */ - remove_file(o, 1, pair->two->path, 0 /* no_wd */); + update_wd = !was_dirty(o, pair->two->path); + if (!update_wd) + output(o, 1, _("Refusing to lose dirty file at %s"), + pair->two->path); + remove_file(o, 1, pair->two->path, !update_wd); /* Find or create a new re->dst_entry */ item = string_list_lookup(entries, new_path); diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 02c97c9823..676e72e9e6 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -2985,7 +2985,7 @@ test_expect_success '11b-setup: Avoid losing dirty file involved in directory re git commit -m "C" ' -test_expect_failure '11b-check: Avoid losing dirty file involved in directory rename' ' +test_expect_success '11b-check: Avoid losing dirty file involved in directory rename' ' git checkout B^0 && echo stuff >>z/c && @@ -3109,7 +3109,7 @@ test_expect_success '11d-setup: Avoid losing not-uptodate with rename + D/F conf git commit -m "C" ' -test_expect_failure '11d-check: Avoid losing not-uptodate with rename + D/F conflict' ' +test_expect_success '11d-check: Avoid losing not-uptodate with rename + D/F conflict' ' git checkout B^0 && echo stuff >>z/c && @@ -3178,7 +3178,7 @@ test_expect_success '11e-setup: Avoid deleting not-uptodate with dir rename/rena git commit -m "C" ' -test_expect_failure '11e-check: Avoid deleting not-uptodate with dir rename/rename(1to2)/add' ' +test_expect_success '11e-check: Avoid deleting not-uptodate with dir rename/rename(1to2)/add' ' git checkout B^0 && echo mods >>y/c && @@ -3247,7 +3247,7 @@ test_expect_success '11f-setup: Avoid deleting not-uptodate with dir rename/rena git commit -m "C" ' -test_expect_failure '11f-check: Avoid deleting not-uptodate with dir rename/rename(2to1)' ' +test_expect_success '11f-check: Avoid deleting not-uptodate with dir rename/rename(2to1)' ' git checkout B^0 && echo important >>y/wham && -- 2.15.0.5.g9567be9905
[PATCH 18/30] merge-recursive: Make !o->detect_rename codepath more obvious
Previously, if !o->detect_rename then get_renames() would return an empty string_list, and then process_renames() would have nothing to iterate over. It seems more straightforward to simply avoid calling either function in that case. Signed-off-by: Elijah Newren--- merge-recursive.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 7a3402e50c..f40c70990c 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1332,8 +1332,6 @@ static struct string_list *get_renames(struct merge_options *o, struct diff_options opts; renames = xcalloc(1, sizeof(struct string_list)); - if (!o->detect_rename) - return renames; diff_setup(); DIFF_OPT_SET(, RECURSIVE); @@ -1652,6 +1650,10 @@ static struct rename_info *handle_renames(struct merge_options *o, { struct rename_info *rei = xcalloc(1, sizeof(struct rename_info)); + *clean = 1; + if (!o->detect_rename) + return NULL; + rei->head_renames = get_renames(o, head, common, head, merge, entries); rei->merge_renames = get_renames(o, merge, common, head, merge, entries); *clean = process_renames(o, rei->head_renames, rei->merge_renames); @@ -1664,6 +1666,9 @@ static void cleanup_renames(struct rename_info *re_info) const struct rename *re; int i; + if (!re_info) + return; + for (i = 0; i < re_info->head_renames->nr; i++) { re = re_info->head_renames->items[i].util; diff_free_filepair(re->pair); -- 2.15.0.5.g9567be9905
[PATCH 25/30] merge-recursive: Check for file level conflicts then get new name
Before trying to apply directory renames to paths within the given directories, we want to make sure that there aren't conflicts at the file level either. If there aren't any, then get the new name from any directory renames. Signed-off-by: Elijah Newren--- merge-recursive.c | 184 ++-- t/t6043-merge-rename-directories.sh | 2 +- 2 files changed, 176 insertions(+), 10 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 251c4cc7fa..7c2c29bb51 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1481,6 +1481,102 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path, } } +/* + * Write: + * element1, element2, element3, ..., elementN + * to str. If only one element, just write "element1" to str. + */ +static void comma_separated_list(char *str, struct string_list *slist) { + int i; + for (i=0; inr; i++) { + str += sprintf(str, "%s", slist->items[i].string); + if (i < slist->nr-1) + str += sprintf(str, ", "); + } +} + +/* + * See if there is a directory rename for path, and if there are any file + * level conflicts for the renamed location. If there is a rename and + * there are no conflicts, return the new name. Otherwise, return NULL. + */ +static char* handle_path_level_conflicts(struct merge_options *o, +const char *path, +struct dir_rename_entry *entry, +struct hashmap *collisions, +struct tree *tree) +{ + char *new_path = NULL; + struct collision_entry *collision_ent; + int clean = 1; + + /* +* entry has the mapping of old directory name to new directory name +* that we want to apply to path. +*/ + new_path = apply_dir_rename(entry, path); + + if (!new_path) { + /* This should only happen when entry->non_unique_new_dir set */ + assert(entry->non_unique_new_dir); + output(o, 1, _("CONFLICT (directory rename split): " + "Unclear where to place %s because directory " + "%s was renamed to multiple other directories, " + "with no destination getting a majority of the " + "files."), + path, entry->dir); + clean = 0; + return NULL; + } + + /* +* The caller needs to have ensured that it has pre-populated +* collisions with all paths that map to new_path. Do a quick check +* to ensure that's the case. + */ + collision_ent = collision_find_entry(collisions, new_path); + assert(collision_ent != NULL); + + /* +* Check for one-sided add/add/.../add conflicts, i.e. +* where implicit renames from the other side doing +* directory rename(s) can affect this side of history +* to put multiple paths into the same location. Warn +* and bail on directory renames for such paths. +*/ + char collision_paths[(PATH_MAX+2) * collision_ent->source_files.nr]; + if (collision_ent->reported_already) { + clean = 0; + } else if (tree_has_path(tree, new_path)) { + collision_ent->reported_already = 1; + comma_separated_list(collision_paths, +_ent->source_files); + output(o, 1, _("CONFLICT (implicit dir rename): Existing " + "file/dir at %s in the way of implicit " + "directory rename(s) putting the following " + "path(s) there: %s."), + new_path, collision_paths); + clean = 0; + } else if (collision_ent->source_files.nr > 1) { + collision_ent->reported_already = 1; + comma_separated_list(collision_paths, +_ent->source_files); + output(o, 1, _("CONFLICT (implicit dir rename): Cannot map " + "more than one path to %s; implicit directory " + "renames tried to put these paths there: %s"), + new_path, collision_paths); + clean = 0; + } + + /* Free memory we no longer need */ + if (!clean && new_path) { + free(new_path); + return NULL; + } + + return new_path; +} + /* * There are a couple things we want to do at the directory level: * 1. Check for both sides renaming to the same thing, in order to avoid @@ -1725,6 +1821,58 @@ static void compute_collisions(struct hashmap *collisions, } } +static char
[PATCH 17/30] merge-recursive: Fix leaks of allocated renames and diff_filepairs
get_renames() has always zero'ed out diff_queued_diff.nr while only manually free'ing diff_filepairs that did not correspond to renames. Further, it allocated struct renames that were tucked away in the return string_list. Make sure all of these are deallocated when we are done with them. Signed-off-by: Elijah Newren--- merge-recursive.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 49710c0964..7a3402e50c 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1661,10 +1661,21 @@ static struct rename_info *handle_renames(struct merge_options *o, static void cleanup_renames(struct rename_info *re_info) { - string_list_clear(re_info->head_renames, 0); - string_list_clear(re_info->merge_renames, 0); + const struct rename *re; + int i; + for (i = 0; i < re_info->head_renames->nr; i++) { + re = re_info->head_renames->items[i].util; + diff_free_filepair(re->pair); + } + string_list_clear(re_info->head_renames, 1); free(re_info->head_renames); + + for (i = 0; i < re_info->merge_renames->nr; i++) { + re = re_info->merge_renames->items[i].util; + diff_free_filepair(re->pair); + } + string_list_clear(re_info->merge_renames, 1); free(re_info->merge_renames); free(re_info); -- 2.15.0.5.g9567be9905
[PATCH 22/30] merge-recursive: Check for directory level conflicts
Before trying to apply directory renames to paths within the given directories, we want to make sure that there aren't conflicts at the directory level. There will be additional checks at the individual file level too, which will be added later. Signed-off-by: Elijah Newren--- merge-recursive.c | 112 ++ 1 file changed, 112 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index b5770d3d7f..3633be0123 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1376,6 +1376,15 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o, return ret; } +static int tree_has_path(struct tree *tree, const char *path) +{ + unsigned char hashy[20]; + unsigned mode_o; + + return !get_tree_entry(tree->object.oid.hash, path, + hashy, _o); +} + static void get_renamed_dir_portion(const char *old_path, const char *new_path, char **old_dir, char **new_dir) { *old_dir = NULL; @@ -1425,6 +1434,105 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path, } } +/* + * There are a couple things we want to do at the directory level: + * 1. Check for both sides renaming to the same thing, in order to avoid + * implicit renaming of files that should be left in place. (See + * testcase 6b in t6043 for details.) + * 2. Prune directory renames if there are still files left in the + * the original directory. These represent a partial directory rename, + * i.e. a rename where only some of the files within the directory + * were renamed elsewhere. (Technically, this could be done earlier + * in get_directory_renames(), except that would prevent us from + * doing the previous check and thus failing testcase 6b.) + * 3. Check for rename/rename(1to2) conflicts (at the directory level). + * In the future, we could potentially record this info as well and + * omit reporting rename/rename(1to2) conflicts for each path within + * the affected directories, thus cleaning up the merge output. + * NOTE: We do NOT check for rename/rename(2to1) conflicts at the + * directory level, because merging directories is fine. If it + * causes conflicts for files within those merged directories, then + * that should be detected at the individual path level. + */ +static void handle_directory_level_conflicts(struct merge_options *o, +struct hashmap *dir_re_head, +struct tree *head, +struct hashmap *dir_re_merge, +struct tree *merge) +{ + struct hashmap_iter iter; + struct dir_rename_entry *head_ent; + struct dir_rename_entry *merge_ent; + int i; + + struct string_list remove_from_head = STRING_LIST_INIT_NODUP; + struct string_list remove_from_merge = STRING_LIST_INIT_NODUP; + + hashmap_iter_init(dir_re_head, ); + while ((head_ent = hashmap_iter_next())) { + merge_ent = dir_rename_find_entry(dir_re_merge, head_ent->dir); + if (merge_ent && + !head_ent->non_unique_new_dir && + !merge_ent->non_unique_new_dir && + !strcmp(head_ent->new_dir, merge_ent->new_dir)) { + /* 1. Renamed identically; remove it from both sides */ + string_list_append(_from_head, + head_ent->dir)->util = head_ent; + free(head_ent->new_dir); + string_list_append(_from_merge, + merge_ent->dir)->util = merge_ent; + free(merge_ent->new_dir); + } else if (tree_has_path(head, head_ent->dir)) { + /* 2. This wasn't a directory rename after all */ + string_list_append(_from_head, + head_ent->dir)->util = head_ent; + free(head_ent->new_dir); + } + } + + hashmap_iter_init(dir_re_merge, ); + while ((merge_ent = hashmap_iter_next())) { + head_ent = dir_rename_find_entry(dir_re_head, merge_ent->dir); + if (tree_has_path(merge, merge_ent->dir)) { + /* 2. This wasn't a directory rename after all */ + string_list_append(_from_merge, + merge_ent->dir)->util = merge_ent; + } else if (head_ent && + !head_ent->non_unique_new_dir && + !merge_ent->non_unique_new_dir) { + /* 3. rename/rename(1to2) */ + /* We can assume it's not
[PATCH 15/30] merge-recursive: Move the get_renames() function
I want to re-use some other functions in the file without moving those other functions or dealing with a handful of annoying split function declarations and definitions. Signed-off-by: Elijah Newren--- merge-recursive.c | 138 +++--- 1 file changed, 69 insertions(+), 69 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 3526c8d0b8..382016508b 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -540,75 +540,6 @@ struct rename { unsigned processed:1; }; -/* - * Get information of all renames which occurred between 'o_tree' and - * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and - * 'b_tree') to be able to associate the correct cache entries with - * the rename information. 'tree' is always equal to either a_tree or b_tree. - */ -static struct string_list *get_renames(struct merge_options *o, - struct tree *tree, - struct tree *o_tree, - struct tree *a_tree, - struct tree *b_tree, - struct string_list *entries) -{ - int i; - struct string_list *renames; - struct diff_options opts; - - renames = xcalloc(1, sizeof(struct string_list)); - if (!o->detect_rename) - return renames; - - diff_setup(); - DIFF_OPT_SET(, RECURSIVE); - DIFF_OPT_CLR(, RENAME_EMPTY); - opts.detect_rename = DIFF_DETECT_RENAME; - opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit : - o->diff_rename_limit >= 0 ? o->diff_rename_limit : - 1000; - opts.rename_score = o->rename_score; - opts.show_rename_progress = o->show_rename_progress; - opts.output_format = DIFF_FORMAT_NO_OUTPUT; - diff_setup_done(); - diff_tree_oid(_tree->object.oid, >object.oid, "", ); - diffcore_std(); - if (opts.needed_rename_limit > o->needed_rename_limit) - o->needed_rename_limit = opts.needed_rename_limit; - for (i = 0; i < diff_queued_diff.nr; ++i) { - struct string_list_item *item; - struct rename *re; - struct diff_filepair *pair = diff_queued_diff.queue[i]; - if (pair->status != 'R') { - diff_free_filepair(pair); - continue; - } - re = xmalloc(sizeof(*re)); - re->processed = 0; - re->pair = pair; - item = string_list_lookup(entries, re->pair->one->path); - if (!item) - re->src_entry = insert_stage_data(re->pair->one->path, - o_tree, a_tree, b_tree, entries); - else - re->src_entry = item->util; - - item = string_list_lookup(entries, re->pair->two->path); - if (!item) - re->dst_entry = insert_stage_data(re->pair->two->path, - o_tree, a_tree, b_tree, entries); - else - re->dst_entry = item->util; - item = string_list_insert(renames, pair->one->path); - item->util = re; - } - opts.output_format = DIFF_FORMAT_NO_OUTPUT; - diff_queued_diff.nr = 0; - diff_flush(); - return renames; -} - static int update_stages(struct merge_options *opt, const char *path, const struct diff_filespec *o, const struct diff_filespec *a, @@ -1383,6 +1314,75 @@ static int conflict_rename_rename_2to1(struct merge_options *o, return ret; } +/* + * Get information of all renames which occurred between 'o_tree' and + * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and + * 'b_tree') to be able to associate the correct cache entries with + * the rename information. 'tree' is always equal to either a_tree or b_tree. + */ +static struct string_list *get_renames(struct merge_options *o, + struct tree *tree, + struct tree *o_tree, + struct tree *a_tree, + struct tree *b_tree, + struct string_list *entries) +{ + int i; + struct string_list *renames; + struct diff_options opts; + + renames = xcalloc(1, sizeof(struct string_list)); + if (!o->detect_rename) + return renames; + + diff_setup(); + DIFF_OPT_SET(, RECURSIVE); + DIFF_OPT_CLR(, RENAME_EMPTY); + opts.detect_rename = DIFF_DETECT_RENAME; + opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit : +
[PATCH 10/30] directory rename detection: more involved edge/corner testcases
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 347 1 file changed, 347 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 157299105f..115d0d2622 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -1336,4 +1336,351 @@ test_expect_success '6e-check: Add/add from one side' ' test $(git rev-parse HEAD:z/d) = $(git rev-parse C:z/d) ' + +### +# SECTION 7: More involved Edge/Corner cases +# +# The ruleset we have generated in the above sections seems to provide +# well-defined merges. But can we find edge/corner cases that either (a) +# are harder for users to understand, or (b) have a resolution that is +# non-intuitive or suboptimal? +# +# The testcases in this section dive into cases that I've tried to craft in +# a way to find some that might be surprising to users or difficult for +# them to understand (the next section will look at non-intuitive or +# suboptimal merge results). Some of the testcases are similar to ones +# from past sections, but have been simplified to try to highlight error +# messages using a "modified" path (due to the directory rename). Are +# users okay with these? +# +# In my opinion, testcases that are difficult to understand from this +# section is due to difficulty in the testcase rather than the directory +# renaming (similar to how t6042 and t6036 have difficult resolutions due +# to the problem setup itself being complex). And I don't think the +# error messages are a problem. +# +# On the other hand, the testcases in section 8 worry me slightly more... +### + +# Testcase 7a, rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file +# Commit A: z/{b,c} +# Commit B: y/{b,c} +# Commit C: w/b, x/c, z/d +# Expected: y/d, CONFLICT(rename/rename for both z/b and z/c) +# NOTE: There's a rename of z/ here, y/ has more renames, so z/d -> y/d. + +test_expect_success '7a-setup: rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z y && + test_tick && + git commit -m "B" && + + git checkout C && + mkdir w && + mkdir x && + git mv z/b w/ && + git mv z/c x/ && + echo d > z/d && + git add z/d && + test_tick && + git commit -m "C" +' + +test_expect_failure '7a-check: rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file' ' + git checkout B^0 && + + test_must_fail git merge -s recursive C^0 >out && + test_i18ngrep "CONFLICT (rename/rename).*z/b.*y/b.*w/b" out && + test_i18ngrep "CONFLICT (rename/rename).*z/c.*y/c.*x/c" out && + + test 7 -eq $(git ls-files -s | wc -l) && + test 6 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/d) = $(git rev-parse C:z/d) && + + test $(git rev-parse :1:z/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :2:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :3:w/b) = $(git rev-parse A:z/b) && + + test $(git rev-parse :1:z/c) = $(git rev-parse A:z/c) && + test $(git rev-parse :2:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse :3:x/c) = $(git rev-parse A:z/c) && + + test $(git hash-object y/b) = $(git rev-parse A:z/b) && + test $(git hash-object w/b) = $(git rev-parse A:z/b) && + test $(git hash-object y/c) = $(git rev-parse A:z/c) && + test $(git hash-object x/c) = $(git rev-parse A:z/c) +' + +# Testcase 7b, rename/rename(2to1), but only due to transitive rename +# (Related to testcase 1d) +# Commit A: z/{b,c}, x/d_1, w/d_2 +# Commit B: y/{b,c,d_2}, x/d_1 +# Commit C: z/{b,c,d_1},w/d_2 +# Expected: y/{b,c}, CONFLICT(rename/rename(2to1): x/d_1, w/d_2 -> y_d) + +test_expect_success '7b-setup: rename/rename(2to1), but only due to transitive rename' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + mkdir x && + mkdir w && + echo b >z/b && + echo c >z/c && + echo d1 > x/d && + echo d2 > w/d && + git add z x w && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z y && + git mv w/d y/ && + test_tick && + git commit -m "B" && + + git
[PATCH 14/30] directory rename detection: tests for handling overwriting dirty files
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 401 1 file changed, 401 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 7af8962512..4066b08767 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -2873,4 +2873,405 @@ test_expect_failure '10e-check: Does git complain about untracked file that is n test "random" = "$(cat z/c)" ' +### +# SECTION 11: Handling dirty (not up-to-date) files +# +# unpack_trees(), upon which the recursive merge algorithm is based, aborts +# the operation if untracked or dirty files would be deleted or overwritten +# by the merge. Unfortunately, unpack_trees() does not understand renames, +# and if it doesn't abort, then it muddies up the working directory before +# we even get to the point of detecting renames, so we need some special +# handling. This was true even of normal renames, but there are additional +# codepaths that need special handling with directory renames. Add +# testcases for both renamed-by-directory-rename-detection and standard +# rename cases. +### + +# Testcase 11a, Avoid losing dirty contents with simple rename +# Commit A: z/{a,b_v1}, +# Commit B: z/{a,c_v1}, and z/c_v1 has uncommitted mods +# Commit C: z/{a,b_v2} +# Expected: ERROR_MSG(Refusing to lose dirty file at z/c) + +# z/a, staged version of z/c has sha1sum matching C:z/b_v2, +# z/c~HEAD with contents of C:z/b_v2, +# z/c with uncommitted mods on top of B:z/c_v1 + +test_expect_success '11a-setup: Avoid losing dirty contents with simple rename' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo a >z/a && + test_seq 1 10 >z/b && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z/b z/c && + test_tick && + git commit -m "B" && + + git checkout C && + echo 11 >>z/b && + git add z/b && + test_tick && + git commit -m "C" +' + +test_expect_failure '11a-check: Avoid losing dirty contents with simple rename' ' + git checkout B^0 && + echo stuff >>z/c && + + test_must_fail git merge -s recursive C^0 >out 2>err && + test_i18ngrep "Refusing to lose dirty file at z/c" out && + + test_seq 1 10 >expected && + echo stuff >>expected && + + test 2 -eq $(git ls-files -s | wc -l) && + test 1 -eq $(git ls-files -u | wc -l) && + test 4 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:z/a) = $(git rev-parse A:z/a) && + test $(git rev-parse :2:z/c) = $(git rev-parse C:z/b) && + + test "$(git hash-object z/c~HEAD)" = $(git rev-parse C:z/b) && + test_cmp expected z/c +' + +# Testcase 11b, Avoid losing dirty file involved in directory rename +# Commit A: z/a, x/{b,c_v1} +# Commit B: z/{a,c_v1}, x/b, and z/c_v1 has uncommitted mods +# Commit C: y/a, x/{b,c_v2} +# Expected: y/{a,c_v2}, x/b, z/c_v1 with uncommited mods untracked, +# ERROR_MSG(Refusing to lose dirty file at z/c) + + +test_expect_success '11b-setup: Avoid losing dirty file involved in directory rename' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z x && + echo a >z/a && + echo b >x/b && + test_seq 1 10 >x/c && + git add z x && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv x/c z/c && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z y && + echo 11 >>x/c && + git add x/c && + test_tick && + git commit -m "C" +' + +test_expect_failure '11b-check: Avoid losing dirty file involved in directory rename' ' + git checkout B^0 && + echo stuff >>z/c && + + git merge -s recursive C^0 >out 2>err && + test_i18ngrep "Refusing to lose dirty file at z/c" out && + + grep -q stuff */* && + test_seq 1 10 >expected && + echo stuff >>expected && + + test 3 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 0 -eq $(git ls-files -m | wc -l) && + test 4 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:x/b) = $(git rev-parse A:x/b) && + test $(git rev-parse :0:y/a) = $(git rev-parse A:z/a) && + test $(git rev-parse :0:y/c) = $(git rev-parse C:x/c) && + + test "$(git hash-object y/c)" = $(git rev-parse C:x/c)
[RFC PATCH 29/30] merge-recursive: Fix overwriting dirty files involved in renames
This fixes an issue that existed before my directory rename detection patches that affects both normal renames and renames implied by directory rename detection. Additional codepaths that only affect overwriting of directy files that are involved in directory rename detection will be added in a subsequent commit. Signed-off-by: Elijah Newren--- Seems kinda hacky, in multiple different ways. It seems like there should be a better way to handle lots of different things about this patch, though I have a hard time seeing how without doing a bigger rewrite of the whole interface between unpack_trees and merge-recursive (possibly rewriting them so there isn't an interface but some piece of code that does both functions). Alternate simpler suggestions? merge-recursive.c | 81 - merge-recursive.h | 2 + t/t3501-revert-cherry-pick.sh | 2 +- t/t6043-merge-rename-directories.sh | 2 +- t/t7607-merge-overwrite.sh | 2 +- unpack-trees.c | 4 +- unpack-trees.h | 4 ++ 7 files changed, 73 insertions(+), 24 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1b3ee5b9fb..86ddb89727 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -323,32 +323,32 @@ static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree) init_tree_desc(desc, tree->buffer, tree->size); } -static int git_merge_trees(int index_only, +static int git_merge_trees(struct merge_options *o, struct tree *common, struct tree *head, struct tree *merge) { int rc; struct tree_desc t[3]; - struct unpack_trees_options opts; - memset(, 0, sizeof(opts)); - if (index_only) - opts.index_only = 1; + memset(>unpack_opts, 0, sizeof(o->unpack_opts)); + if (o->call_depth) + o->unpack_opts.index_only = 1; else - opts.update = 1; - opts.merge = 1; - opts.head_idx = 2; - opts.fn = threeway_merge; - opts.src_index = _index; - opts.dst_index = _index; - setup_unpack_trees_porcelain(, "merge"); + o->unpack_opts.update = 1; + o->unpack_opts.merge = 1; + o->unpack_opts.head_idx = 2; + o->unpack_opts.fn = threeway_merge; + o->unpack_opts.src_index = _index; + o->unpack_opts.dst_index = _index; + setup_unpack_trees_porcelain(>unpack_opts, "merge"); init_tree_desc_from_tree(t+0, common); init_tree_desc_from_tree(t+1, head); init_tree_desc_from_tree(t+2, merge); - rc = unpack_trees(3, t, ); + rc = unpack_trees(3, t, >unpack_opts); + o->unpack_opts.src_index = _index; // unpack_trees NULLifies src_index, but it's used in verify_uptodate, so set it back cache_tree_free(_cache_tree); return rc; } @@ -783,6 +783,20 @@ static int would_lose_untracked(const char *path) return !was_tracked(path) && file_exists(path); } +static int was_dirty(struct merge_options *o, const char *path) +{ + struct cache_entry *ce; + + int dirty = 1; + if (o->call_depth || !was_tracked(path)) + return !dirty; + + ce = cache_file_exists(path, strlen(path), ignore_case); + dirty = (ce->ce_stat_data.sd_mtime.sec > 0 && +verify_uptodate(ce, >unpack_opts) != 0); + return dirty; +} + static int make_room_for_path(struct merge_options *o, const char *path) { int status, i; @@ -2635,6 +2649,7 @@ static int handle_modify_delete(struct merge_options *o, static int merge_content(struct merge_options *o, const char *path, +int file_in_way, struct object_id *o_oid, int o_mode, struct object_id *a_oid, int a_mode, struct object_id *b_oid, int b_mode, @@ -2709,7 +2724,7 @@ static int merge_content(struct merge_options *o, return -1; } - if (df_conflict_remains) { + if (df_conflict_remains || file_in_way) { char *new_path; if (o->call_depth) { remove_file_from_cache(path); @@ -2743,6 +2758,31 @@ static int merge_content(struct merge_options *o, return mfi.clean; } +static int conflict_rename_normal(struct merge_options *o, + const char *path, + struct object_id *o_oid, unsigned o_mode, + struct object_id *a_oid, unsigned a_mode, + struct object_id *b_oid, unsigned b_mode, + struct rename_conflict_info *ci) +{ + int clean_merge; + int file_in_the_way = 0; + + if (was_dirty(o, path)) { +
[PATCH 12/30] directory rename detection: miscellaneous testcases to complete coverage
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 505 1 file changed, 505 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index bdfd943c88..bb179b16c8 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -265,6 +265,7 @@ test_expect_failure '1d-check: Directory renames cause a rename/rename(2to1) con ' # Testcase 1e, Renamed directory, with all filenames being renamed too +# (Related to testcases 9f & 9g) # Commit A: z/{oldb,oldc} # Commit B: y/{newb,newc} # Commit C: z/{oldb,oldc,d} @@ -2054,4 +2055,508 @@ test_expect_failure '8e-check: Both sides rename, one side adds to original dire test_i18ngrep CONFLICT.*rename/rename.*z/b.*y/b.*w/b out ' +### +# SECTION 9: Other testcases +# +# I came up with the testcases in the first eight sections before coding up +# the implementation. The testcases in this section were mostly ones I +# thought of while coding/debugging, and which I was too lazy to insert +# into the previous sections because I didn't want to re-label with all the +# testcase references. :-) +### + +# Testcase 9a, Inner renamed directory within outer renamed directory +# (Related to testcase 1f) +# Commit A: z/{b,c,d/{e,f,g}} +# Commit B: y/{b,c}, x/w/{e,f,g} +# Commit C: z/{b,c,d/{e,f,g,h},i} +# Expected: y/{b,c,i}, x/w/{e,f,g,h} +# NOTE: The only reason this one is interesting is because when a directory +# is split into multiple other directories, we determine by the weight +# of which one had the most paths going to it. A naive implementation +# of that could take the new file in commit C at z/i to x/w/i or x/i. + +test_expect_success '9a-setup: Inner renamed directory within outer renamed directory' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir -p z/d && + echo b >z/b && + echo c >z/c && + echo e >z/d/e && + echo f >z/d/f && + echo g >z/d/g && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + mkdir x && + git mv z/d x/w && + git mv z y && + test_tick && + git commit -m "B" && + + git checkout C && + echo h >z/d/h && + echo i >z/i && + git add z && + test_tick && + git commit -m "C" +' + +test_expect_failure '9a-check: Inner renamed directory within outer renamed directory' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 7 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 0 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:y/i) = $(git rev-parse C:z/i) && + + test $(git rev-parse HEAD:x/w/e) = $(git rev-parse A:z/d/e) && + test $(git rev-parse HEAD:x/w/f) = $(git rev-parse A:z/d/f) && + test $(git rev-parse HEAD:x/w/g) = $(git rev-parse A:z/d/g) && + test $(git rev-parse HEAD:x/w/h) = $(git rev-parse C:z/d/h) +' + +# Testcase 9b, Transitive rename with content merge +# (Related to testcase 1c) +# Commit A: z/{b,c}, x/d_1 +# Commit B: y/{b,c}, x/d_2 +# Commit C: z/{b,c,d_3} +# Expected: y/{b,c,d_merged} + +test_expect_success '9b-setup: Transitive rename with content merge' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + mkdir x && + test_seq 1 10 >x/d && + git add z x && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z y && + test_seq 1 11 >x/d && + git add x/d && + test_tick && + git commit -m "B" && + + git checkout C && + test_seq 0 10 >x/d && + git mv x/d z/d && + git add z/d && + test_tick && + git commit -m "C" +' + +test_expect_failure '9b-check: Transitive rename with content merge' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 3 -eq $(git ls-files -s | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test_must_fail git rev-parse HEAD:x/d && + test_must_fail git rev-parse HEAD:z/d && + test ! -f z/d && + + test $(git rev-parse HEAD:y/d) != $(git rev-parse A:x/d) && + test $(git
[PATCH 05/30] directory rename detection: directory splitting testcases
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 125 1 file changed, 125 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index b737b0a105..00811f512a 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -388,4 +388,129 @@ test_expect_failure '1f-check: Split a directory into two other directories' ' # in section 2, plus testcases 3a and 4a. ### + +### +# SECTION 2: Split into multiple directories, with equal number of paths +# +# Explore the splitting-a-directory rules a bit; what happens in the +# edge cases? +# +# Note that there is a closely related case of a directory not being +# split on either side of history, but being renamed differently on +# each side. See testcase 8e for that. +### + +# Testcase 2a, Directory split into two on one side, with equal numbers of paths +# Commit A: z/{b,c} +# Commit B: y/b, w/c +# Commit C: z/{b,c,d} +# Expected: y/b, w/c, z/d, with warning about z/ -> (y/ vs. w/) conflict +test_expect_success '2a-setup: Directory split into two on one side, with equal numbers of paths' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + mkdir y && + mkdir w && + git mv z/b y/ && + git mv z/c w/ && + test_tick && + git commit -m "B" && + + git checkout C && + echo d >z/d && + git add z/d && + test_tick && + git commit -m "C" +' + +test_expect_failure '2a-check: Directory split into two on one side, with equal numbers of paths' ' + git checkout B^0 && + + test_must_fail git merge -s recursive C^0 >out && + + test 3 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :0:w/c) = $(git rev-parse A:z/c) && + test $(git rev-parse :0:z/d) = $(git rev-parse C:z/d) && + test_i18ngrep "CONFLICT.*directory rename split" out +' + +# Testcase 2b, Directory split into two on one side, with equal numbers of paths +# Commit A: z/{b,c} +# Commit B: y/b, w/c +# Commit C: z/{b,c}, x/d +# Expected: y/b, w/c, x/d; No warning about z/ -> (y/ vs. w/) conflict +test_expect_success '2b-setup: Directory split into two on one side, with equal numbers of paths' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + mkdir y && + mkdir w && + git mv z/b y/ && + git mv z/c w/ && + test_tick && + git commit -m "B" && + + git checkout C && + mkdir x && + echo d >x/d && + git add x/d && + test_tick && + git commit -m "C" +' + +test_expect_success '2b-check: Directory split into two on one side, with equal numbers of paths' ' + git checkout B^0 && + + git merge -s recursive C^0 >out && + + test 3 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :0:w/c) = $(git rev-parse A:z/c) && + test $(git rev-parse :0:x/d) = $(git rev-parse C:x/d) && + ! test_i18ngrep "CONFLICT.*directory rename split" out +' + +### +# Rules suggested by section 2: +# +# None; the rule was already covered in section 1. These testcases are +# here just to make sure the conflict resolution and necessary warning +# messages are handled correctly. +### + test_done -- 2.15.0.5.g9567be9905
[PATCH 07/30] directory rename detection: partially renamed directory testcase/discussion
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 100 1 file changed, 100 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 021513ec00..ec054b210a 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -650,4 +650,104 @@ test_expect_success '3b-check: Avoid implicit rename if involved as source on cu # of a rename on either side of a merge. ### + +### +# SECTION 4: Partially renamed directory; still exists on both sides of merge +# +# What if we were to attempt to do directory rename detection when someone +# "mostly" moved a directory but still left some files around, or, +# equivalently, fully renamed a directory in one commmit and then recreated +# that directory in a later commit adding some new files and then tried to +# merge? +# +# It's hard to divine user intent in these cases, because you can make an +# argument that, depending on the intermediate history of the side being +# merged, that some users will want files in that directory to +# automatically be detected and renamed, while users with a different +# intermediate history wouldn't want that rename to happen. +# +# I think that it is best to simply not have directory rename detection +# apply to such cases. My reasoning for this is four-fold: (1) it's +# easiest for users in general to figure out what happened if we don't +# apply directory rename detection in any such case, (2) it's an easy rule +# to explain ["We don't do directory rename detection if the directory +# still exists on both sides of the merge"], (3) we can get some hairy +# edge/corner cases that would be really confusing and possibly not even +# representable in the index if we were to even try, and [related to 3] (4) +# attempting to resolve this issue of divining user intent by examining +# intermediate history goes against the spirit of three-way merges and is a +# path towards crazy corner cases that are far more complex than what we're +# already dealing with. +# +# This section contains a test for this partially-renamed-directory case. +### + +# Testcase 4a, Directory split, with original directory still present +# (Related to testcase 1f) +# Commit A: z/{b,c,d,e} +# Commit B: y/{b,c,d}, z/e +# Commit C: z/{b,c,d,e,f} +# Expected: y/{b,c,d}, z/{e,f} +# NOTE: Even though most files from z moved to y, we don't want f to follow. + +test_expect_success '4a-setup: Directory split, with original directory still present' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + echo d >z/d && + echo e >z/e && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + mkdir y && + git mv z/b y/ && + git mv z/c y/ && + git mv z/d y/ && + test_tick && + git commit -m "B" && + + git checkout C && + echo f >z/f && + git add z/f && + test_tick && + git commit -m "C" +' + +test_expect_success '4a-check: Directory split, with original directory still present' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 5 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 0 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:y/d) = $(git rev-parse A:z/d) && + test $(git rev-parse HEAD:z/e) = $(git rev-parse A:z/e) && + test $(git rev-parse HEAD:z/f) = $(git rev-parse C:z/f) +' + +### +# Rules suggested by section 4: +# +# Directory-rename-detection should be turned off for any directories (as +# a source for renames) that exist on both sides of the merge. (The "as +# a source for renames" clarification is due to cases like 1c where +# the target directory exists on both sides and we do want the rename +# detection.) But, sadly, see testcase 8b. +### + test_done -- 2.15.0.5.g9567be9905
[PATCH 26/30] merge-recursive: When comparing files, don't include trees
get_renames() would look up stage data that already existed (populated in get_unmerged(), taken from whatever unpack_trees() created), and if it didn't exist, would call insert_stage_data() to create the necessary entry for the given file. The insert_stage_data() fallback becomes much more important for directory rename detection, because that creates a mechanism to have a file in the resulting merge that didn't exist on either side of history. However, insert_stage_data(), due to calling get_tree_entry() loaded up trees as readily as files. We aren't interested in comparing trees to files; the D/F conflict handling is done elsewhere. This code is just concerned with what entries existed for a given path on the different sides of the merge, so create a get_tree_entry_if_blob() helper function and use it. Signed-off-by: Elijah Newren--- merge-recursive.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 7c2c29bb51..2a7258f6bb 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -407,6 +407,21 @@ static void get_files_dirs(struct merge_options *o, struct tree *tree) read_tree_recursive(tree, "", 0, 0, _all, save_files_dirs, o); } +static int get_tree_entry_if_blob(const unsigned char *tree, + const char *path, + unsigned char *hashy, + unsigned *mode_o) +{ + int ret; + + ret = get_tree_entry(tree, path, hashy, mode_o); + if (S_ISDIR(*mode_o)) { + hashcpy(hashy, null_sha1); + *mode_o = 0; + } + return ret; +} + /* * Returns an index_entry instance which doesn't have to correspond to * a real cache entry in Git's index. @@ -417,12 +432,12 @@ static struct stage_data *insert_stage_data(const char *path, { struct string_list_item *item; struct stage_data *e = xcalloc(1, sizeof(struct stage_data)); - get_tree_entry(o->object.oid.hash, path, - e->stages[1].oid.hash, >stages[1].mode); - get_tree_entry(a->object.oid.hash, path, - e->stages[2].oid.hash, >stages[2].mode); - get_tree_entry(b->object.oid.hash, path, - e->stages[3].oid.hash, >stages[3].mode); + get_tree_entry_if_blob(o->object.oid.hash, path, + e->stages[1].oid.hash, >stages[1].mode); + get_tree_entry_if_blob(a->object.oid.hash, path, + e->stages[2].oid.hash, >stages[2].mode); + get_tree_entry_if_blob(b->object.oid.hash, path, + e->stages[3].oid.hash, >stages[3].mode); item = string_list_insert(entries, path); item->util = e; return e; -- 2.15.0.5.g9567be9905
[PATCH 24/30] merge-recursive: Add computation of collisions due to dir rename & merging
directory renaming and merging can cause one or more files to be moved to where an existing file is, or to cause several files to all be moved to the same (otherwise vacant) location. Add checking and reporting for such cases, falling back to no-directory-rename handling for such paths. Signed-off-by: Elijah Newren--- merge-recursive.c | 118 +- 1 file changed, 116 insertions(+), 2 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1858686c35..251c4cc7fa 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1408,6 +1408,30 @@ static int tree_has_path(struct tree *tree, const char *path) hashy, _o); } +/* + * Return a new string that replaces the beginning portion (which matches + * entry->dir), with entry->new_dir. In perl-speak: + * new_path_name = (old_path =~ s/entry->dir/entry->new_dir/); + */ +static char *apply_dir_rename(struct dir_rename_entry *entry, + const char *old_path) { + char *new_path; + int oldlen, newlen; + + if (entry->non_unique_new_dir) + return NULL; + + oldlen = strlen(entry->dir); + assert(strncmp(entry->dir, old_path, oldlen) == 0 && + old_path[oldlen] == '/'); + newlen = strlen(entry->new_dir) + (strlen(old_path) - oldlen) + 1; + new_path = malloc(newlen); + strcpy(new_path, entry->new_dir); + strcpy(_path[strlen(new_path)], _path[oldlen]); + + return new_path; +} + static void get_renamed_dir_portion(const char *old_path, const char *new_path, char **old_dir, char **new_dir) { *old_dir = NULL; @@ -1625,6 +1649,82 @@ static struct hashmap *get_directory_renames(struct diff_queue_struct *pairs, return dir_renames; } +static struct dir_rename_entry *check_dir_renamed(const char *path, + struct hashmap *dir_renames) { + char temp[PATH_MAX]; + char *end; + struct dir_rename_entry *entry; + + strcpy(temp, path); + while ((end = strrchr(temp, '/'))) { + *end = '\0'; + entry = dir_rename_find_entry(dir_renames, temp); + if (entry) + return entry; + } + return NULL; +} + +static void compute_collisions(struct hashmap *collisions, + struct hashmap *dir_renames, + struct diff_queue_struct *pairs) +{ + int i; + + /* +* Multiple files can be mapped to the same path due to directory +* renames done by the other side of history. Since that other +* side of history could have merged multiple directories into one, +* if our side of history added the same file basename to each of +* those directories, then all N of them would get implicitly +* renamed by the directory rename detection into the same path, +* and we'd get an add/add/.../add conflict, and all those adds +* from *this* side of history. This is not representable in the +* index, and users aren't going to easily be able to make sense of +* it. So we need to provide a good warning about what's +* happening, and fall back to no-directory-rename detection +* behavior for those paths. +* +* See testcases 9e and all of section 5 from t6043 for examples. +*/ + collision_init(collisions); + + for (i = 0; i < pairs->nr; ++i) { + struct dir_rename_entry *dir_rename_ent; + struct collision_entry *collision_ent; + char *new_path; + struct diff_filepair *pair = pairs->queue[i]; + + if (pair->status == 'D') + continue; + dir_rename_ent = check_dir_renamed(pair->two->path, dir_renames); + if (!dir_rename_ent) + continue; + + new_path = apply_dir_rename(dir_rename_ent, pair->two->path); + if (!new_path) + /* +* dir_rename_ent->non_unique_new_path is true, which +* means there is no directory rename for us to use, +* which means it won't cause us any additional +* collisions. +*/ + continue; + collision_ent = collision_find_entry(collisions, new_path); + if (!collision_ent) { + collision_ent = xcalloc(1, + sizeof(struct collision_entry)); + hashmap_entry_init(collision_ent, strhash(new_path)); + hashmap_put(collisions, collision_ent); + collision_ent->target_file = new_path; + } else { +
[PATCH 23/30] merge-recursive: Add a new hashmap for storing file collisions
Directory renames with the ability to merge directories opens up the possibility of add/add/add/.../add conflicts, if each of the N directories being merged into one target directory all had a file with the same name. We need a way to check for and report on such collisions; this hashmap will be used for this purpose. Signed-off-by: Elijah Newren--- merge-recursive.c | 23 +++ merge-recursive.h | 7 +++ 2 files changed, 30 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index 3633be0123..1858686c35 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -73,6 +73,29 @@ static void dir_rename_init(struct hashmap *map) hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0); } +static struct collision_entry *collision_find_entry(struct hashmap *hashmap, + char *target_file) +{ + struct collision_entry key; + + hashmap_entry_init(, strhash(target_file)); + key.target_file = target_file; + return hashmap_get(hashmap, , NULL); +} + +static int collision_cmp(void *unused_cmp_data, +const struct collision_entry *e1, +const struct collision_entry *e2, +const void *unused_keydata) +{ + return strcmp(e1->target_file, e2->target_file); +} + +static void collision_init(struct hashmap *map) +{ + hashmap_init(map, (hashmap_cmp_fn) collision_cmp, NULL, 0); +} + static void flush_output(struct merge_options *o) { if (o->buffer_output < 2 && o->obuf.len) { diff --git a/merge-recursive.h b/merge-recursive.h index a024949739..e02c1e1243 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -37,6 +37,13 @@ struct dir_rename_entry { struct string_list possible_new_dirs; }; +struct collision_entry { + struct hashmap_entry ent; /* must be the first member! */ + char *target_file; + struct string_list source_files; + unsigned reported_already:1; +}; + /* merge_trees() but with recursive ancestor consolidation */ int merge_recursive(struct merge_options *o, struct commit *h1, -- 2.15.0.5.g9567be9905
[PATCH 11/30] directory rename detection: testcases exploring possibly suboptimal merges
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 371 1 file changed, 371 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 115d0d2622..bdfd943c88 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -1683,4 +1683,375 @@ test_expect_failure '7e-check: transitive rename in rename/delete AND dirs in th test $(git hash-object y/d~C^0) = $(git rev-parse A:x/d) ' + +### +# SECTION 8: Suboptimal merges +# +# As alluded to in the last section, the ruleset we have built up for +# detecting directory renames unfortunately has some special cases where it +# results in slightly suboptimal or non-intuitive behavior. This section +# explores these cases. +# +# To be fair, we already had non-intuitive or suboptimal behavior for most +# of these cases in git before introducing implicit directory rename +# detection, but it'd be nice if there was a modified ruleset out there +# that handled these cases a bit better. +### + +# Testcase 8a, Dual-directory rename, one into the others' way +# Commit A. x/{a,b}, y/{c,d} +# Commit B. x/{a,b,e}, y/{c,d,f} +# Commit C. y/{a,b}, z/{c,d} +# +# Possible Resolutions: +# Previous git: y/{a,b,f}, z/{c,d}, x/e +# Expected: y/{a,b,e,f}, z/{c,d} +# Preferred:y/{a,b,e}, z/{c,d,f} +# +# Note: Both x and y got renamed and it'd be nice to detect both, and we do +# better with directory rename detection than git did previously, but the +# simple rule from section 5 prevents me from handling this as optimally as +# we potentially could. + +test_expect_success '8a-setup: Dual-directory rename, one into the others way' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir x && + mkdir y && + echo a >x/a && + echo b >x/b && + echo c >y/c && + echo d >y/d && + git add x y && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + echo e >x/e && + echo f >y/f && + git add x/e y/f && + test_tick && + git commit -m "B" && + + git checkout C && + git mv y z && + git mv x y && + test_tick && + git commit -m "C" +' + +test_expect_failure '8a-check: Dual-directory rename, one into the others way' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 6 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 0 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse HEAD:y/a) = $(git rev-parse A:x/a) && + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:x/b) && + test $(git rev-parse HEAD:y/e) = $(git rev-parse B:x/e) && + test $(git rev-parse HEAD:y/f) = $(git rev-parse B:y/f) && + test $(git rev-parse HEAD:z/c) = $(git rev-parse A:y/c) && + test $(git rev-parse HEAD:z/d) = $(git rev-parse A:y/d) +' + +# Testcase 8b, Dual-directory rename, one into the others' way, with conflicting filenames +# Commit A. x/{a_1,b_1}, y/{a_2,b_2} +# Commit B. x/{a_1,b_1,e_1}, y/{a_2,b_2,e_2} +# Commit C. y/{a_1,b_1}, z/{a_2,b_2} +# +# Possible Resolutions: +# Previous git: y/{a_1,b_1,e_2}, z/{a_2,b_2}, x/e_1 +# Scary:y/{a_1,b_1}, z/{a_2,b_2}, CONFLICT(add/add, e_1 vs. e_2) +# Preferred:y/{a_1,b_1,e_1}, z/{a_2,b_2,e_2} +# +# Note: Very similar to 8a, except instead of 'e' and 'f' in directories x and +# y, both are named 'e'. Without directory rename detection, neither file +# moves directories. Implment directory rename detection suboptimally, and +# you get an add/add conflict, but both files were added in commit B, so this +# is an add/add conflict where one side of history added both files -- +# something we can't represent in the index. Obviously, we'd prefer the last +# resolution, but our previous rules are too coarse to allow it. Using both +# the rules from section 4 and section 5 save us from the Scary resolution, +# making us fall back to pre-directory-rename-detection behavior for both +# e_1 and e_2. + +test_expect_success '8b-setup: Dual-directory rename, one into the others way, with conflicting filenames' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir x && + mkdir y && + echo a1 >x/a && + echo b1 >x/b && + echo a2 >y/a && + echo b2 >y/b && + git add x y && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + echo e1 >x/e && + echo e2 >y/e
[PATCH 09/30] directory rename detection: testcases checking which side did the rename
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 283 1 file changed, 283 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index d15153c652..157299105f 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -1053,4 +1053,287 @@ test_expect_failure '5d-check: Directory/file/file conflict due to directory ren # back to old handling. But, sadly, see testcases 8a and 8b. ### + +### +# SECTION 6: Same side of the merge was the one that did the rename +# +# It may sound obvious that you only want to apply implicit directory +# renames to directories if the _other_ side of history did the renaming. +# If you did make an implementation that didn't explicitly enforce this +# rule, the majority of cases that would fall under this section would +# also be solved by following the rules from the above sections. But +# there are still a few that stick out, so this section covers them just +# to make sure we also get them right. +### + +# Testcase 6a, Tricky rename/delete +# Commit A: z/{b,c,d} +# Commit B: z/b +# Commit C: y/{b,c}, z/d +# Expected: y/b, CONFLICT(rename/delete, z/c -> y/c vs. NULL) +# Note: We're just checking here that the rename of z/b and z/c to put +# them under y/ doesn't accidentally catch z/d and make it look like +# it is also involved in a rename/delete conflict. + +test_expect_success '6a-setup: Tricky rename/delete' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + echo d >z/d && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git rm z/c && + git rm z/d && + test_tick && + git commit -m "B" && + + git checkout C && + mkdir y && + git mv z/b y/ && + git mv z/c y/ && + test_tick && + git commit -m "C" +' + +test_expect_success '6a-check: Tricky rename/delete' ' + git checkout B^0 && + + test_must_fail git merge -s recursive C^0 >out && + test_i18ngrep "CONFLICT (rename/delete).*z/c.*y/c" out && + + test 2 -eq $(git ls-files -s | wc -l) && + test 1 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :3:y/c) = $(git rev-parse A:z/c) +' + +# Testcase 6b, Same rename done on both sides +# (Related to testcases 6c and 8e) +# Commit A: z/{b,c} +# Commit B: y/{b,c} +# Commit C: y/{b,c}, z/d +# Note: If we did directory rename detection here, we'd move z/d into y/, +# but C did that rename and still decided to put the file into z/, +# so we probably shouldn't apply directory rename detection for it. + +test_expect_success '6b-setup: Same rename done on both sides' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git mv z y && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z y && + mkdir z && + echo d >z/d && + git add z/d && + test_tick && + git commit -m "C" +' + +test_expect_success '6b-check: Same rename done on both sides' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 3 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 0 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:z/d) = $(git rev-parse C:z/d) +' + +# Testcase 6c, Rename only done on same side +# (Related to testcases 6b and 8e) +# Commit A: z/{b,c} +# Commit B: z/{b,c} (no change) +# Commit C: y/{b,c}, z/d +# Expected: y/{b,c}, z/d +# NOTE: Seems obvious, but just checking that the implementation doesn't +# "accidentally detect a rename" and give us y/{b,c,d}. + +test_expect_success '6c-setup: Rename only done on same side' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + git add z && + test_tick && +
[PATCH 08/30] directory rename detection: files/directories in the way of some renames
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 303 1 file changed, 303 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index ec054b210a..d15153c652 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -750,4 +750,307 @@ test_expect_success '4a-check: Directory split, with original directory still pr # detection.) But, sadly, see testcase 8b. ### + +### +# SECTION 5: Files/directories in the way of subset of to-be-renamed paths +# +# Implicitly renaming files due to a detected directory rename could run +# into problems if there are files or directories in the way of the paths +# we want to rename. Explore such cases in this section. +### + +# Testcase 5a, Merge directories, other side adds files to original and target +# Commit A: z/{b,c}, y/d +# Commit B: z/{b,c,e_1,f}, y/{d,e_2} +# Commit C: y/{b,c,d} +# Expected: z/e_1, y/{b,c,d,e_2,f} + CONFLICT warning +# NOTE: While directory rename detection is active here causing z/f to +# become y/f, we did not apply this for z/e_1 because that would +# give us an add/add conflict for y/e_1 vs y/e_2. This problem with +# this add/add, is that both versions of y/e are from the same side +# of history, giving us no way to represent this conflict in the +# index. + +test_expect_success '5a-setup: Merge directories, other side adds files to original and target' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + mkdir y && + echo d >y/d && + git add z y && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + echo e1 >z/e && + echo f >z/f && + echo e2 >y/e && + git add z/e z/f y/e && + test_tick && + git commit -m "B" && + + git checkout C && + git mv z/b y/ && + git mv z/c y/ && + rmdir z && + test_tick && + git commit -m "C" +' + +test_expect_failure '5a-check: Merge directories, other side adds files to original and target' ' + git checkout B^0 && + + test_must_fail git merge -s recursive C^0 >out && + + test 6 -eq $(git ls-files -s | wc -l) && + test 0 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse :0:y/d) = $(git rev-parse A:y/d) && + + test $(git rev-parse :0:y/e) = $(git rev-parse B:y/e) && + test $(git rev-parse :0:z/e) = $(git rev-parse B:z/e) && + + test $(git rev-parse :0:y/f) = $(git rev-parse B:z/f) && + + test_i18ngrep "CONFLICT.*implicit dir rename" out +' + +# Testcase 5b, Rename/delete in order to get add/add/add conflict +# (Related to testcase 8d; these may appear slightly inconsistent to users; +#Also related to testcases 7d and 7e) +# Commit A: z/{b,c,d_1} +# Commit B: y/{b,c,d_2} +# Commit C: z/{b,c,d_1,e}, y/d_3 +# Expected: y/{b,c,e}, CONFLICT(add/add: y/d_2 vs. y/d_3) +# NOTE: If z/d_1 in commit C were to be involved in dir rename detection, as +# we normaly would since z/ is being renamed to y/, then this would be +# a rename/delete (z/d_1 -> y/d_1 vs. deleted) AND an add/add/add +# conflict of y/d_1 vs. y/d_2 vs. y/d_3. Add/add/add is not +# representable in the index, so the existence of y/d_3 needs to +# cause us to bail on directory rename detection for that path, falling +# back to git behavior without the directory rename detection. + +test_expect_success '5b-setup: Rename/delete in order to get add/add/add conflict' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + echo d1 >z/d && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + git rm z/d && + git mv z y && + echo d2 >y/d && + git add y/d && + test_tick && + git commit -m "B" && + + git checkout C && + mkdir y && + echo d3 >y/d && + echo e >z/e && + git add y/d z/e && + test_tick && + git commit -m "C" +' + +test_expect_failure '5b-check: Rename/delete in order to get add/add/add conflict' ' + git
[PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic
The amount of logic in merge_trees() relative to renames was just a few lines, but split it out into new handle_renames() and cleanup_renames() functions to prepare for additional logic to be added to each. No code or logic changes, just a new place to put stuff for when the rename detection gains additional checks. Signed-off-by: Elijah Newren--- merge-recursive.c | 48 ++-- 1 file changed, 38 insertions(+), 10 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 382016508b..49710c0964 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1638,6 +1638,38 @@ static int process_renames(struct merge_options *o, return clean_merge; } +struct rename_info { + struct string_list *head_renames; + struct string_list *merge_renames; +}; + +static struct rename_info *handle_renames(struct merge_options *o, + struct tree *common, + struct tree *head, + struct tree *merge, + struct string_list *entries, + int *clean) +{ + struct rename_info *rei = xcalloc(1, sizeof(struct rename_info)); + + rei->head_renames = get_renames(o, head, common, head, merge, entries); + rei->merge_renames = get_renames(o, merge, common, head, merge, entries); + *clean = process_renames(o, rei->head_renames, rei->merge_renames); + + return rei; +} + +static void cleanup_renames(struct rename_info *re_info) +{ + string_list_clear(re_info->head_renames, 0); + string_list_clear(re_info->merge_renames, 0); + + free(re_info->head_renames); + free(re_info->merge_renames); + + free(re_info); +} + static struct object_id *stage_oid(const struct object_id *oid, unsigned mode) { return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid; @@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o, } if (unmerged_cache()) { - struct string_list *entries, *re_head, *re_merge; + struct string_list *entries; + struct rename_info *re_info; int i; /* * Only need the hashmap while processing entries, so @@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o, get_files_dirs(o, merge); entries = get_unmerged(); - re_head = get_renames(o, head, common, head, merge, entries); - re_merge = get_renames(o, merge, common, head, merge, entries); - clean = process_renames(o, re_head, re_merge); + re_info = handle_renames(o, common, head, merge, entries, ); record_df_conflict_files(o, entries); if (clean < 0) goto cleanup; @@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o, } cleanup: - string_list_clear(re_merge, 0); - string_list_clear(re_head, 0); + cleanup_renames(re_info); + string_list_clear(entries, 1); + free(entries); hashmap_free(>current_file_dir_set, 1); - free(re_merge); - free(re_head); - free(entries); - if (clean < 0) return clean; } -- 2.15.0.5.g9567be9905
[PATCH 06/30] directory rename detection: testcases to avoid taking detection too far
Signed-off-by: Elijah Newren--- t/t6043-merge-rename-directories.sh | 137 1 file changed, 137 insertions(+) diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh index 00811f512a..021513ec00 100755 --- a/t/t6043-merge-rename-directories.sh +++ b/t/t6043-merge-rename-directories.sh @@ -513,4 +513,141 @@ test_expect_success '2b-check: Directory split into two on one side, with equal # messages are handled correctly. ### + +### +# SECTION 3: Path in question is the source path for some rename already +# +# Combining cases from Section 1 and trying to handle them could lead to +# directory renaming detection being over-applied. So, this section +# provides some good testcases to check that the implementation doesn't go +# too far. +### + +# Testcase 3a, Avoid implicit rename if involved as source on other side +# (Related to testcases 1c and 1f) +# Commit A: z/{b,c,d} +# Commit B: z/{b,c,d} (no change) +# Commit C: y/{b,c}, x/d +# Expected: y/{b,c}, x/d +test_expect_success '3a-setup: Avoid implicit rename if involved as source on other side' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + echo d >z/d && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + test_tick && + git commit --allow-empty -m "B" && + + git checkout C && + mkdir y && + mkdir x && + git mv z/b y/ && + git mv z/c y/ && + git mv z/d x/ && + rmdir z && + test_tick && + git commit -m "C" +' + +test_expect_success '3a-check: Avoid implicit rename if involved as source on other side' ' + git checkout B^0 && + + git merge -s recursive C^0 && + + test 3 -eq $(git ls-files -s | wc -l) && + + test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) && + test $(git rev-parse HEAD:x/d) = $(git rev-parse A:z/d) +' + +# Testcase 3b, Avoid implicit rename if involved as source on other side +# (Related to testcases 5c and 7c, also kind of 1e and 1f) +# Commit A: z/{b,c,d} +# Commit B: y/{b,c}, x/d +# Commit C: z/{b,c}, w/d +# Expected: y/{b,c}, CONFLICT:(z/d -> x/d vs. w/d) +# NOTE: We're particularly checking that since z/d is already involved as +# a source in a file rename on the same side of history, that we don't +# get it involved in directory rename detection. If it were, we might +# end up with CONFLICT:(z/d -> y/d vs. x/d vs. w/d), i.e. a +# rename/rename/rename(1to3) conflict, which is just weird. +test_expect_success '3b-setup: Avoid implicit rename if involved as source on current side' ' + git rm -rf . && + git clean -fdqx && + rm -rf .git && + git init && + + mkdir z && + echo b >z/b && + echo c >z/c && + echo d >z/d && + git add z && + test_tick && + git commit -m "A" && + + git branch A && + git branch B && + git branch C && + + git checkout B && + mkdir y && + mkdir x && + git mv z/b y/ && + git mv z/c y/ && + git mv z/d x/ && + rmdir z && + test_tick && + git commit -m "B" && + + git checkout C && + mkdir w && + git mv z/d w/ && + test_tick && + git commit -m "C" +' + +test_expect_success '3b-check: Avoid implicit rename if involved as source on current side' ' + git checkout B^0 && + + test_must_fail git merge -s recursive C^0 >out && + + test 5 -eq $(git ls-files -s | wc -l) && + test 3 -eq $(git ls-files -u | wc -l) && + test 1 -eq $(git ls-files -o | wc -l) && + + test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) && + test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) && + + test $(git rev-parse :1:z/d) = $(git rev-parse A:z/d) && + test $(git rev-parse :2:x/d) = $(git rev-parse A:z/d) && + test $(git rev-parse :3:w/d) = $(git rev-parse A:z/d) && + test ! -f z/d && + test $(git hash-object x/d) = $(git rev-parse A:z/d) && + test $(git hash-object w/d) = $(git rev-parse A:z/d) && + + test_i18ngrep CONFLICT.*rename/rename.*z/d.*x/d.*w/d out && + ! test_i18ngrep CONFLICT.*rename/rename.*y/d +' + +### +# Rules suggested by section 3: +# +# Avoid directory-rename-detection for a path, if that path is the source +# of a rename on
[PATCH 00/30] Add directory rename detection to git
[This series is entirely independent of my rename detection limits series. However, I have a separate rename detection performance series that depends on both this series and the rename detection limits series.] In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as: * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename. * two (or three or N) directories being merged (with no conflicts so long as files/directories within the merged directory have different names), and the "merging" being detected as a directory rename for each original directory. * not all files in a directory being renamed to the same location; i.e. perhaps the directory was renamed, but some files within it were renamed to a different location * a directory being renamed, which also contained a subdirectory that was renamed to some entirely different location. (And perhaps the inner directory itself contained inner directories that were renamed to yet other locations). Also, I found it useful to allow all files within the directory being renamed to themselves be renamed and still detect the directory rename. For example, if goal/a and goal/b are renamed to priority/alpha and priority/bravo, we can detect that goal/ was renamed to priority/, so that if someone adds goal/c on the other side of history, after the merge we'll end up with priority/c. (In the absence of a readily available libmindread.so library that I can link to, we can't rename directly from goal/c to priority/charlie automatically, and will need to have priority/c suffice.) Naturally, an attempt to do all of the above brings up all kinds of interesting edge and corner cases, some of which result in conflicts that cannot be represented in the index, and others of which might be considered too complex for users to understand and resolve. For example: * An add/add/add/.../add conflict, all on one side of history (see testcase 9e in the new t6043, or any of the testcases in section 5) * Doubly, triply, or N-fold transitive renames (testcases 9c & 9d) In order to prevent such problems, I introduce a couple basic rules that limit when directory rename detection applies: 1) If a subset of to-be-renamed files have a file or directory in the way (or would be in the way of each other), "turn off" the directory rename for those specific sub-paths and report the conflict to the user. 2) If the other side of history did a directory rename to a path that your side of history renamed away, then ignore that particular rename from the other side of history for any implicit directory renames (but warn the user). Further, there's a basic question about when directory rename detection should be applied at all. I have a simple rule: 3) If a given directory still exists on both sides of a merge, we do not consider it to have been renamed. Rule 3 may sound obvious at first, but it will probably arise as a question for some users -- what if someone "mostly" moved a directory but still left some files around, or, equivalently (from the perspective of the three-way merge that merge-recursive performs), fully renamed a directory in one commmit and then recreated that directory in a later commit adding some new files and then tried to merge? See the big comment in section 4 of the new t6043 for further discussion of this rule. This set of rules seems to be reasonably easy to explain, is self-consistent, allows all conflict cases to be represented without changing any on-disk data structures or introducing new terminology or commands for users, prevents excessively complex conflicts that users might struggle to understand, and brings peace to the middle east. Actually, maybe not that last one. While I feel that this directory rename detection reduces the number of suboptimal merges and cherry-picks that git performs, there are sadly still a number of cases that remain suboptimal, or that even newly appear to be not-quite-consistent with other cases. The fact that one file layout might trigger some of the rules above while another "slightly" different file layout doesn't might occasionally cause some user grumblings. I've tried to explore and document these cases in section 8 of the new t6043-merge-rename-directories.sh Finally, from an implementation perspective, there's another strong advantage to the ruleset above: it means that any path to which we want to apply an implicit directory rename will have a free and open spot for us to move it into. Thus, we can just adjust
[PATCH 01/30] Tighten and correct a few testcases for merging and cherry-picking
t3501 had a testcase originally added to ensure cherry-pick wouldn't segfault when working with a dirty file involved in a rename. While the segfault was fixed, there was another problem this test demonstrated: namely, that git would overwrite a dirty file involved in a rename. Further, the test encoded a "successful merge" and overwriting of this file as correct behavior. Modify the test so that it would still catch the segfault, but to require the correct behavior. t7607 had a test specific to looking for a merge overwriting a dirty file involved in a rename, but it too actually encoded what I would term incorrect behavior: it expected the merge to succeed. Fix that, and add a few more checks to make sure that the merge really does produce the expected results. Signed-off-by: Elijah Newren--- t/t3501-revert-cherry-pick.sh | 7 +-- t/t7607-merge-overwrite.sh| 5 - 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/t/t3501-revert-cherry-pick.sh b/t/t3501-revert-cherry-pick.sh index 4f2a263b63..783bdbf59d 100755 --- a/t/t3501-revert-cherry-pick.sh +++ b/t/t3501-revert-cherry-pick.sh @@ -141,7 +141,7 @@ test_expect_success 'cherry-pick "-" works with arguments' ' test_cmp expect actual ' -test_expect_success 'cherry-pick works with dirty renamed file' ' +test_expect_failure 'cherry-pick works with dirty renamed file' ' test_commit to-rename && git checkout -b unrelated && test_commit unrelated && @@ -150,7 +150,10 @@ test_expect_success 'cherry-pick works with dirty renamed file' ' test_tick && git commit -m renamed && echo modified >renamed && - git cherry-pick refs/heads/unrelated + test_must_fail git cherry-pick refs/heads/unrelated >out && + test_i18ngrep "Refusing to lose dirty file at renamed" out && + test $(git rev-parse :0:renamed) = $(git rev-parse HEAD^:to-rename.t) && + grep -q "^modified$" renamed ' test_done diff --git a/t/t7607-merge-overwrite.sh b/t/t7607-merge-overwrite.sh index 9444d6a9b9..00617dadf8 100755 --- a/t/t7607-merge-overwrite.sh +++ b/t/t7607-merge-overwrite.sh @@ -97,7 +97,10 @@ test_expect_failure 'will not overwrite unstaged changes in renamed file' ' git mv c1.c other.c && git commit -m rename && cp important other.c && - git merge c1a && + test_must_fail git merge c1a >out && + test_i18ngrep "Refusing to lose dirty file at other.c" out && + test -f other.c~HEAD && + test $(git hash-object other.c~HEAD) = $(git rev-parse c1a:c1.c) && test_cmp important other.c ' -- 2.15.0.5.g9567be9905
Re: [PATCH v3 0/8] Create Git/Packet.pm
Christian Couderwrites: > There are only a few small changes compared to v2: Please go incremental as v2 is already in 'next', and small changes are easier to reiew and understand when given as follow-up "polish this minor glitch in the initial effort" patches. Thanks.
Re: [PATCH v2 2/9] commit: move empty message checks to libgit
On 10/11/17 11:09, Phillip Wood wrote: > From: Phillip Wood> > Move the functions that check for empty messages from bulitin/commit.c > to sequencer.c so they can be shared with other commands. The > functions are refactored to take an explicit cleanup mode and template > filename passed by the caller. > > Signed-off-by: Phillip Wood > --- > > Notes: > changes since v1: > - prefix cleanup_mode enum and constants with commit_msg_ > > builtin/commit.c | 99 > +++- > sequencer.c | 61 ++ > sequencer.h | 11 +++ > 3 files changed, 91 insertions(+), 80 deletions(-) > Just an idle thought - why are these functions moving to sequencer.[ch] rather than commit[.ch]? Similar comments for other patches in the series which moves code from builtin/commit.c to sequencer.[ch]. ATB, Ramsay Jones
Re: [PATCH v2 3/9] Add a function to update HEAD after creating a commit
Phillip Woodwrites: > From: Phillip Wood > > Add update_head() based on the code that updates HEAD after committing > in builtin/commit.c that can be called by 'git commit' and other > commands. > > Signed-off-by: Phillip Wood > --- > > Notes: > changes since v1: > - rename update_head() to update_head_with_reflog() The above proposed log message still calls it with the old name though.
Re: [PATCH 2/4] Remove silent clamp of renameLimit
Thanks for taking a look! On Fri, Nov 10, 2017 at 10:26 AM, Stefan Bellerwrote: >> - if (rename_limit <= 0 || rename_limit > 32767) >> - rename_limit = 32767; >> if ((num_create <= rename_limit || num_src <= rename_limit) && >> - (num_create * num_src <= rename_limit * rename_limit)) >> + ((double)num_create * (double)num_src >> +<= (double)rename_limit * (double)rename_limit)) >> return 0; > > From a technical perspective, I would think that if > (num_create <= rename_limit || num_src <= rename_limit) > holds true, that the double-cast condition would also be always true? > Could we just remove that last check? Not necessarily. For example, if num_create = rename_limit-1 and num_src = rename_limit+2, then the first condition will be satisfied but the second won't. If it was && rather than ||, then the second condition would be superfluous. > Or phrased differently, if we can cast to double and extend the check > here, do we have to adapt code at other places as well? Good point, and yes. Perhaps I should have re-ordered my patch series because I came back to it later and realized that the progress code was broken due to overflow/wraparound, and a patch 3 fixed that. Further, the later patch used uint64_t instead of double. While double works, perhaps I should change the double here to uint64_t for consistency?
Re: [RFC] cover-at-tip
On Fri, 10 Nov 2017 16:37:49 +0100 Nicolas Morey-Chaisemartinwrote: > > Hi, > > > > I'm starting to look into the cover-at-tip topic that I found in the > > leftover bits (http://www.spinics.net/lists/git/msg259573.html) Thanks - I personally would find this very useful. > > Here's a first draft of a patch that adds support for format-patch > > --cover-at-tip. It compiles and works in my nice and user firnedly test > > case. > > Just wanted to make sure I was going roughly in the right direction here. > > > > > > I was wondering where is the right place to put a commit_is_cover_at_tip() > > as the test will be needed in other place as the feature is extended to git > > am/merge/pull. I think you can put this in (root)/commit.c, especially since that test operates on a "struct commit *". > > Feel free to comment. I know the help is not clear at this point and > > there's still some work to do on option handling (add a config option, > > probably have --cover-at-tip imply --cover-letter, etc) and > > some testing :) Both are good ideas. You should probably use a --cover-letter={no,auto,yes} instead of the current boolean, so that the config can use the same options and configuring it to "auto" (to use a cover letter if the tip is empty and singly-parented, and not to use a cover letter otherwise) is meaningful. > The proposed patch for format-patch does not output any "---" to signal the > end of the commit log and the begining of the patch in the cover letter. > This means that the log summary, the diffstat and the git footer ( --\n version>) is seen as part of the commit log. Which is just wrong. > > Removing them would solve the issue but I feel they bring some useful info > (or they would not be here). > Adding a "---" between the commit log and those added infos poses another > problem: git am does not see an empty patch anymore. > I would need to add "some" level of parsing to am.c to make sure the patch > content is just garbage and that there are no actual hunks for that. Could you just take the message from the commit and put that in the cover letter? The summary and diffstat do normally have useful info, but if the commit is specifically made to be used only for the cover letter, I think that is no longer true.
Re: [PATCH 0/4] Fix issues with rename detection limits
On Fri, Nov 10, 2017 at 10:13 AM, Elijah Newrenwrote: > In a repo with around 60k files with deep directory hierarchies (due to > Elijah Newren (4): > sequencer: Warn when internal merge may be suboptimal due to > renameLimit > Remove silent clamp of renameLimit > progress: Fix progress meters when dealing with lots of work > sequencer: Show rename progress during cherry picks Sorry for the duplicate send. I noticed the cover letter didn't appear on in the email list, double-checked https://public-inbox.org/git/, waited over half an hour and double checked both email and public-inbox again and still didn't see it, so I re-sent. Just as soon as I did, it seems that the original and the new cover letter emails suddenly showed up right then. *shrug*.
Re: [PATCH 2/4] Remove silent clamp of renameLimit
On Fri, Nov 10, 2017 at 9:39 AM, Elijah Newrenwrote: > In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14), > the renameLimit was clamped to 32767. This appears to have been to simply > avoid integer overflow in the following computation: > >num_create * num_src <= rename_limit * rename_limit > > although it also could be viewed as a hardcoded bound on the amount of CPU > time we're willing to allow users to tell git to spend on handling > renames. An upper bound may make sense, particularly as the computation > is O(rename_limit^2), but only if the bound is documented and communicated > to the user -- neither of which were true. > > In fact, the silent clamping of the renameLimit to a smaller value and > lack of reporting of the needed renameLimit when it was too large made it > appear to the user as though they had used a high enough value; however, > git would proceed to mess up the merge or cherry-pick badly based on the > lack of rename detection. Some hardy folks, despite the lack of feedback > on the correct limit to choose, were desperate enough to repeatedly retry > their cherry-picks with increasingly larger renameLimit values (going > orders of magnitude beyond the built-in limit of 32767), but were > consistently met with the same failure. > > Although large limits can make things slow, we have users who would be > ecstatic to have a small five file change be correctly cherry picked even > if they have to manually specify a large limit and it took git ten minutes > to compute it. > > Signed-off-by: Elijah Newren > --- > diff.c| 2 +- > diffcore-rename.c | 11 --- > 2 files changed, 5 insertions(+), 8 deletions(-) > > diff --git a/diff.c b/diff.c > index 6fd288420b..c6597e3231 100644 > --- a/diff.c > +++ b/diff.c > @@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int > needed, int degraded_cc) > warning(_(rename_limit_warning)); > else > return; > - if (0 < needed && needed < 32767) > + if (0 < needed) > warning(_(rename_limit_advice), varname, needed); > } > > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 0d8c3d2ee4..7f9a463f5a 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create, > * growing larger than a "rename_limit" square matrix, ie: > * > *num_create * num_src > rename_limit * rename_limit > -* > -* but handles the potential overflow case specially (and we > -* assume at least 32-bit integers) > */ > - if (rename_limit <= 0 || rename_limit > 32767) > - rename_limit = 32767; > if ((num_create <= rename_limit || num_src <= rename_limit) && > - (num_create * num_src <= rename_limit * rename_limit)) > + ((double)num_create * (double)num_src > +<= (double)rename_limit * (double)rename_limit)) > return 0; >From a technical perspective, I would think that if (num_create <= rename_limit || num_src <= rename_limit) holds true, that the double-cast condition would also be always true? Could we just remove that last check? Or phrased differently, if we can cast to double and extend the check here, do we have to adapt code at other places as well? > > options->needed_rename_limit = > @@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create, > num_src++; > } > if ((num_create <= rename_limit || num_src <= rename_limit) && > - (num_create * num_src <= rename_limit * rename_limit)) > + ((double)num_create * (double)num_src > +<= (double)rename_limit * (double)rename_limit)) > return 2; > return 1; > } > -- > 2.15.0.5.g9567be9905 >
Re: [RFC] cover-at-tip
Nicolas Morey-Chaisemartinwrites: > I would need to add "some" level of parsing to am.c to make sure > the patch content is just garbage and that there are no actual > hunks for that. > > I did not find any public API that would allow me to do that, > although apply_path/parse_chunk would fit the bill. Is that the > right way to approach this ? I do not think you would want this non-patch cruft seen at the apply layer at all. Reading a mailbox, with the help of mailsplit and mailinfo, and being the driver to create a series of commits is what "am" is about, and it would have to notice that the non-patch cruft at the beginning is not a patch at all and defer creation of an empty commit with that cover material at the end. For each of the other messages in the series that has patches, it will need to call apply to update the index and the working tree so that it can make a commit, but there is NO reason whatsoever to ask help from apply, whose sole purpose is to read a patch and make modifications to the index and the working tree, to handle the cover material.
Re: Bug - Status - Space in Filename
Jeff Kingwrites: > Are there callers who don't just print the result? If not, we could just > always emit. That's slightly more efficient since it drops the expensive > strbuf_insert (though there are already so many copies going on in > quote_path_relative that it hardly matters). But it also drops the need > for the caller to know about the strbuf at all. I am fine by that, too. Consider that I'm still suffering from the trauma caused by some patches (not in this area but others) that changed output we have long been directly emiting to stdio to instead capture to strings ;-) This might also be a good bite-sized material for the weekend thing ;-)
[PATCH 0/4] Fix issues with rename detection limits
In a repo with around 60k files with deep directory hierarchies (due to being predominantly java code) and which had a few high level directories moved around (making it appear to git as dozens of thousands of individual file renames), attempts to cherry-pick commits across product versions resulted in non-sensical delete/modify conflicts (rather than rename/modify as expected). We had a few teams who were in the unenviable position of having to backport hundreds or thousands of commits across such product versions, and the result was months of pain, which could have been alleviated were it not for a few small git bugs: * When you try to cherry-pick changes, unlike when you merge two branches, git will not notify you when you need to set a higher merge.renameLimit. * If you know git internals well enough, you can try to trick git into telling you the correct renameLimit by doing a merge instead of a cherry-pick. If you do that, and have a renameLimit that is too small, git will let you know the value you need. You can then undo the merge and proceed with the cherry-pick. Except that... * If you are performing a merge and specify a large renameLimit that isn't large enough, and the necessary renameLimit you need is greater than 32767, then git tells you nothing, leading you to believe that the limit you specified is high enough, but only to watch it still mess up the merge badly. * If you happen to specify a merge.renameLimit that *is* high enough, but just happens to be greater than 32767, then git will silently pretend you specified 32767, determine that 32767 is not high enough, not tell you anything about it's silent clamping, and then proceed to mess up the merge or cherry-pick badly. Not only do you get no feedback, the clamping to 32767 isn't documented anywhere even if you tried to read every manual page. Folks did discover the merge.renameLimit and tried setting it to various values, spread over the spectrum from 1000 (the default) up to 9 (or maybe more, that's just the highest number I heard), completely unaware that most their attempts were ignored and wondering why git couldn't handle things and couldn't explain why either. Eventually folks wrote scripts that would run the output of format-patch through a bunch of sed commands to pretend patches were against the filename on the other side of history and then pipe back through git-am. Such scripts grew as more and more rename rules were added. I eventually learned of one of these scripts and said something close to, "You don't need these pile of rename rules; just set merge.renameLimit to something higher." When they responded that merge.renameLimit didn't work, I didn't believe them. This patch series, along with two others that I will be sending shortly, represent my attempt to continue to not believe them. :-) Elijah Newren (4): sequencer: Warn when internal merge may be suboptimal due to renameLimit Remove silent clamp of renameLimit progress: Fix progress meters when dealing with lots of work sequencer: Show rename progress during cherry picks diff.c| 2 +- diffcore-rename.c | 15 ++- progress.c| 22 +++--- progress.h| 8 sequencer.c | 2 ++ 5 files changed, 24 insertions(+), 25 deletions(-) -- 2.15.0.5.g9567be9905
[PATCH 0/4] Fix issues with rename detection limits
In a repo with around 60k files with deep directory hierarchies (due to being predominantly java code) and which had a few high level directories moved around (making it appear to git as dozens of thousands of individual file renames), attempts to cherry-pick commits across product versions resulted in non-sensical delete/modify conflicts (rather than rename/modify as expected). We had a few teams who were in the unenviable position of having to backport hundreds or thousands of commits across such product versions, and the result was months of pain, which could have been alleviated were it not for a few small git bugs: * When you try to cherry-pick changes, unlike when you merge two branches, git will not notify you when you need to set a higher merge.renameLimit. * If you know git internals well enough, you can try to trick git into telling you the correct renameLimit by doing a merge instead of a cherry-pick. If you do that, and have a renameLimit that is too small, git will let you know the value you need. You can then undo the merge and proceed with the cherry-pick. Except that... * If you are performing a merge and specify a large renameLimit that isn't large enough, and the necessary renameLimit you need is greater than 32767, then git tells you nothing, leading you to believe that the limit you specified is high enough, but only to watch it still mess up the merge badly. * If you happen to specify a merge.renameLimit that *is* high enough, but just happens to be greater than 32767, then git will silently pretend you specified 32767, determine that 32767 is not high enough, not tell you anything about it's silent clamping, and then proceed to mess up the merge or cherry-pick badly. Not only do you get no feedback, the clamping to 32767 isn't documented anywhere even if you tried to read every manual page. Folks did discover the merge.renameLimit and tried setting it to various values, spread over the spectrum from 1000 (the default) up to 9 (or maybe more, that's just the highest number I heard), completely unaware that most their attempts were ignored and wondering why git couldn't handle things and couldn't explain why either. Eventually folks wrote scripts that would run the output of format-patch through a bunch of sed commands to pretend patches were against the filename on the other side of history and then pipe back through git-am. Such scripts grew as more and more rename rules were added. I eventually learned of one of these scripts and said something close to, "You don't need these pile of rename rules; just set merge.renameLimit to something higher." When they responded that merge.renameLimit didn't work, I didn't believe them. This patch series, along with two others that I will be sending shortly, represent my attempt to continue to not believe them. :-) Elijah Newren (4): sequencer: Warn when internal merge may be suboptimal due to renameLimit Remove silent clamp of renameLimit progress: Fix progress meters when dealing with lots of work sequencer: Show rename progress during cherry picks diff.c| 2 +- diffcore-rename.c | 15 ++- progress.c| 22 +++--- progress.h| 8 sequencer.c | 2 ++ 5 files changed, 24 insertions(+), 25 deletions(-) -- 2.15.0.5.g9567be9905
Re: is there a stylistic preference for a trailing "--" on a command?
On Fri, Nov 10, 2017 at 5:57 AM, Robert P. J. Daywrote: > > just noticed these examples in "man git-bisect": > > EXAMPLES > $ git bisect start HEAD v1.2 -- # HEAD is bad, v1.2 is good > ... > $ git bisect start HEAD origin --# HEAD is bad, origin is good > ... > $ git bisect start HEAD HEAD~10 -- # culprit is among the last 10 > > is there some rationale or stylistic significance to those trailing > "--" on those commands? i assume they have no effect, just curious as > to why they're there. By having the -- there, it is clear that the strings are ref specs and not files of such a name. (Who would want to store a file named HEAD~10 in their repo?)
Re: [PATCH v1 5/8] sequencer: don't die in print_commit_summary()
Phillip Woodwrites: > On 07/11/17 15:13, Junio C Hamano wrote: > ... >> Another possibility perhaps is that the function is safe to reuse >> already even without this patch, of course ;-). >> > Hmm, maybe it is. Looking at pick_commits() and do_pick_commit() if the > sequencer dies in print_commit_summary() (which can only happen when > cherry-picking or reverting) then neither the todo list or the abort > safety file are updated to reflect the commit that was just made. > > As I understand it print_commit_summary() dies because: (i) it cannot > resolve HEAD either because some other process is updating it (which is > bad news in the middle of a cherry-pick); (ii) because something went > wrong HEAD is corrupt; or (iii) log_tree_commit() cannot read some > objects. In all those cases dying will leave the sequencer in a sane > state for aborting - 'git cherry-pick --abort' will rewind HEAD to the > last successful commit before there was a problem with HEAD or the > object database. If the user somehow fixes the problem and runs 'git > cherry-pick --continue' then the sequencer will try and pick the same > commit again which may or may not be what the user wants depending on > what caused print_commit_summary() to die. The above is all good analysis---thanks for your diligence. Perhaps some if not all of it can go to the log message?
Re: [Query] Separate hooks for Git worktrees
On Thu, Nov 9, 2017 at 9:00 PM, Junio C Hamanowrote: > Stefan Beller writes: > >> We have no worktree specific config yet, though patches for >> this were floated on the mailing list. >> >> Though recent versions of git learned to conditionally include >> config files. (look for includeIf in man git-config), which I think >> could be used to set the option gerrit.createChangeId depending >> on the worktree you are in. >> >>> Any idea how I can get around this problem without having separate >>> repositories for kernel and android ? >> >> The proposed approach above might be hacky but sounds as if >> it should work? > > If you meant "conditional include" by "proposed approach above", I > do not see which part you found possibly hacky. Compared to a per-worktree configuration that you can setup via git config --for-worktree=X key value the setup using conditional includes seems hacky to me. (I just realize that these conditional includes can be set using regular git-config, so maybe it is not as hacky as I thought.) > It is to allow > different set of configurations to apply depending on where you are > working at, which I think was invented exactly for something like > this. >From a UX perspective, I can imagine a way easier workflow, but the data model seems to make sense. > It certainly is not any hackier than using the same repository to > house separately manged projects even if they may be related > projects. Well it is the same project with different upstream workflows. For example I would imagine that Viresh wants to cherry-pick from one branch to another, or even send the same patch (just with different commit messages, with or without the ChangeId) to the different upstreams? > Where does the aversion of "having separate repositories" primarily > come from? Is it bad due to disk consumption? Is it bad because > you cannot do "git diff android-branch kernel-branch"? Something > else? Yeah, that is an interesting question! (I suspect workflow related things, diff/cherry-pick) > If it is purely disk consumption that is an issue, perhaps the real > solution is to make it easier to maintain separate repositories > while sharing as much disk space as possible. GC may have to be > made a lot more robust in the presense of alternate object stores, > for example.
Re: [PATCH v4] doc/SubmittingPatches: correct subject guidance
On Fri, Nov 10, 2017 at 03:02:50PM +, Adam Dinwoodie wrote: > The examples and common practice for adding markers such as "RFC" or > "v2" to the subject of patch emails is to have them within the same > brackets as the "PATCH" text, not after the closing bracket. Further, > the practice of `git format-patch` and the like, as well as what appears > to be the more common pratice on the mailing list, is to use "[RFC > PATCH]", not "[PATCH/RFC]". > > Update the SubmittingPatches article to match and to reference the > `format-patch` helper arguments, and also make some minor text > clarifications in the area. > > Signed-off-by: Adam Dinwoodie> Helped-by: Eric Sunshine This looks great! Thank you for updating this documentation. Reviewed-by: Josh Triplett > --- > > Notes: > Changes since v3: > - Clarified meaning of "RFC" per Eric's suggestion > - Made the impact of --subject-prefix and friends clearer per Eric's > suggestion > > Thank you for your nitpicking, Eric, it's useful and very much > appreciated :) > > Documentation/SubmittingPatches | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches > index 558d465b6..89f239071 100644 > --- a/Documentation/SubmittingPatches > +++ b/Documentation/SubmittingPatches > @@ -184,21 +184,26 @@ lose tabs that way if you are not careful. > > It is a common convention to prefix your subject line with > [PATCH]. This lets people easily distinguish patches from other > -e-mail discussions. Use of additional markers after PATCH and > -the closing bracket to mark the nature of the patch is also > -encouraged. E.g. [PATCH/RFC] is often used when the patch is > -not ready to be applied but it is for discussion, [PATCH v2], > -[PATCH v3] etc. are often seen when you are sending an update to > -what you have previously sent. > +e-mail discussions. Use of markers in addition to PATCH within > +the brackets to describe the nature of the patch is also > +encouraged. E.g. [RFC PATCH] (where RFC stands for "request for > +comments") is often used to indicate a patch needs further > +discussion before being accepted, [PATCH v2], [PATCH v3] etc. > +are often seen when you are sending an update to what you have > +previously sent. > > -"git format-patch" command follows the best current practice to > +The "git format-patch" command follows the best current practice to > format the body of an e-mail message. At the beginning of the > patch should come your commit message, ending with the > Signed-off-by: lines, and a line that consists of three dashes, > followed by the diffstat information and the patch itself. If > you are forwarding a patch from somebody else, optionally, at > the beginning of the e-mail message just before the commit > message starts, you can put a "From: " line to name that person. > +To change the default "[PATCH]" in the subject to "[]", use > +`git format-patch --subject-prefix=`. As a shortcut, you > +can use `--rfc` instead of `--subject-prefix="RFC PATCH"`, or > +`-v ` instead of `--subject-prefix="PATCH v"`. > > You often want to add additional explanation about the patch, > other than the commit message itself. Place such "cover letter" > -- > 2.14.3 >
[PATCH 4/4] sequencer: Show rename progress during cherry picks
When trying to cherry-pick a change that has lots of renames, it is somewhat unsettling to wait a really long time without any feedback. Signed-off-by: Elijah Newren--- sequencer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sequencer.c b/sequencer.c index 2b4cecb617..247d93f363 100644 --- a/sequencer.c +++ b/sequencer.c @@ -448,6 +448,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next, o.branch2 = next ? next_label : "(empty tree)"; if (is_rebase_i(opts)) o.buffer_output = 2; + o.show_rename_progress = 1; head_tree = parse_tree_indirect(head); next_tree = next ? next->tree : empty_tree(); -- 2.15.0.5.g9567be9905
[PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit
Having renamed files silently treated as deleted/modified conflicts instead of correctly resolving the renamed/modified case has caused lots of pain for some users. Eventually, some of them figured out to set merge.renameLimit to something higher, but without feedback about what value it should have, they were just repeatedly guessing and retrying. Signed-off-by: Elijah Newren--- sequencer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sequencer.c b/sequencer.c index f2a10cc4f2..2b4cecb617 100644 --- a/sequencer.c +++ b/sequencer.c @@ -462,6 +462,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next, if (is_rebase_i(opts) && clean <= 0) fputs(o.obuf.buf, stdout); strbuf_release(); + diff_warn_rename_limit("merge.renamelimit", o.needed_rename_limit, 0); if (clean < 0) return clean; -- 2.15.0.5.g9567be9905
[PATCH 3/4] progress: Fix progress meters when dealing with lots of work
The possibility of setting merge.renameLimit beyond 2^16 raises the possibility that the values passed to progress can exceed 2^32. For my usecase of interest, I only needed to pass a value a little over 2^31. If I were only interested in fixing my usecase, I could have simply changed last_value from int to unsigned, and casted each of rename_dst_nr and rename_src_nr (in merge-recursive.c) from int to unsigned just before multiplying them. However, as long as we're making changes to allow larger progress meters, we may as well make a little more room in general. Use uint64_t, because it "ought to be enough for anybody". :-) Signed-off-by: Elijah Newren--- diffcore-rename.c | 4 ++-- progress.c| 22 +++--- progress.h| 8 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7f9a463f5a..6ba6157c61 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -531,7 +531,7 @@ void diffcore_rename(struct diff_options *options) if (options->show_rename_progress) { progress = start_delayed_progress( _("Performing inexact rename detection"), - rename_dst_nr * rename_src_nr); + (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); } mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx)); @@ -568,7 +568,7 @@ void diffcore_rename(struct diff_options *options) diff_free_filespec_blob(two); } dst_cnt++; - display_progress(progress, (i+1)*rename_src_nr); + display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr); } stop_progress(); diff --git a/progress.c b/progress.c index 289678d43d..7e4a2f9532 100644 --- a/progress.c +++ b/progress.c @@ -30,8 +30,8 @@ struct throughput { struct progress { const char *title; - int last_value; - unsigned total; + uint64_t last_value; + uint64_t total; unsigned last_percent; unsigned delay; unsigned delayed_percent_threshold; @@ -79,7 +79,7 @@ static int is_foreground_fd(int fd) return tpgrp < 0 || tpgrp == getpgid(0); } -static int display(struct progress *progress, unsigned n, const char *done) +static int display(struct progress *progress, uint64_t n, const char *done) { const char *eol, *tp; @@ -106,7 +106,7 @@ static int display(struct progress *progress, unsigned n, const char *done) if (percent != progress->last_percent || progress_update) { progress->last_percent = percent; if (is_foreground_fd(fileno(stderr)) || done) { - fprintf(stderr, "%s: %3u%% (%u/%u)%s%s", + fprintf(stderr, "%s: %3u%% (%lu/%lu)%s%s", progress->title, percent, n, progress->total, tp, eol); fflush(stderr); @@ -116,7 +116,7 @@ static int display(struct progress *progress, unsigned n, const char *done) } } else if (progress_update) { if (is_foreground_fd(fileno(stderr)) || done) { - fprintf(stderr, "%s: %u%s%s", + fprintf(stderr, "%s: %lu%s%s", progress->title, n, tp, eol); fflush(stderr); } @@ -127,7 +127,7 @@ static int display(struct progress *progress, unsigned n, const char *done) return 0; } -static void throughput_string(struct strbuf *buf, off_t total, +static void throughput_string(struct strbuf *buf, uint64_t total, unsigned int rate) { strbuf_reset(buf); @@ -138,7 +138,7 @@ static void throughput_string(struct strbuf *buf, off_t total, strbuf_addstr(buf, "/s"); } -void display_throughput(struct progress *progress, off_t total) +void display_throughput(struct progress *progress, uint64_t total) { struct throughput *tp; uint64_t now_ns; @@ -200,12 +200,12 @@ void display_throughput(struct progress *progress, off_t total) display(progress, progress->last_value, NULL); } -int display_progress(struct progress *progress, unsigned n) +int display_progress(struct progress *progress, uint64_t n) { return progress ? display(progress, n, NULL) : 0; } -static struct progress *start_progress_delay(const char *title, unsigned total, +static struct progress *start_progress_delay(const char *title, uint64_t total, unsigned percent_threshold, unsigned delay) { struct progress *progress = malloc(sizeof(*progress)); @@ -227,12 +227,12 @@ static struct progress *start_progress_delay(const char *title, unsigned
[PATCH 2/4] Remove silent clamp of renameLimit
In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14), the renameLimit was clamped to 32767. This appears to have been to simply avoid integer overflow in the following computation: num_create * num_src <= rename_limit * rename_limit although it also could be viewed as a hardcoded bound on the amount of CPU time we're willing to allow users to tell git to spend on handling renames. An upper bound may make sense, particularly as the computation is O(rename_limit^2), but only if the bound is documented and communicated to the user -- neither of which were true. In fact, the silent clamping of the renameLimit to a smaller value and lack of reporting of the needed renameLimit when it was too large made it appear to the user as though they had used a high enough value; however, git would proceed to mess up the merge or cherry-pick badly based on the lack of rename detection. Some hardy folks, despite the lack of feedback on the correct limit to choose, were desperate enough to repeatedly retry their cherry-picks with increasingly larger renameLimit values (going orders of magnitude beyond the built-in limit of 32767), but were consistently met with the same failure. Although large limits can make things slow, we have users who would be ecstatic to have a small five file change be correctly cherry picked even if they have to manually specify a large limit and it took git ten minutes to compute it. Signed-off-by: Elijah Newren--- diff.c| 2 +- diffcore-rename.c | 11 --- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 6fd288420b..c6597e3231 100644 --- a/diff.c +++ b/diff.c @@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc) warning(_(rename_limit_warning)); else return; - if (0 < needed && needed < 32767) + if (0 < needed) warning(_(rename_limit_advice), varname, needed); } diff --git a/diffcore-rename.c b/diffcore-rename.c index 0d8c3d2ee4..7f9a463f5a 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create, * growing larger than a "rename_limit" square matrix, ie: * *num_create * num_src > rename_limit * rename_limit -* -* but handles the potential overflow case specially (and we -* assume at least 32-bit integers) */ - if (rename_limit <= 0 || rename_limit > 32767) - rename_limit = 32767; if ((num_create <= rename_limit || num_src <= rename_limit) && - (num_create * num_src <= rename_limit * rename_limit)) + ((double)num_create * (double)num_src +<= (double)rename_limit * (double)rename_limit)) return 0; options->needed_rename_limit = @@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create, num_src++; } if ((num_create <= rename_limit || num_src <= rename_limit) && - (num_create * num_src <= rename_limit * rename_limit)) + ((double)num_create * (double)num_src +<= (double)rename_limit * (double)rename_limit)) return 2; return 1; } -- 2.15.0.5.g9567be9905
Re: [PATCH v1 2/2] log: add option to choose which refs to decorate
Rafael Ascensãowrites: > I agree that describe should receive the "normalize" treatment. However, > and following the same reasoning, why should describe users adopt the > rules imposed by --glob? I could argue they're also used to the way it > works now. > > That being said, the suggestion I mentioned earlier would allow to keep > both current behaviors consistent at the expense of the extra call to > refs.c::ref_exists(). In any case, updating the "describe" for consistency is something we can and should leave for later, to be done as a separate topic. While I agree with you that the consistent behaviour between commands is desirable, and also I agree with you that given a pattern $X that does not have any glob char, trying to match $X when a ref whose name exactly is $X exists and trying to match $X/* otherwise would give us a consistent semantics without hurting any existing uses, I do not think you need to pay any extra expense of calling ref_exists() at all to achieve that. That is because when $X exists, you already know $X/otherthing does not exist. And when $X does not exist, $X/otherthing might. So a naive implementation would be just to add two patterns $X and $X/* to the filter list and be done with it. If you exactly have refs/heads/master, even with the naive logic may throw both refs/heads/master and refs/heads/master/* to the filter list, nothing will match the latter to contaminate your result (and vice versa). A bit more clever implementation "just throw in two items" would go like this. It is not all that involved: - In load_ref_decorations(), before running add_ref_decoration for each ref and head ref, iterate over the elements in the refname filter list. For each element: - if item->string has a trailing '/', trim that. - store NULL in the item->util field for item whose string field has a glob char. - store something non-NULL (e.g. item->string) for item whose string field does not have a glob char. - In add_ref_decoration(), where your previous round iterates over filter->{include,exclude}, get rid of normalize_glob_ref() and use of real_pattern. Instead do something like: matched = 0; if (item->util == NULL) { if (!wildmatch(item->string, refname, 0)) matched = 1; } else { const char *rest; if (skip_prefix(refname, item->string, ) && (!*rest || *rest == '/')) matched = 1; } if (matched) ... Of course, you would probably want to encapsulate the logic to set matched = 1/0 in a helper function, e.g. static int match_ref_pattern(const char *refname, const struct string_list_item *item) { int matched = 0; ... do either wildmatch or head match with tail validation ... depending on the item->util's NULLness (see above) return matched; } and call that from the two loops for exclude and include list. Hmm?
Re: cherry-pick very slow on big repository
Interesting timing. I have some performance patches specifically developed because rename detection during merges made a small cherry-pick in a large repo rather slow...in my case, I dropped the time for the cherry pick by a factor of about 30 (no guarantees you'll see the same; it's very history-specific). I was just about to start sending my three series of patches, the performance one being the third... On Fri, Nov 10, 2017 at 6:05 AM, Peter Kreftingwrote: > Derrick Stolee: > >> Git is spending time detecting renames, which implies you probably renamed >> a folder or added and deleted a large number of files. This rename detection >> is quadratic (# adds times # deletes). > > > Yes, a couple of directories with a lot of template files have been renamed > (and some removed, some added) between the current development branch and > this old maintenance branch. I get the "Performing inexact rename detection" > a lot when merging changes in the other direction. > > However, none of them applies to these particular commits, which only > touches files that are in the exact same location on both branches. > >> You can remove this rename detection by running your cherry-pick with `git >> -c diff.renameLimit=1 cherry-pick ...` > > > That didn't work, actually it failed to finish with this setting in effect, > it hangs in such a way that I can't stop it with Ctrl+C (neither when > running from the command line, nor when running inside gdb). It didn't > finish in the 20 minutes I gave it. > > I also tried with diff.renames=false, which also seemed to fail. > > > -- > \\// Peter - http://www.softwolves.pp.se/
JPMorgan Chase & Co.
JPMorgan Chase & Co. 270 Park Avenue, Midtown Manhattan New York City, Attention: We are pleased to inform you about your fund which was seized by International Monetary Fund (IMF) due to your failure to provide necessary credentials which state the legitimacy of your fund, the fund was said to be transferred from BOA before its seizure by IMF. Presently the fund is in JPMorgan Chase bank for immediate remittance to your nominated bank account, below is the account details we have about you kindly reconfirm if all the details are correct and update before we make the transfer. NOTE: no one will ask you for transfer or any form of charge to transfer your fund if ever happen please report to us because it’s never from us. Acct. name: Doreen Koehler. Bank name: Bank of America WIRE#026009593 ACC.898046532236 Yours sincerely John Mour (917) 900-0351
Urgent Message.
Dear Western Union Customer, You have been awarded with the sum of $50,000 USD by our office, as one of our customers who use Western Union in their daily business transaction. This award was been selected through the internet, where your e-mail address was indicated and notified. Please provide Mr. James Udo with the following details listed below so that your fund will be remited to you through Western Union. 1. Name: 2. Address: 3. Country: 4. Phone Number: 5. Occupation: 6. Sex: 7. Age: Mr. James Udo E-mail: westerrnunion2...@outlook.com As soon as these details are received and verified, your fund will be transferred to you.Thank you, for using western union.
RE: cherry-pick very slow on big repository
Since this is happening during a merge, you might need to use merge.renameLimit or the merge strategy option of -Xno-renames. Although the code does fallback to use the diff.renameLimit but there is still a lot that is done before even checking the rename limit so I would first try getting renames turned off. Thanks, Kevin > -Original Message- > From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf > Of Peter Krefting > Sent: Friday, November 10, 2017 7:05 AM > To: Derrick Stolee> Cc: Jeff King ; Git Mailing List > Subject: Re: cherry-pick very slow on big repository > > Derrick Stolee: > > > Git is spending time detecting renames, which implies you probably > > renamed a folder or added and deleted a large number of files. This > > rename detection is quadratic (# adds times # deletes). > > Yes, a couple of directories with a lot of template files have been > renamed (and some removed, some added) between the current development > branch and this old maintenance branch. I get the "Performing inexact > rename detection" a lot when merging changes in the other direction. > > However, none of them applies to these particular commits, which only > touches files that are in the exact same location on both branches. > > > You can remove this rename detection by running your cherry-pick > > with `git -c diff.renameLimit=1 cherry-pick ...` > > That didn't work, actually it failed to finish with this setting in > effect, it hangs in such a way that I can't stop it with Ctrl+C > (neither when running from the command line, nor when running inside > gdb). It didn't finish in the 20 minutes I gave it. > > I also tried with diff.renames=false, which also seemed to fail. > > -- > \\// Peter - > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.softw > olves.pp.se%2F=02%7C01%7Ckewillf%40microsoft.com%7C6b831a75739e4 > 0428d3808d52844106c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636 > 459195209466999=kJtNLAs1LSoPy%2B%2BNADJkuEBPMZVcxkSkKzOEEeIG > VpM%3D=0
Bug: cherry-pick & submodule
I faced an unexpected behaviour during cherry-picking a commit to a branch with a submodule. Git graph: A -- B [master] \ `- C -- D [test] Both branches have a file with name `a_file`. It has been added by commit A. Commits B and C add a folder with name `a_submodule` to the respective branches. Commit C does it regularly by adding a file with name `a_submodule/content`. Commit B adds a submodule with name `a_submodule`. Commit D only modifies `a_file`. Note that `a_file` and `a_submodule` are not related. [repo] |- a_file `- a_submodule When I trying to cherry pick commit D on commit B, I got a conflict with `a_submodule`. Changes of `a_file` are successfully cherry-picked. I expected, that there would be no conflicts. A bash script reproducing the bug is listed below. Vasily. rm -rf a_submodule a_repo mkdir a_submodule cd a_submodule git init touch a_file_in_a_submodule git add a_file_in_a_submodule git commit -m "add a file in a submodule" cd .. mkdir a_repo cd a_repo git init touch a_file git add a_file git commit -m "add a file" git branch test git checkout test mkdir a_submodule touch a_submodule/content git add a_submodule/content git commit -m "add a regular folder with name a_submodule" echo "123" > a_file git add a_file git commit -m "modify a file" git checkout master git submodule add ../a_submodule a_submodule git submodule update a_submodule git commit -m "add a submodule info folder with name a_submodule" # Trying to cherry-pick modification of a file from test branch. git cherry-pick test # some debug git status
[PATCH] bisect: mention "view" as an alternative to "visualize"
Tweak a number of files to mention "view" as an alternative to "visualize": Documentation/git-bisect.txt | 9 - Documentation/user-manual.txt | 3 ++- builtin/bisect--helper.c | 2 +- contrib/completion/git-completion.bash | 2 +- git-bisect.sh | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) Signed-off-by: Robert P. J. Day--- here's hoping i have the right format for this patch ... are there any parts of this that are inappropriate, such as extending the bash completion? diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt index 6c42abf07..89e6f9667 100644 --- a/Documentation/git-bisect.txt +++ b/Documentation/git-bisect.txt @@ -23,7 +23,7 @@ on the subcommand: git bisect terms [--term-good | --term-bad] git bisect skip [(|)...] git bisect reset [] - git bisect visualize + git bisect visualize|view git bisect replay git bisect log git bisect run ... @@ -196,15 +196,14 @@ of `git bisect good` and `git bisect bad` to mark commits. Bisect visualize -To see the currently remaining suspects in 'gitk', issue the following -command during the bisection process: +To see the currently remaining suspects in 'gitk', issue either of the +following equivalent commands during the bisection process: $ git bisect visualize +$ git bisect view -`view` may also be used as a synonym for `visualize`. - If the `DISPLAY` environment variable is not set, 'git log' is used instead. You can also give command-line options such as `-p` and `--stat`. diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt index 3a03e63eb..55ec58986 100644 --- a/Documentation/user-manual.txt +++ b/Documentation/user-manual.txt @@ -538,10 +538,11 @@ Note that the version which `git bisect` checks out for you at each point is just a suggestion, and you're free to try a different version if you think it would be a good idea. For example, occasionally you may land on a commit that broke something unrelated; -run +run either of the equivalent commands - $ git bisect visualize +$ git bisect view - which will run gitk and label the commit it chose with a marker that diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index 35d2105f9..4b5fadcbe 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -46,7 +46,7 @@ static int check_term_format(const char *term, const char *orig_term) return error(_("'%s' is not a valid term"), term); if (one_of(term, "help", "start", "skip", "next", "reset", - "visualize", "replay", "log", "run", "terms", NULL)) + "visualize", "view", "replay", "log", "run", "terms", NULL)) return error(_("can't use the builtin command '%s' as a term"), term); /* diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index fdd984d34..52f68c922 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1162,7 +1162,7 @@ _git_bisect () { __git_has_doubledash && return - local subcommands="start bad good skip reset visualize replay log run" + local subcommands="start bad good skip reset visualize view replay log run" local subcommand="$(__git_find_on_cmdline "$subcommands")" if [ -z "$subcommand" ]; then __git_find_repo_path diff --git a/git-bisect.sh b/git-bisect.sh index 0138a8860..e8b622a47 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -1,6 +1,6 @@ #!/bin/sh -USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|replay|log|run]' +USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|view|replay|log|run]' LONG_USAGE='git bisect help print this long help message. git bisect start [--term-{old,good}= --term-{new,bad}=] @@ -20,7 +20,7 @@ git bisect next find next bisection to test and check it out. git bisect reset [] finish bisection search and go back to commit. -git bisect visualize +git bisect visualize|view show bisect status in gitk. git bisect replay replay bisection log. -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
proper patch prefix for tweaking both git-bisect.sh and git-bisect.txt?
digging through both git-bisect.sh and git-bisect.txt, and it seems pretty clear they're both a bit out of date WRT documenting the newer alternatives "old"/"new" as opposed to the older "good"/"bad" terms, and a few other things. first, trivially, neither the script nor the man page mention "view" as an alternative to "visualize", but that's easy to fix. however, most of the inconsistency involves that good/bad/old/new stuff. the man page reads (in part): git bisect (bad|new|) [] git bisect (good|old|) [...] which i assume should actually read: git bisect (bad|new||) [] git bisect (good|old||) [...] unless there's some implicit assumption that isn't mentioned there. also from the man page, i'm guessing that: git bisect terms [--term-good | --term-bad] might need to say: git bisect terms [--term-good | --term-bad | --term-new | --term-old] and so on, and so on (again, unless the generality of those terms is understood). so given that all of that is, technically, documentation (even the usage message in the script), if one submits a patch to change both files appropriately, what is the subject line prefix to use? anyway, maybe i'll do this in bite-size pieces to keep it manageable. my first patch to the code base ... whoo hoo! rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
Bug: cherry-pick & submodule
I faced an unexpected behaviour during cherry-picking a commit to a branch with a submodule. Git graph: A -- B [master] \ `- C -- D [test] Both branches have a file with name `a_file`. It has been added by commit A. Commits B and C add a folder with name `a_submodule` to the respective branches. Commit C does it regularly by adding a file with name `a_submodule/content`. Commit B adds a submodule with name `a_submodule`. Commit D only modifies `a_file`. Note that `a_file` and `a_submodule` are not related. [repo] |- a_file `- a_submodule When I trying to cherry pick commit D on commit B, I got a conflict with `a_submodule`. Changes of `a_file` are successfully cherry-picked. I expected, that there would be no conflicts. A bash script reproducing the bug is attached. Vasily. bug.sh Description: application/shellscript
Re: [RFC] cover-at-tip
Le 10/11/2017 à 11:24, Nicolas Morey-Chaisemartin a écrit : > Hi, > > I'm starting to look into the cover-at-tip topic that I found in the leftover > bits (http://www.spinics.net/lists/git/msg259573.html) > > Here's a first draft of a patch that adds support for format-patch > --cover-at-tip. It compiles and works in my nice and user firnedly test case. > Just wanted to make sure I was going roughly in the right direction here. > > > I was wondering where is the right place to put a commit_is_cover_at_tip() as > the test will be needed in other place as the feature is extended to git > am/merge/pull. > > Feel free to comment. I know the help is not clear at this point and there's > still some work to do on option handling (add a config option, probably have > --cover-at-tip imply --cover-letter, etc) and > some testing :) > > > --- Leaving some more updates and questions before the week end: I started on git am --cover-at-tip. The proposed patch for format-patch does not output any "---" to signal the end of the commit log and the begining of the patch in the cover letter. This means that the log summary, the diffstat and the git footer ( --\n) is seen as part of the commit log. Which is just wrong. Removing them would solve the issue but I feel they bring some useful info (or they would not be here). Adding a "---" between the commit log and those added infos poses another problem: git am does not see an empty patch anymore. I would need to add "some" level of parsing to am.c to make sure the patch content is just garbage and that there are no actual hunks for that. I did not find any public API that would allow me to do that, although apply_path/parse_chunk would fit the bill. Is that the right way to approach this ? My branch is here if anyone want to give a look: https://github.com/nmorey/git/tree/dev/cover-at-tip Nicolas
[PATCH v4] doc/SubmittingPatches: correct subject guidance
The examples and common practice for adding markers such as "RFC" or "v2" to the subject of patch emails is to have them within the same brackets as the "PATCH" text, not after the closing bracket. Further, the practice of `git format-patch` and the like, as well as what appears to be the more common pratice on the mailing list, is to use "[RFC PATCH]", not "[PATCH/RFC]". Update the SubmittingPatches article to match and to reference the `format-patch` helper arguments, and also make some minor text clarifications in the area. Signed-off-by: Adam DinwoodieHelped-by: Eric Sunshine --- Notes: Changes since v3: - Clarified meaning of "RFC" per Eric's suggestion - Made the impact of --subject-prefix and friends clearer per Eric's suggestion Thank you for your nitpicking, Eric, it's useful and very much appreciated :) Documentation/SubmittingPatches | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 558d465b6..89f239071 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -184,21 +184,26 @@ lose tabs that way if you are not careful. It is a common convention to prefix your subject line with [PATCH]. This lets people easily distinguish patches from other -e-mail discussions. Use of additional markers after PATCH and -the closing bracket to mark the nature of the patch is also -encouraged. E.g. [PATCH/RFC] is often used when the patch is -not ready to be applied but it is for discussion, [PATCH v2], -[PATCH v3] etc. are often seen when you are sending an update to -what you have previously sent. +e-mail discussions. Use of markers in addition to PATCH within +the brackets to describe the nature of the patch is also +encouraged. E.g. [RFC PATCH] (where RFC stands for "request for +comments") is often used to indicate a patch needs further +discussion before being accepted, [PATCH v2], [PATCH v3] etc. +are often seen when you are sending an update to what you have +previously sent. -"git format-patch" command follows the best current practice to +The "git format-patch" command follows the best current practice to format the body of an e-mail message. At the beginning of the patch should come your commit message, ending with the Signed-off-by: lines, and a line that consists of three dashes, followed by the diffstat information and the patch itself. If you are forwarding a patch from somebody else, optionally, at the beginning of the e-mail message just before the commit message starts, you can put a "From: " line to name that person. +To change the default "[PATCH]" in the subject to "[]", use +`git format-patch --subject-prefix=`. As a shortcut, you +can use `--rfc` instead of `--subject-prefix="RFC PATCH"`, or +`-v ` instead of `--subject-prefix="PATCH v"`. You often want to add additional explanation about the patch, other than the commit message itself. Place such "cover letter" -- 2.14.3
Re: [PATCH v1 5/8] sequencer: don't die in print_commit_summary()
On 07/11/17 15:13, Junio C Hamano wrote: > Junio C Hamanowrites: > >> And this step is going in the right direction, but I am not sure if >> this made the function safe enough to be called repeatedly from the >> rebase machinery and we are ready to unleash this to the end users >> and tell them it is safe to use it. > > Another possibility perhaps is that the function is safe to reuse > already even without this patch, of course ;-). > Hmm, maybe it is. Looking at pick_commits() and do_pick_commit() if the sequencer dies in print_commit_summary() (which can only happen when cherry-picking or reverting) then neither the todo list or the abort safety file are updated to reflect the commit that was just made. As I understand it print_commit_summary() dies because: (i) it cannot resolve HEAD either because some other process is updating it (which is bad news in the middle of a cherry-pick); (ii) because something went wrong HEAD is corrupt; or (iii) log_tree_commit() cannot read some objects. In all those cases dying will leave the sequencer in a sane state for aborting - 'git cherry-pick --abort' will rewind HEAD to the last successful commit before there was a problem with HEAD or the object database. If the user somehow fixes the problem and runs 'git cherry-pick --continue' then the sequencer will try and pick the same commit again which may or may not be what the user wants depending on what caused print_commit_summary() to die.
Re: cherry-pick very slow on big repository
Derrick Stolee: Git is spending time detecting renames, which implies you probably renamed a folder or added and deleted a large number of files. This rename detection is quadratic (# adds times # deletes). Yes, a couple of directories with a lot of template files have been renamed (and some removed, some added) between the current development branch and this old maintenance branch. I get the "Performing inexact rename detection" a lot when merging changes in the other direction. However, none of them applies to these particular commits, which only touches files that are in the exact same location on both branches. You can remove this rename detection by running your cherry-pick with `git -c diff.renameLimit=1 cherry-pick ...` That didn't work, actually it failed to finish with this setting in effect, it hangs in such a way that I can't stop it with Ctrl+C (neither when running from the command line, nor when running inside gdb). It didn't finish in the 20 minutes I gave it. I also tried with diff.renames=false, which also seemed to fail. -- \\// Peter - http://www.softwolves.pp.se/
is there a stylistic preference for a trailing "--" on a command?
just noticed these examples in "man git-bisect": EXAMPLES $ git bisect start HEAD v1.2 -- # HEAD is bad, v1.2 is good ... $ git bisect start HEAD origin --# HEAD is bad, origin is good ... $ git bisect start HEAD HEAD~10 -- # culprit is among the last 10 is there some rationale or stylistic significance to those trailing "--" on those commands? i assume they have no effect, just curious as to why they're there. rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
Re: [PATCH v1 2/2] log: add option to choose which refs to decorate
On 07/11/17 00:18, Junio C Hamano wrote: > Jacob Kellerwrites: > > I would have to say that the describe's one is wrong if it does not > match what for_each_glob_ref() does for the log family of commands' > "--branches=" etc. describe.c::get_name() uses positive > and negative patterns, just like log-tree.c::add_ref_decoration() > would with the patch we are discussing, so perhaps the items in > these lists should get the same "normalize" treatment the patch 1/2 > of this series brings in to make things consistent? > I agree that describe should receive the "normalize" treatment. However, and following the same reasoning, why should describe users adopt the rules imposed by --glob? I could argue they're also used to the way it works now. That being said, the suggestion I mentioned earlier would allow to keep both current behaviors consistent at the expense of the extra call to refs.c::ref_exists(). +if (!has_glob_specials(pattern) && !ref_exists(normalized_pattern->buf)) { +/* Append implied '/' '*' if not present. */ +strbuf_complete(normalized_pattern, '/'); +/* No need to check for '*', there is none. */ +strbuf_addch(normalized_pattern, '*'); +} But I don't have enough expertise to decide if this consistency is worth the extra call to refs.c::ref_exists() or if there are other side-effects I am not considering. >> That being said, if we think the extra glob would not cause >> problems and generally do what people mean... I guess consistent >> with --glob would be good... But it's definitely not what I'd >> expect at first glance. My position is that consistency is good, but the "first glance expectation" is definitely something important we should take into consideration.
[PATCH v3 8/8] Add Git/Packet.pm from parts of t0021/rot13-filter.pl
And while at it let's simplify t0021/rot13-filter.pl by using Git/Packet.pm. This will make it possible to reuse packet related functions in other test scripts. Signed-off-by: Christian Couder--- perl/Git/Packet.pm | 169 perl/Makefile | 1 + t/t0021/rot13-filter.pl | 141 +--- 3 files changed, 173 insertions(+), 138 deletions(-) create mode 100644 perl/Git/Packet.pm diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm new file mode 100644 index 00..1682403ffc --- /dev/null +++ b/perl/Git/Packet.pm @@ -0,0 +1,169 @@ +package Git::Packet; +use 5.008; +use strict; +use warnings; +BEGIN { + require Exporter; + if ($] < 5.008003) { + *import = \::import; + } else { + # Exporter 5.57 which supports this invocation was + # released with perl 5.8.3 + Exporter->import('import'); + } +} + +our @EXPORT = qw( + packet_compare_lists + packet_bin_read + packet_txt_read + packet_key_val_read + packet_bin_write + packet_txt_write + packet_flush + packet_initialize + packet_read_capabilities + packet_read_and_check_capabilities + packet_check_and_write_capabilities + ); +our @EXPORT_OK = @EXPORT; + +sub packet_compare_lists { + my ($expect, @result) = @_; + my $ix; + if (scalar @$expect != scalar @result) { + return undef; + } + for ($ix = 0; $ix < $#result; $ix++) { + if ($expect->[$ix] ne $result[$ix]) { + return undef; + } + } + return 1; +} + +sub packet_bin_read { + my $buffer; + my $bytes_read = read STDIN, $buffer, 4; + if ( $bytes_read == 0 ) { + # EOF - Git stopped talking to us! + return ( -1, "" ); + } elsif ( $bytes_read != 4 ) { + die "invalid packet: '$buffer'"; + } + my $pkt_size = hex($buffer); + if ( $pkt_size == 0 ) { + return ( 1, "" ); + } elsif ( $pkt_size > 4 ) { + my $content_size = $pkt_size - 4; + $bytes_read = read STDIN, $buffer, $content_size; + if ( $bytes_read != $content_size ) { + die "invalid packet ($content_size bytes expected; $bytes_read bytes read)"; + } + return ( 0, $buffer ); + } else { + die "invalid packet size: $pkt_size"; + } +} + +sub remove_final_lf_or_die { + my $buf = shift; + if ( $buf =~ s/\n$// ) { + return $buf; + } + die "A non-binary line MUST be terminated by an LF.\n" + . "Received: '$buf'"; +} + +sub packet_txt_read { + my ( $res, $buf ) = packet_bin_read(); + if ( $res != -1 and $buf ne '' ) { + $buf = remove_final_lf_or_die($buf); + } + return ( $res, $buf ); +} + +# Read a text line and check that it is in the form "key=value" +sub packet_key_val_read { + my ( $key ) = @_; + my ( $res, $buf ) = packet_txt_read(); + if ( $res == -1 or ( $buf =~ s/^$key=// and $buf ne '' ) ) { + return ( $res, $buf ); + } + die "bad $key: '$buf'"; +} + +sub packet_bin_write { + my $buf = shift; + print STDOUT sprintf( "%04x", length($buf) + 4 ); + print STDOUT $buf; + STDOUT->flush(); +} + +sub packet_txt_write { + packet_bin_write( $_[0] . "\n" ); +} + +sub packet_flush { + print STDOUT sprintf( "%04x", 0 ); + STDOUT->flush(); +} + +sub packet_initialize { + my ($name, $version) = @_; + + packet_compare_lists([0, $name . "-client"], packet_txt_read()) || + die "bad initialize"; + packet_compare_lists([0, "version=" . $version], packet_txt_read()) || + die "bad version"; + packet_compare_lists([1, ""], packet_bin_read()) || + die "bad version end"; + + packet_txt_write( $name . "-server" ); + packet_txt_write( "version=" . $version ); + packet_flush(); +} + +sub packet_read_capabilities { + my @cap; + while (1) { + my ( $res, $buf ) = packet_bin_read(); + if ( $res == -1 ) { + die "unexpected EOF when reading capabilities"; + } + return ( $res, @cap ) if ( $res != 0 ); + $buf = remove_final_lf_or_die($buf); + unless ( $buf =~ s/capability=// ) { + die "bad capability buf: '$buf'"; + } + push @cap, $buf; + } +} + +# Read remote capabilities and check them against