Re: [RFD] Long term plan with submodule refs?

2017-11-10 Thread Jacob Keller
On Fri, Nov 10, 2017 at 12:01 PM, Stefan Beller  wrote:
>>
 Basically, a workflow where it's easier to have each submodule checked
 out at master, and we can still keep track of historical relationship
 of what commit was the submodule at some time ago, but without causing
 some of these headaches.
>>>
>>> So essentially a repo or otherwise parallel workflow just with the 
>>> versioning
>>> happening magically behind your back?
>>
>> Ideally, my developers would like to just have each submodule checked
>> out at master.
>>
>> Ideally, I'd like to be able to checkout an old version of the parent
>> project and have it recorded what version of the shared submodule was
>> at at the time.
>
> This sounds as if a "passive superproject" would work best for you, i.e.
> each commit in a submodule is bubbled up into the superproject,
> making a commit potentially even behind the scenes, such that the
> user interaction with the superproject would be none.
>
> However this approach also sounds careless, as there is no precondition
> that e.g. the superproject builds with all the submodules as is; it is a mere
> tracking of "at this time we have the submodules arranged as such",
> whereas for the versioning aspect, you would want to have commit messages
> in the superproject saying *why* you bumped up a specific submodule.
> The user may not like to give such an explanation as they already wrote
> a commit message for the individual project.
>
> Also this approach sounds like a local approach, as it is not clear to me,
> why you'd want to share the superproject history.
>
>> Ideally, my developers don't want to have to worry about knowing that
>> they shouldn't "git add -a" or "git commit -a" when they have a
>> submodule checked out at a different location from the parent projects
>> gitlink.
>>
>> Thanks,
>> Jake
>>


It doesn't need to be totally passive, in that some (or one
maintainer) can manage when the submodule pointer is actually updated,
but ideally other users don't have to worry about that and can
"pretend" to always keep each submodule at master, as they have always
done in the past.

Thanks,
Jake


Re: Unify annotated and non-annotated tags

2017-11-10 Thread Igor Djordjevic
On 11/11/2017 03:06, Junio C Hamano wrote:

> Igor Djordjevic  writes:
> 
>> If you would like to mimic output of "git show-ref", repeating 
>> commits for each tag pointing to it and showing full tag name as 
>> well, you could do something like this, for example:
>>
>>  for tag in $(git for-each-ref --format="%(refname)" refs/tags)
>>  do
>>  printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag"
>>  done
>>
>>
>> Hope that helps a bit.
> 
> If you use for-each-ref's --format option, you could do something
> like (pardon a long line):
> 
> git for-each-ref 
> --format='%(if)%(*objectname)%(then)%(*objectname)%(else)%(objectname)%(end) 
> %(refname)' refs/tags
> 
> without any loop, I would think.

... and I did have a feeling it should be possible in a single Git 
command :P Thanks.


Re: use of PWD

2017-11-10 Thread Junio C Hamano
Jeff King  writes:

> So totally orthogonal to your bug, I wonder if we ought to be doing:
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 057262d46e..0b76233aa7 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -530,11 +530,11 @@ void prepare_alt_odb(void)
>   if (alt_odb_tail)
>   return;
>  
> - alt = getenv(ALTERNATE_DB_ENVIRONMENT);
> - if (!alt) alt = "";
> -
>   alt_odb_tail = _odb_list;
> - link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
> +
> + alt = getenv(ALTERNATE_DB_ENVIRONMENT);
> + if (alt)
> + link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
>  
>   read_info_alternates(get_object_directory(), 0);
>  }
>
> to avoid hitting link_alt_odb_entries() at all when there are no
> entries.

Sounds sane.


Re: Unify annotated and non-annotated tags

2017-11-10 Thread Junio C Hamano
Igor Djordjevic  writes:

> If you would like to mimic output of "git show-ref", repeating 
> commits for each tag pointing to it and showing full tag name as 
> well, you could do something like this, for example:
>
>   for tag in $(git for-each-ref --format="%(refname)" refs/tags)
>   do
>   printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag"
>   done
>
>
> Hope that helps a bit.

If you use for-each-ref's --format option, you could do something
like (pardon a long line):

git for-each-ref 
--format='%(if)%(*objectname)%(then)%(*objectname)%(else)%(objectname)%(end) 
%(refname)' refs/tags

without any loop, I would think.




Hello,

2017-11-10 Thread Fedor Danevich


-- 
Good day! 


What is the best way to reach you for business? I am writing you because I have 
 an opportunity to present to you. I have a business that I would like to 
discuss with you. 

Waiting to read from you soon. 

Thank you


Re: Unify annotated and non-annotated tags

2017-11-10 Thread Igor Djordjevic
Hi Anatoly,

On 10/11/2017 11:58, anatoly techtonik wrote:
> It is hard to work with Git tags, because on low level hash
> of non-annotated tag is pointing to commit, but hash for
> annotated tag is pointing to tag metadata.
> 
> On low level that means that there is no way to get commit
> hash from tag in a single step. If tag is annotated, you need
> to find and parse ^{} string of show-ref, if not, then look for
> string without ^{}.

That is not quite true, as you can always dereference any tag 
(annotated or not) using "^0" notation, see git-rev-parse[1]:

  "As a special rule, ^0 means the commit itself and is used when 
   is the object name of a tag object that refers to a commit 
  object."
 
> So, why not just make all tags work the same so that every
> tag has its own hash and you need to dereference it in the
> same way to get commit hash?
> 
> This way I could get all commit hashes with just:
> 
> git show-ref --tags -d | grep "\^{}"
> 
> or abandon ^{} completely and show commit hashes on -d:
> 
> git show-ref --tags --dereference
>
Depending on what you would _exactly_ like to do, you could get all 
tagged commit hashes like this:

git rev-list --tags --no-walk

Note that each commit will be listed only once, even if more tags 
(annotated or not) point to it.

If you would like to mimic output of "git show-ref", repeating 
commits for each tag pointing to it and showing full tag name as 
well, you could do something like this, for example:

for tag in $(git for-each-ref --format="%(refname)" refs/tags)
do
printf '%s %s\n' "$(git rev-parse $tag^0)" "$tag"
done


Hope that helps a bit.

Regards, Buga

[1] 
https://git-scm.com/docs/git-rev-parse#git-rev-parse-emltrevgtemegemHEADv1510em


Re: What's cooking in git.git (Nov 2017, #03; Wed, 8)

2017-11-10 Thread brian m. carlson
On Wed, Nov 08, 2017 at 02:50:24PM +0900, Junio C Hamano wrote:
> * bc/submitting-patches-in-asciidoc (2017-10-30) 2 commits
>  - Documentation: convert SubmittingPatches to AsciiDoc
>  - Documentation: enable compat-mode for Asciidoctor
> 
>  The SubmittingPatches document has been converted to produce an
>  HTML version via AsciiDoc/Asciidoctor.
> 
>  Any further comments?  Otherwise, will merge to 'next'.

I think you had wanted some changes, so let me send out a fixed series.

> * bc/hash-algo (2017-10-30) 4 commits
>  - Switch empty tree and blob lookups to use hash abstraction
>  - Integrate hash algorithm support with repo setup
>  - Add structure representing hash algorithm
>  - setup: expose enumerated repo info
> 
>  An infrastructure to define what hash function is used in Git is
>  introduced, and an effort to plumb that throughout various
>  codepaths has been started.
> 
>  cf. <20171028181239.59458-1-sand...@crustytoothpaste.net>

I'll be rerolling this as well.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH] t/3512: demonstrate unrelated submodule/file conflict as cherry-pick failure

2017-11-10 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---

I rewrote your script to integrate with our test suite, potentially acting as
a regression test once we solve the Directory/File conflict issue.

Thanks,
Stefan

 t/t3512-cherry-pick-submodule.sh | 36 
 1 file changed, 36 insertions(+)

diff --git a/t/t3512-cherry-pick-submodule.sh b/t/t3512-cherry-pick-submodule.sh
index 6863b7bb6f..17a6773247 100755
--- a/t/t3512-cherry-pick-submodule.sh
+++ b/t/t3512-cherry-pick-submodule.sh
@@ -10,4 +10,40 @@ KNOWN_FAILURE_NOFF_MERGE_DOESNT_CREATE_EMPTY_SUBMODULE_DIR=1
 KNOWN_FAILURE_NOFF_MERGE_ATTEMPTS_TO_MERGE_REMOVED_SUBMODULE_FILES=1
 test_submodule_switch "git cherry-pick"
 
+test_expect_failure 'unrelated submodule/file conflict is ignored' '
+   test_create_repo sub &&
+
+   touch sub/file &&
+   git -C sub add file &&
+   git -C sub commit -m "add a file in a submodule" &&
+
+   test_create_repo a_repo &&
+   (
+   cd a_repo &&
+   touch a_file &&
+   git add a_file &&
+   git commit -m "add a file" &&
+
+   git branch test &&
+   git checkout test &&
+
+   mkdir sub &&
+   touch sub/content &&
+   git add sub/content &&
+   git commit -m "add a regular folder with name sub" &&
+
+   echo "123" >a_file &&
+   git add a_file &&
+   git commit -m "modify a file" &&
+
+   git checkout master &&
+
+   git submodule add ../sub sub &&
+   git submodule update sub &&
+   git commit -m "add a submodule info folder with name sub" &&
+
+   git cherry-pick test
+   )
+'
+
 test_done
-- 
2.15.0.128.gcadd42da22



Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread brian m. carlson
On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote:
> Thanks for taking a look!
> 
> On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller  wrote:
> > From a technical perspective, I would think that if
> > (num_create <= rename_limit || num_src <= rename_limit)
> > holds true, that the double-cast condition would also be always true?
> > Could we just remove that last check?
> 
> Not necessarily.  For example, if num_create = rename_limit-1 and
> num_src = rename_limit+2, then the first condition will be satisfied
> but the second won't.  If it was && rather than ||, then the second
> condition would be superfluous.
> 
> > Or phrased differently, if we can cast to double and extend the check
> > here, do we have to adapt code at other places as well?
> 
> Good point, and yes.  Perhaps I should have re-ordered my patch series
> because I came back to it later and realized that the progress code
> was broken due to overflow/wraparound, and a patch 3 fixed that.
> 
> Further, the later patch used uint64_t instead of double.  While
> double works, perhaps I should change the double here to uint64_t for
> consistency?

I'm wondering if maybe you want to use size_t.  If you end up using an
unsigned type, you might be able to leverage unsigned_mult_overflows to
avoid having to write this by hand.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: cherry-pick very slow on big repository

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting  wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.

I would be very interested to hear how my rename detection performance
patches work for you; this kind of usecase was the exact one it was
designed to help the most.  See
https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/


Re: [PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakley  wrote:
> From: "Elijah Newren" 
>>
>> In this patchset, I introduce directory rename detection to
>> merge-recursive,
>> predominantly so that when files are added to directories on one side of
>> history and those directories are renamed on the other side of history,
>> the
>> files will end up in the proper location after a merge or cherry-pick.
>>
>> However, this isn't limited to that simplistic case.  More interesting
>> possibilities exist, such as:
>>
>>  * a file being renamed into a directory which is renamed on the other
>>side of history, causing the need for a transitive rename.
>>
>
> How does this cope with the case insensitive case preserving file systems on
> Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem
> that the series already covers, would the likely changes be reasonably
> localised?
>
> This came up recently on GfW for `git checkout` of a branch where the case
> changed ("Test" <-> "test"), but git didn't notice that it needed to rename
> the directories on such an file system.
> https://github.com/git-for-windows/git/issues/1333

I wasn't aware there were problems with git on case insensitive case
preserving filesystems; fixing them wasn't something I had in mind
when writing this series.  However, the particular bug you mention is
actually completely orthogonal to this series; it talks about
git-checkout without the -m/--merge option, which doesn't touch any
code path I modified in my series, so my series can't really fix or
worsen that particular issue.

But, if there are further issues with such filesystems that also
affect merges/cherry-picks/rebases, then I don't think my series will
either help or hurt there either.  The recursive merge machinery
already has remove_file() and update_file() wrappers that it uses
whenever it needs to remove/add/update a file in the working directory
and/or index, and I have simply continued using those, so the number
of places you'd need to modify to fix issues would remain just as
localized as before.  Also, I continue to depend on the reading of the
index & trees that unpack_trees() does, which I haven't modified, so
again it'd be the same number of places that someone would need to
fix.  (However, the whole design to have unpack_trees() do the initial
work and then have recursive merge try to "fix it up" is really
starting to strain.  I'm starting to think, again, that merge
recursive needs a redesign, and have some arguments I wanted to float
out there...but I've dumped enough on the list for a day.)

It's possible that this series fixes one particular issue -- namely
when merging, if the merge-base contained a "Test" directory, one side
added a file to that directory, and the other side renamed "Test" to
"test", and if the presence of both "Test" and "test" directories in
the merge result is problematic, then at least with my fixes you
wouldn't end up with both directories and could thus avoid that
problem in a narrow set of cases.

Sorry that I don't have any better news than that for you.

Elijah


[PATCH 2/2] stash: implement builtin stash helper

2017-11-10 Thread Joel Teichroeb
Start moving stash functions over to builtin c code and call
them in the shell script, instead of converting it all at
once.

Signed-off-by: Joel Teichroeb 
---
 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 516 
 git-stash.sh| 134 +
 git.c   |   1 +
 5 files changed, 526 insertions(+), 127 deletions(-)
 create mode 100644 builtin/stash--helper.c

diff --git a/Makefile b/Makefile
index ee9d5eb11..3a9bd4d57 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,6 +1000,7 @@ BUILTIN_OBJS += builtin/send-pack.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
+BUILTIN_OBJS += builtin/stash--helper.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa..a14fd85b0 100644
--- a/builtin.h
+++ b/builtin.h
@@ -219,6 +219,7 @@ extern int cmd_shortlog(int argc, const char **argv, const 
char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
+extern int cmd_stash__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
 extern int cmd_submodule__helper(int argc, const char **argv, const char 
*prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
new file mode 100644
index 0..c8cb667fe
--- /dev/null
+++ b/builtin/stash--helper.c
@@ -0,0 +1,516 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+#include "refs.h"
+#include "lockfile.h"
+#include "cache-tree.h"
+#include "unpack-trees.h"
+#include "merge-recursive.h"
+#include "argv-array.h"
+#include "run-command.h"
+
+static const char * const git_stash_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] []"),
+   N_("git stash--helper pop [--index] [-q|--quiet] []"),
+   N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   N_("git stash--helper branch  []"),
+   N_("git stash--helper clear"),
+   NULL
+};
+
+static const char * const git_stash_drop_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] []"),
+   NULL
+};
+
+static const char * const git_stash_pop_usage[] = {
+   N_("git stash--helper pop [--index] [-q|--quiet] []"),
+   NULL
+};
+
+static const char * const git_stash_apply_usage[] = {
+   N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   NULL
+};
+
+static const char * const git_stash_branch_usage[] = {
+   N_("git stash--helper branch  []"),
+   NULL
+};
+
+static const char * const git_stash_clear_usage[] = {
+   N_("git stash--helper clear"),
+   NULL
+};
+
+static const char *ref_stash = "refs/stash";
+static int quiet;
+static char stash_index_path[PATH_MAX];
+
+struct stash_info {
+   struct object_id w_commit;
+   struct object_id b_commit;
+   struct object_id i_commit;
+   struct object_id u_commit;
+   struct object_id w_tree;
+   struct object_id b_tree;
+   struct object_id i_tree;
+   struct object_id u_tree;
+   const char *message;
+   const char *revision;
+   int is_stash_ref;
+   int has_u;
+   const char *patch;
+};
+
+static int get_stash_info(struct stash_info *info, const char *commit)
+{
+   struct strbuf w_commit_rev = STRBUF_INIT;
+   struct strbuf b_commit_rev = STRBUF_INIT;
+   struct strbuf w_tree_rev = STRBUF_INIT;
+   struct strbuf b_tree_rev = STRBUF_INIT;
+   struct strbuf i_tree_rev = STRBUF_INIT;
+   struct strbuf u_tree_rev = STRBUF_INIT;
+   struct strbuf commit_buf = STRBUF_INIT;
+   struct strbuf symbolic = STRBUF_INIT;
+   struct strbuf out = STRBUF_INIT;
+   int ret;
+   const char *revision = commit;
+   char *end_of_rev;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   info->is_stash_ref = 0;
+
+   if (commit == NULL) {
+   strbuf_addf(_buf, "%s@{0}", ref_stash);
+   revision = commit_buf.buf;
+   } else if (strspn(commit, "0123456789") == strlen(commit)) {
+   strbuf_addf(_buf, "%s@{%s}", ref_stash, commit);
+   revision = commit_buf.buf;
+   }
+   info->revision = revision;
+
+   strbuf_addf(_commit_rev, "%s", revision);
+   strbuf_addf(_commit_rev, "%s^1", revision);
+   strbuf_addf(_tree_rev, "%s:", revision);
+   strbuf_addf(_tree_rev, "%s^1:", revision);
+   strbuf_addf(_tree_rev, "%s^2:", revision);
+
+   ret = !get_oid(w_commit_rev.buf, >w_commit) &&
+   !get_oid(b_commit_rev.buf, >b_commit) &&
+

[PATCH 1/2] merge: close the index lock when not writing the new index

2017-11-10 Thread Joel Teichroeb
If the merge does not have anything to do, it does not unlock the index,
causing any further index operations to fail. Thus, always unlock the index
regardless of outcome.

Signed-off-by: Joel Teichroeb 
---
 merge-recursive.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 2ca8444c6..225ff3fb5 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2184,9 +2184,12 @@ int merge_recursive_generic(struct merge_options *o,
if (clean < 0)
return clean;
 
-   if (active_cache_changed &&
-   write_locked_index(_index, , COMMIT_LOCK))
-   return err(o, _("Unable to write index."));
+   if (active_cache_changed) {
+   if (write_locked_index(_index, , COMMIT_LOCK))
+   return err(o, _("Unable to write index."));
+   } else {
+   rollback_lock_file();
+   }
 
return clean ? 0 : 1;
 }
-- 
2.15.0



[PATCH 0/2] Convert some stash functionality to a builtin

2017-11-10 Thread Joel Teichroeb
I've been working on converting all of git stash to be a
builtin, however it's hard to get it all working at once with
limited time, so I've moved around half of it to a new
stash--helper builtin and called these functions from the shell
script. Once this is stabalized, it should be easier to convert
the rest of the commands one at a time without breaking
anything.

I've sent most of this code before, but that was targetting a
full replacement of stash. The code is overall the same, but
with some code review changes and updates for internal api
changes.

Joel Teichroeb (2):
  merge: close the index lock when not writing the new index
  stash: implement builtin stash helper

 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 516 
 git-stash.sh| 134 +
 git.c   |   1 +
 merge-recursive.c   |   9 +-
 6 files changed, 532 insertions(+), 130 deletions(-)
 create mode 100644 builtin/stash--helper.c

-- 
2.15.0



[PATCH] builtin/describe.c: describe a blob

2017-11-10 Thread Stefan Beller
Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob.  For
source code this is likely not the first commit that introduced the blob,
but rather the latest commit that contained the blob.  For example:

  git describe v0.99:Makefile
  v0.99-5-gab6625e06a:Makefile

tells us the latest commit that contained the Makefile as it was in tag
v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
2005-07-11) touches the Makefile.

Let's see how this description turns out, if it is useful in day-to-day
use as I have the intuition that we'd rather want to see the *first*
commit that this blob was introduced to the repository (which can be
achieved easily by giving the `--reverse` flag in the describe_blob rev
walk).

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-describe.txt | 13 +++-
 builtin/describe.c | 71 ++
 t/t6120-describe.sh| 15 +
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..a25443ca91 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the graph relations
 
 
 SYNOPSIS
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=]
+'git describe' 
 
 DESCRIPTION
 ---
@@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only 
shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `:`, such that the blob can be found
+at `` in the ``. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log  -- `.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 ---
 ...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..acfd853a30 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN   (1u << 0)
 #define MAX_TAGS   (FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,56 @@ static void describe_commit(struct object_id *oid, struct 
strbuf *dst)
strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+   struct object_id current_commit;
+   struct object_id looking_for;
+   struct strbuf *dst;
+   struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+   struct process_commit_data *pcd = data;
+   pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+   struct process_commit_data *pcd = data;
+
+   if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) {
+   reset_revision_walk();
+   describe_commit(>current_commit, pcd->dst);
+   strbuf_addf(pcd->dst, 

[PATCH 0/1] describe a blob: with better docs

2017-11-10 Thread Stefan Beller
This replaces the last patch of origin/sb/describe-blob.
Interdiff is below.

I chose to not mention options at all, as currently none are applicable;
Check for options and tell the user by die()ing that we don't know about
the options for blobs.

Thanks,
Stefan

  builtin/describe.c: describe a blob

 Documentation/git-describe.txt | 13 +++-
 builtin/describe.c | 71 ++
 t/t6120-describe.sh| 15 +
 3 files changed, 92 insertions(+), 7 deletions(-)

-- 
2.15.0.128.gcadd42da22
diff --git c/Documentation/git-describe.txt w/Documentation/git-describe.txt
index 79ec0be62a..a25443ca91 100644
--- c/Documentation/git-describe.txt
+++ w/Documentation/git-describe.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=]
-'git describe' [] 
+'git describe' 
 
 DESCRIPTION
 ---
diff --git c/builtin/describe.c w/builtin/describe.c
index cf08bef344..acfd853a30 100644
--- c/builtin/describe.c
+++ w/builtin/describe.c
@@ -501,9 +501,13 @@ static void describe(const char *arg, int last_one)
 
if (cmit)
describe_commit(, );
-   else if (lookup_blob())
+   else if (lookup_blob()) {
+   if (all || tags || longformat || first_parent ||
+   patterns.nr || exclude_patterns.nr ||
+   always || dirty || broken)
+   die(_("options not available for describing blobs"));
describe_blob(oid, );
-   else
+   } else
die(_("%s is neither a commit nor blob"), arg);
 
puts(sb.buf);


Re: [PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Philip Oakley

From: "Elijah Newren" 

[This series is entirely independent of my rename detection limits series.
However, I have a separate rename detection performance series that 
depends

on both this series and the rename detection limits series.]

In this patchset, I introduce directory rename detection to 
merge-recursive,

predominantly so that when files are added to directories on one side of
history and those directories are renamed on the other side of history, 
the

files will end up in the proper location after a merge or cherry-pick.

However, this isn't limited to that simplistic case.  More interesting
possibilities exist, such as:

 * a file being renamed into a directory which is renamed on the other
   side of history, causing the need for a transitive rename.



How does this cope with the case insensitive case preserving file systems on 
Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem 
that the series already covers, would the likely changes be reasonably 
localised?


This came up recently on GfW for `git checkout` of a branch where the case 
changed ("Test" <-> "test"), but git didn't notice that it needed to rename 
the directories on such an file system. 
https://github.com/git-for-windows/git/issues/1333




--
Philip 



[RFC PATCH 8/9] merge-recursive: Accelerate rename detection

2017-11-10 Thread Elijah Newren
If a file is unmodified on one side of history (no content changes, no
name change, and no mode change) and is renamed on the other side, then
the correct merge result is to take both the file name and the file
contents (and file mode) of the renamed file.  merge-recursive detects
this rename and gets the correct merge result.

Note that if no rename detection is done, then this will appear to the
merge machinery as two files: one that was unmodified on one side of
history and deleted on the other (thus the merge should delete it), and
one which was newly added on one side of history (thus the merge should
include it).  Thus, even if the rename wasn't detected, we still would
have ended up with the correct result.

In other words, rename detection is a waste of time for files that were
unmodified on the OTHER side of history.  We can accelerate rename
detection for merges by providing information about the other side of
history, which will allow us to remove all such rename sources from the
list of candidates we care about.

There are two gotchas:

  1) Not trying to detect renames for these types of files can result in
 rename/add conflicts being instead detected as add/add conflicts,
 and can result in rename/rename(2to1) conflicts being instead
 detected as either rename/add or add/add conflicts.  Luckily for
 us, these three types of conflicts happen to make the same changes
 to the index and working tree (what a coincidence...), so this
 isn't a significant issue; the only annoyance is that the stdout
 from the merge command will include a "CONFLICT($type)" message for
 a related conflict type instead of the precise conflict type.

  2) If there is a directory rename on one side of history AND all files
 within the directory are not merely renamed but are modified as
 well AND none of the original files in the directory are modified
 on the other side of history AND there are new files added (or
 moved into) to the original directory on that other side of
 history, then this change will prevent us from being able to detect
 that directory rename and placing the new file(s) into the
 appropriate directory.  A subsequent commit will correct this
 downside.

In one particular testcase involving a large repository and some
high-level directories having been renamed, this cut the time necessary
for a cherry-pick down by a factor of about 8 (from around 4.5 minutes
down to around 34 seconds)

Signed-off-by: Elijah Newren 
---
 diff.c|  1 +
 diff.h|  3 +++
 diffcore-rename.c | 43 ++-
 merge-recursive.c | 61 +--
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index c6597e3231..83723ab26d 100644
--- a/diff.c
+++ b/diff.c
@@ -4173,6 +4173,7 @@ void diff_setup(struct diff_options *options)
}
 
options->color_moved = diff_color_moved_default;
+   options->ignore_for_renames = NULL;
 }
 
 void diff_setup_done(struct diff_options *options)
diff --git a/diff.h b/diff.h
index adf7e92eb5..1288f36fd2 100644
--- a/diff.h
+++ b/diff.h
@@ -196,6 +196,9 @@ struct diff_options {
} color_moved;
#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+
+   /* Paths we should ignore for rename purposes */
+   struct string_list *ignore_for_renames;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index b15d9d74ef..aa8e0e4d4a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -437,6 +437,40 @@ static int find_renames(struct diff_score *mx, int 
dst_cnt, int minimum_score, i
return count;
 }
 
+static int handle_rename_ignores(struct diff_options *options)
+{
+   int detect_rename = options->detect_rename;
+   struct string_list *ignores = options->ignore_for_renames;
+   int ignored = 0;
+   int i, j;
+
+   /* rename_ignores onlhy relevant when we're not detecting copies */
+   if (ignores == NULL || detect_rename == DIFF_DETECT_COPY)
+   return 0;
+
+   for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) {
+   struct diff_filespec *one = rename_src[j].p->one;
+   int cmp;
+
+   if (one->rename_used) {
+   j++;
+   continue;
+   }
+
+   cmp = strcmp(ignores->items[i].string, one->path);
+   if (cmp < 0)
+   i++;
+   else if (cmp > 0)
+   j++;
+   else {
+   one->rename_used++;
+   ignored++;
+   }
+   }
+
+   return ignored;
+}
+
 void diffcore_rename(struct diff_options *options)
 {
int detect_rename = options->detect_rename;
@@ -445,7 +479,7 

[RFC PATCH 5/9] merge-recursive: Fix rename/add conflict handling

2017-11-10 Thread Elijah Newren
This makes the rename/add conflict handling make use of the new
handle_file_collision() function, which fixes several bugs and improves
things for the rename/add case significantly.  Previously, rename/add
would:

  * Not leave any higher order stage entries in the index, making it
appear as if there were no conflict.
  * Would place the rename file at the colliding path, and move the
added file elsewhere, which combined with the lack of higher order
stage entries felt really odd.  It's not clear to me why the
rename should take precedence over the add; if one should be moved
out of the way, they both probably should.
  * In the recursive case, it would do a two way merge of the added
file and the version of the renamed file on the renamed side,
completely excluding modifications to the renamed file on the
unrenamed side of history.

Using the new handle_file_collision() fixes all of these issues, and
adds smarts to allow two-way merge OR move colliding files to separate
paths depending on the similarity of the colliding files.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 135 +++---
 t/t6036-recursive-corner-cases.sh |  11 ++--
 2 files changed, 104 insertions(+), 42 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 345479ad50..f29cbd1240 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -167,6 +167,7 @@ static int oid_eq(const struct object_id *a, const struct 
object_id *b)
 enum rename_type {
RENAME_NORMAL = 0,
RENAME_DIR,
+   RENAME_ADD,
RENAME_DELETE,
RENAME_ONE_FILE_TO_ONE,
RENAME_ONE_FILE_TO_TWO,
@@ -209,6 +210,7 @@ static inline void setup_rename_conflict_info(enum 
rename_type rename_type,
  struct stage_data *src_entry1,
  struct stage_data *src_entry2)
 {
+   int ostage1, ostage2;
struct rename_conflict_info *ci = xcalloc(1, sizeof(struct 
rename_conflict_info));
ci->rename_type = rename_type;
ci->pair1 = pair1;
@@ -226,18 +228,22 @@ static inline void setup_rename_conflict_info(enum 
rename_type rename_type,
dst_entry2->rename_conflict_info = ci;
}
 
-   if (rename_type == RENAME_TWO_FILES_TO_ONE) {
-   /*
-* For each rename, there could have been
-* modifications on the side of history where that
-* file was not renamed.
-*/
-   int ostage1 = o->branch1 == branch1 ? 3 : 2;
-   int ostage2 = ostage1 ^ 1;
+   /*
+* For each rename, there could have been
+* modifications on the side of history where that
+* file was not renamed.
+*/
+   if (rename_type == RENAME_ADD ||
+   rename_type == RENAME_TWO_FILES_TO_ONE) {
+   ostage1 = o->branch1 == branch1 ? 3 : 2;
 
ci->ren1_other.path = pair1->one->path;
oidcpy(>ren1_other.oid, _entry1->stages[ostage1].oid);
ci->ren1_other.mode = src_entry1->stages[ostage1].mode;
+   }
+
+   if (rename_type == RENAME_TWO_FILES_TO_ONE) {
+   ostage2 = ostage1 ^ 1;
 
ci->ren2_other.path = pair2->one->path;
oidcpy(>ren2_other.oid, _entry2->stages[ostage2].oid);
@@ -1104,6 +1110,18 @@ static int merge_file_special_markers(struct 
merge_options *o,
   const char *filename2,
   struct merge_file_info *mfi)
 {
+   if (o->branch1 != branch1) {
+   /*
+* It's weird getting a reverse merge with HEAD on the bottom
+* and the other branch on the top.  Fix that.
+*/
+   return merge_file_special_markers(o,
+ one, b, a,
+ branch2, filename2,
+ branch1, filename1,
+ mfi);
+   }
+
char *side1 = NULL;
char *side2 = NULL;
int ret;
@@ -1291,6 +1309,21 @@ static int handle_file_collision(struct merge_options *o,
struct diff_filespec null, a, b;
int minimum_score;
 
+   /*
+* It's easiest to get the correct things into stage 2 and 3, and
+* to make sure that the content merge puts HEAD before the other
+* branch if we just ensure that branch1 == o->branch1.  So, simply
+* flip arguments around if we don't have that.
+*/
+   if (branch1 != o->branch1) {
+   return handle_file_collision(o, collide_path,
+prev_path2, prev_path1,
+branch2, branch1,
+b_oid, b_mode,
+   

[RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts

2017-11-10 Thread Elijah Newren
This makes the rename/rename(2to1) conflicts use the new
handle_file_collision() function.  Since that function was based
originally on the rename/rename(2to1) handling code, the main
differences here are in what was added.  In particular:

  * If the two colliding files are similar, instead of being stored
at collide_path~HEAD and collide_path~MERGE, the files are two-way
merged and recorded at collide_path.
  * Instead of recording the version of the renamed file that existed
on the renamed side in the index (thus ignoring any changes that
were made to the file on the side of history without the rename),
we do a three-way content merge on the renamed path, then store
that at either stage 2 or stage 3.
  * Note that if either of the three-way content merges done for each
rename have conflicts, we do NOT try to estimate the similarity of
the resulting two files and just automatically consider them to be
dissimilar.  This is done to avoid foisting conflicts-of-conflicts
on the user.

Signed-off-by: Elijah Newren 
---
Is it too weird to others that I potentially record a merged file with
conflict markers at both stage 2 and stage 3 in the index?  To me, it
seemed less weird than what we previously did, but I am curious what
others think of it. 

 merge-recursive.c| 100 +--
 t/t6042-merge-rename-corner-cases.sh |   2 +-
 2 files changed, 14 insertions(+), 88 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index f29cbd1240..b8108740c4 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -647,26 +647,6 @@ static int update_stages(struct merge_options *opt, const 
char *path,
return 0;
 }
 
-static int update_stages_for_stage_data(struct merge_options *opt,
-   const char *path,
-   const struct stage_data *stage_data)
-{
-   struct diff_filespec o, a, b;
-   o.mode = stage_data->stages[1].mode;
-   oidcpy(, _data->stages[1].oid);
-
-   a.mode = stage_data->stages[2].mode;
-   oidcpy(, _data->stages[2].oid);
-
-   b.mode = stage_data->stages[3].mode;
-   oidcpy(, _data->stages[3].oid);
-
-   return update_stages(opt, path,
-is_null_sha1(o.oid.hash) ? NULL : ,
-is_null_sha1(a.oid.hash) ? NULL : ,
-is_null_sha1(b.oid.hash) ? NULL : );
-}
-
 static void update_entry(struct stage_data *entry,
 struct diff_filespec *o,
 struct diff_filespec *a,
@@ -1598,7 +1578,6 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
char *path = c1->path; /* == c2->path */
struct merge_file_info mfi_c1;
struct merge_file_info mfi_c2;
-   int ret;
 
output(o, 1, _("CONFLICT (rename/rename): "
   "Rename %s->%s in %s. "
@@ -1606,9 +1585,6 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
   a->path, c1->path, ci->branch1,
   b->path, c2->path, ci->branch2);
 
-   remove_file(o, 1, a->path, o->call_depth || 
would_lose_untracked(a->path));
-   remove_file(o, 1, b->path, o->call_depth || 
would_lose_untracked(b->path));
-
if (merge_file_special_markers(o, a, c1, >ren1_other,
   o->branch1, c1->path,
   o->branch2, ci->ren1_other.path, 
_c1) ||
@@ -1617,66 +1593,11 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
   o->branch2, c2->path, _c2))
return -1;
 
-   if (o->call_depth) {
-   /*
-* If mfi_c1.clean && mfi_c2.clean, then it might make
-* sense to do a two-way merge of those results.  But, I
-* think in all cases, it makes sense to have the virtual
-* merge base just undo the renames; they can be detected
-* again later for the non-recursive merge.
-*/
-   remove_file(o, 0, path, 0);
-   ret = update_file(o, 0, _c1.oid, mfi_c1.mode, a->path);
-   if (!ret)
-   ret = update_file(o, 0, _c2.oid, mfi_c2.mode,
- b->path);
-   } else {
-   char *new_path1 = unique_path(o, path, ci->branch1);
-   char *new_path2 = unique_path(o, path, ci->branch2);
-   output(o, 1, _("Renaming %s to %s and %s to %s instead"),
-  a->path, new_path1, b->path, new_path2);
-   if (was_dirty(o, path))
-   output(o, 1, _("Refusing to lose dirty file at %s"),
-  path);
-   else if (would_lose_untracked(path))
-   /*
-* Only way we get here is if both renames 

[RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts

2017-11-10 Thread Elijah Newren
This makes add/add conflicts use the new handle_file_collision()
function.  This leaves the handling of the index the same, but
modifies how the working tree is handled: instead of always doing
a two-way merge of the file contents and recording them at the
collision path name, we instead first estimate the similarity of the
two files involved.  If they are not similar, we instead record the
file contents into two separate files for the user to inspect.

Several testcases had to be modified to either expect files to be
written to different locations, or for the two test colliding files
to be modified so that they were similar.

Signed-off-by: Elijah Newren 
---
The sheer number of tests that relied on add/add always resulting in a
two-way merge gave me a little pause.  I am assuming that the ONLY reason
the testsuite did that was just because it made it really easy to write
the tests (you wouldn't need to make the two files similar at all), not
because it was actually the behavior that was really wanted for truly
dissimilar files.  Based on that assumption, I just modified the tests
for the "new world".  But I'd like to hear others' comments on that
assumption of mine.

 merge-recursive.c| 24 +++-
 t/t2023-checkout-m.sh|  2 +-
 t/t3418-rebase-continue.sh   | 27 +++
 t/t3504-cherry-pick-rerere.sh| 19 ++-
 t/t4200-rerere.sh| 12 ++--
 t/t6020-merge-df.sh  |  4 ++--
 t/t6024-recursive-merge.sh   | 35 +--
 t/t6025-merge-symlinks.sh|  9 ++---
 t/t6031-merge-filemode.sh|  4 ++--
 t/t6042-merge-rename-corner-cases.sh |  2 +-
 t/t6043-merge-rename-directories.sh  | 13 -
 t/t7060-wtstatus.sh  |  1 +
 t/t7064-wtstatus-pv2.sh  |  4 ++--
 t/t7506-status-submodule.sh  | 11 +++
 t/t7610-mergetool.sh | 28 ++--
 15 files changed, 127 insertions(+), 68 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b8108740c4..7bc9a2ac80 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3029,11 +3029,25 @@ static int process_entry(struct merge_options *o,
clean_merge = -1;
}
} else if (a_oid && b_oid) {
-   /* Case C: Added in both (check for same permissions) and */
-   /* case D: Modified in both, but differently. */
-   clean_merge = merge_content(o, path, 0 /* file_in_way */,
-   o_oid, o_mode, a_oid, a_mode, 
b_oid, b_mode,
-   NULL);
+   if (!o_oid) {
+   /* Case C: Added in both (check for same permissions) */
+   output(o, 1,
+  _("CONFLICT (add/add): Merge conflict in %s"),
+  path);
+   clean_merge = handle_file_collision(o,
+   path, NULL, NULL,
+   o->branch1,
+   o->branch2,
+   a_oid, a_mode,
+   b_oid, b_mode,
+   0);
+   } else
+   /* case D: Modified in both, but differently. */
+   clean_merge = merge_content(o, path, 0 /* file_in_way 
*/,
+   o_oid, o_mode,
+   a_oid, a_mode,
+   b_oid, b_mode,
+   NULL);
} else if (!o_oid && !a_oid && !b_oid) {
/*
 * this entry was deleted altogether. a_mode == 0 means
diff --git a/t/t2023-checkout-m.sh b/t/t2023-checkout-m.sh
index 7e18985134..2f8ea52318 100755
--- a/t/t2023-checkout-m.sh
+++ b/t/t2023-checkout-m.sh
@@ -27,7 +27,7 @@ clean_branchnames () {
 }
 
 test_expect_success '-m restores 2-way conflicted+resolved file' '
-   cp each.txt each.txt.conflicted &&
+   test_must_fail git merge-file -p each.txt~HEAD /dev/null 
each.txt~master >each.txt.conflicted &&
echo resolved >each.txt &&
git add each.txt &&
git checkout -m -- each.txt &&
diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index fcfdd197bd..3523558421 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -10,10 +10,17 @@ set_fake_editor
 
 test_expect_success 'setup' '
test_commit "commit-new-file-F1" F1 1 &&
-   test_commit "commit-new-file-F2" F2 2 &&
+   printf 

[RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible

2017-11-10 Thread Elijah Newren
We have to look at each entry in rename_src a total of rename_dst_nr
times.  When we're not detecting copies, any exact renames or ignorable
rename paths will just be skipped over.  While checking that these can
be skipped over is a relatively cheap check, it's still a waste of time
to do that check more than once, let alone rename_dst_nr times.  When
rename_src_nr is a few thousand times bigger than the number of relevant
sources (such as when cherry-picking a commit that only touched a
handful of files, but from a side of history that has different names
for some high level directories), this time can add up.

First make an initial pass over the rename_src array and move all the
relevant entries to the front, so that we can iterate over just those
relevant entries.

In one particular testcase involving a large repository and some
high-level directories having been renamed, this cut the time necessary
for a cherry-pick down by a factor of about 2 (from around 34 seconds
down to just under 16 seconds)

Signed-off-by: Elijah Newren 
---
 diffcore-rename.c | 47 +++
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index aa8e0e4d4a..f6fc084891 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -437,16 +437,14 @@ static int find_renames(struct diff_score *mx, int 
dst_cnt, int minimum_score, i
return count;
 }
 
-static int handle_rename_ignores(struct diff_options *options)
+static void handle_rename_ignores(struct diff_options *options)
 {
-   int detect_rename = options->detect_rename;
struct string_list *ignores = options->ignore_for_renames;
-   int ignored = 0;
int i, j;
 
/* rename_ignores onlhy relevant when we're not detecting copies */
-   if (ignores == NULL || detect_rename == DIFF_DETECT_COPY)
-   return 0;
+   if (ignores == NULL)
+   return;
 
for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) {
struct diff_filespec *one = rename_src[j].p->one;
@@ -464,11 +462,27 @@ static int handle_rename_ignores(struct diff_options 
*options)
j++;
else {
one->rename_used++;
-   ignored++;
+   i++;
+   j++;
}
}
+}
+
+static int remove_renames_from_src()
+{
+   int j, new_j;
+
+   for (j = 0, new_j = 0; j < rename_src_nr; j++) {
+   if (rename_src[j].p->one->rename_used)
+   continue;
+
+   if (new_j < j)
+   memcpy(_src[new_j], _src[j],
+  sizeof(struct diff_rename_src));
+   new_j++;
+   }
 
-   return ignored;
+   return new_j;
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -479,7 +493,7 @@ void diffcore_rename(struct diff_options *options)
struct diff_queue_struct outq;
struct diff_score *mx;
int i, j, rename_count, skip_unmodified = 0;
-   int num_create, dst_cnt, num_src, ignore_count;
+   int num_create, dst_cnt, num_src;
struct progress *progress = NULL;
 
if (!minimum_score)
@@ -542,18 +556,19 @@ void diffcore_rename(struct diff_options *options)
 
/*
 * Mark source files as used if they are found in the
-* ignore_for_renames list.
+* ignore_for_renames list, and clean out files from rename_src
+* that we don't need to continue considering.
 */
-   ignore_count = handle_rename_ignores(options);
+   num_src = rename_src_nr;
+   if (detect_rename != DIFF_DETECT_COPY) {
+   handle_rename_ignores(options);
+   num_src = remove_renames_from_src();
+   }
 
/*
-* Calculate how many renames are left (but all the source
-* files still remain as options for rename/copies!)
+* Calculate how many renames are left
 */
num_create = (rename_dst_nr - rename_count);
-   num_src = (detect_rename == DIFF_DETECT_COPY ?
-  rename_src_nr : rename_src_nr - rename_count);
-   num_src -= ignore_count;
 
/* All done? */
if (!num_create)
@@ -588,7 +603,7 @@ void diffcore_rename(struct diff_options *options)
for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
m[j].dst = -1;
 
-   for (j = 0; j < rename_src_nr; j++) {
+   for (j = 0; j < num_src; j++) {
struct diff_filespec *one = rename_src[j].p->one;
struct diff_score this_src;
 
-- 
2.15.0.46.g41dca04efb



[RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact

2017-11-10 Thread Elijah Newren
diffcore_rename() had some code to avoid having destination paths that
already had an exact rename detected from being re-checked for other
renames.  Source paths, however, were re-checked because we wanted to
allow the possibility of detecting copies.  But if copy detection isn't
turned on, then this merely amounts to attempting to find a
better-than-exact match, which naturally ends up being an expensive
no-op.  In particular, copy detection is never turned on by the merge
recursive machinery.

In a large repository (~50k files, about 60% of which was java) that had
a number of high level directories involved in renames, this cut the time
necessary for a cherry-pick down by about 50% (from around 9 minutes to
4.5 minutes).

Signed-off-by: Elijah Newren 
---
 diffcore-rename.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6ba6157c61..c0517058b0 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -377,11 +377,10 @@ static void record_if_better(struct diff_score m[], 
struct diff_score *o)
  * 1 if we need to disable inexact rename detection;
  * 2 if we would be under the limit if we were given -C instead of -C -C.
  */
-static int too_many_rename_candidates(int num_create,
+static int too_many_rename_candidates(int num_create, int num_src,
  struct diff_options *options)
 {
int rename_limit = options->rename_limit;
-   int num_src = rename_src_nr;
int i;
 
options->needed_rename_limit = 0;
@@ -446,7 +445,7 @@ void diffcore_rename(struct diff_options *options)
struct diff_queue_struct outq;
struct diff_score *mx;
int i, j, rename_count, skip_unmodified = 0;
-   int num_create, dst_cnt;
+   int num_create, dst_cnt, num_src;
struct progress *progress = NULL;
 
if (!minimum_score)
@@ -512,12 +511,14 @@ void diffcore_rename(struct diff_options *options)
 * files still remain as options for rename/copies!)
 */
num_create = (rename_dst_nr - rename_count);
+   num_src = (detect_rename == DIFF_DETECT_COPY ?
+  rename_src_nr : rename_src_nr - rename_count);
 
/* All done? */
if (!num_create)
goto cleanup;
 
-   switch (too_many_rename_candidates(num_create, options)) {
+   switch (too_many_rename_candidates(num_create, num_src, options)) {
case 1:
goto cleanup;
case 2:
@@ -531,7 +532,7 @@ void diffcore_rename(struct diff_options *options)
if (options->show_rename_progress) {
progress = start_delayed_progress(
_("Performing inexact rename detection"),
-   (uint64_t)rename_dst_nr * 
(uint64_t)rename_src_nr);
+   (uint64_t)num_create * (uint64_t)num_src);
}
 
mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx));
@@ -550,6 +551,10 @@ void diffcore_rename(struct diff_options *options)
struct diff_filespec *one = rename_src[j].p->one;
struct diff_score this_src;
 
+   if (one->rename_used &&
+   detect_rename != DIFF_DETECT_COPY)
+   continue;
+
if (skip_unmodified &&
diff_unmodified_pair(rename_src[j].p))
continue;
@@ -568,7 +573,7 @@ void diffcore_rename(struct diff_options *options)
diff_free_filespec_blob(two);
}
dst_cnt++;
-   display_progress(progress, 
(uint64_t)(i+1)*(uint64_t)rename_src_nr);
+   display_progress(progress, (uint64_t)dst_cnt*(uint64_t)num_src);
}
stop_progress();
 
-- 
2.15.0.46.g41dca04efb



[RFC PATCH 4/9] Add testcases for improved file collision conflict handling

2017-11-10 Thread Elijah Newren
Adds testcases dealing with file collisions for the following types of
conflicts:
  * add/add
  * rename/add
  * rename/rename(2to1)
These tests include expectations for new, smarter behavior provided by
handle_file_collision().  Since that function is not in use yet, the
tests are currently expected to fail.

Signed-off-by: Elijah Newren 
---
 t/t6036-recursive-corner-cases.sh|   8 +-
 t/t6042-merge-rename-corner-cases.sh | 210 +++
 2 files changed, 214 insertions(+), 4 deletions(-)

diff --git a/t/t6036-recursive-corner-cases.sh 
b/t/t6036-recursive-corner-cases.sh
index 18aa88b5c0..8485e04b9b 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -208,11 +208,11 @@ test_expect_success 'git detects differently handled 
merges conflict' '
git cat-file -p C:new_a >>merge-me &&
>empty &&
test_must_fail git merge-file \
-   -L "Temporary merge branch 2" \
-   -L "" \
-L "Temporary merge branch 1" \
-   merged empty merge-me &&
-   sed -e "s/^\([<=>]\)/\1\1\1/" merged >merged-internal &&
+   -L "" \
+   -L "Temporary merge branch 2" \
+   merge-me empty merged &&
+   sed -e "s/^\([<=>]\)/\1\1\1/" merge-me >merged-internal &&
test $(git rev-parse :1:new_a) = $(git hash-object merged-internal)
 '
 
diff --git a/t/t6042-merge-rename-corner-cases.sh 
b/t/t6042-merge-rename-corner-cases.sh
index 411550d2b6..d8fe797f0d 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -575,4 +575,214 @@ test_expect_success 'rename/rename/add-dest merge still 
knows about conflicting
test ! -f c
 '
 
+test_conflicts_with_adds_and_renames() {
+   test $1 != 0 && side1=rename || side1=add
+   test $2 != 0 && side2=rename || side2=add
+
+   # Setup:
+   #  L
+   # / \
+   #   master   ?
+   # \ /
+   #  R
+   #
+   # Where:
+   #   Both L and R have files named 'three-unrelated' and
+   #   'three-related' which collide.  Each of the colliding files
+   #   could have been involved in a rename, in which case there was a
+   #   file named 'one-[un]related' or 'two-[un]related' that was
+   #   modified on the opposite side of history and renamed into the
+   #   collision on this side of history.
+   #
+   # Questions for both three-unrelated and three-related:
+   #   1) The index should contain both a stage 2 and stage 3 entry
+   #  for the colliding file.  Does it?
+   #   2) When renames are involved, the content merges are clean, so
+   #  the index should reflect the content merges, not merely the
+   #  version of the colliding file from the prior commit.  Does
+   #  it?
+   #
+   # Questions for three-unrelated:
+   #   3) There should be files in the worktree named
+   #  'three-unrelated~HEAD' and 'three-unrelated~R^0' with the
+   #  (content-merged) version of 'three-unrelated' from the
+   #  appropriate side of the merge.  Are they present?
+   #   4) There should be no file named 'three-unrelated' in the
+   #  working tree.  That'd make it too likely that users would
+   #  use it instead of carefully looking at both
+   #  three-unrelated~HEAD and three-unrelated~R^0.  Is it
+   #  correctly missing?
+   #
+   # Questions for three-related:
+   #   3) There should be a file in the worktree named three-related
+   #  containing the two-way merged contents of the content-merged
+   #  versions of three-related from each of the two colliding
+   #  files.  Is it present?
+   #   4) There should not be any three-related~* files in the working
+   #  tree.
+   test_expect_success "setup simple $side1/$side2 conflict" '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   # Create a simple file with 10 lines
+   ten="0 1 2 3 4 5 6 7 8 9" &&
+   for i in $ten
+   do
+   echo line $i in a sample file
+   done >unrelated1_v1 &&
+   # Create a second version of same file with one more line
+   cat unrelated1_v1 >unrelated1_v2 &&
+   echo another line >>unrelated1_v2 &&
+
+   # Create an unrelated simple file with 10 lines
+   for i in $ten
+   do
+   echo line $i in another sample file
+   done >unrelated2_v1 &&
+   # Create a second version of same file with one more line
+   cat unrelated2_v1 >unrelated2_v2 &&
+   echo another line >>unrelated2_v2 &&
+
+   # Create some related files now
+  

[RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions

2017-11-10 Thread Elijah Newren
There are three conflict types that represent two (possibly entirely
unrelated) files colliding at the same location:
  * add/add
  * rename/add
  * rename/rename(2to1)

These three conflict types already share more similarity than might be
immediately apparent from their description: (1) the handling of the
rename variants already involves removing any entries from the index
corresponding to the original file names[*], thus only leaving entries
in the index for the colliding path; (2) likewise, any trace of the
original file name in the working tree is also removed.  So, in all
three cases we're left with how to represent two colliding files in both
the index and the working copy.

[*] Technically, this isn't quite true because rename/rename(2to1)
conflicts in the recursive (o->call_depth > 0) case do an "unrename"
since about seven years ago.  But even in that case, Junio felt
compelled to explain that my decision to "unrename" wasn't necessarily
the only or right answer -- search for "Comment from Junio" in t6036 for
details.

My initial motivation for looking at these three conflict types was that
if the handling of these three conflict types is the "same" (defined
more precisely in a later commit), or is at least the "same" in the
limited set of cases where a renamed file is unmodified on the side of
history where the file is not renamed, then a significant performance
improvement for rename detection during merges is possible.  However,
while that served as motivation to look at these three types of
conflicts, the actual goal of this new function is to try to improve the
handling for all three cases, not to merely make them the same.

The previous behavior for these conflict types in regards to the
working tree (assuming the file collision occurs at 'foo') was:
  * add/add does a two-way merge of the two files and records it as 'foo'.
  * rename/rename(2to1) records the two different files into two new
uniquely named files (foo~HEAD and foo~$MERGE), while removing 'foo'
from the working tree.
  * rename/add records the two different files into two different
locations, recording the add at foo~$SIDE and, oddly, recording
the rename at foo (why is the rename more important than the add?)

So, the biggest question is whether the two colliding files should be
two-way merged and recorded in place, or recorded into separate files.
If the files are similar enough, the two-way merge is probably
preferable, but if they're not similar, recording as separate files is
probably similar.  (The same logic that applies for the working
directory here would also apply to the recursive (i.e. o->call_depth >
0) case as well.)  The code handling the different types of conflicts
appear to have been written with different assumptions about whether the
colliding files would be similar.

But, rather than make an assumption about whether the two files will be
similar, why not just check?  If we simply call estimate_similarity(),
we can two-way merge the files if they are similar, and otherwise record
the two files at different locations.

Checking similarity in order to optimize worktree handling is the
primary improvement for these three conflict types.  Further
improvements will be discussed in subsequent commits that modify the
code to take advantage of this new shared function.

Signed-off-by: Elijah Newren 
---
Besides the use-similarity-to-determine-two-way-vs-record-two-separate
choice, I am curious if anyone else has problems with the "unrename"
behavior for the recursive case (which I'd guess never actually comes up
in practice and only cursorily appears in the testsuite because I put it
there years ago).  If anyone does, I may have a useful testcase I could
bring up for discussion on the list.  Given that Junio disagreed with
some of my reasoning on the one testcase in the testsuite that cursorily
touches this, I'm not sure it's even clear what "correct"/"optimal"
behavior for those cases is.

 diff.h|   4 ++
 diffcore-rename.c |   6 +--
 merge-recursive.c | 118 ++
 3 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index aca150ba2e..adf7e92eb5 100644
--- a/diff.h
+++ b/diff.h
@@ -427,4 +427,8 @@ extern void print_stat_summary(FILE *fp, int files,
   int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
+extern int estimate_similarity(struct diff_filespec *src,
+  struct diff_filespec *dst,
+  int minimum_score);
+
 #endif /* DIFF_H */
diff --git a/diffcore-rename.c b/diffcore-rename.c
index c0517058b0..b15d9d74ef 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -127,9 +127,9 @@ struct diff_score {
short name_score;
 };
 
-static int estimate_similarity(struct diff_filespec *src,
-  struct diff_filespec *dst,
-

[RFC PATCH 0/9] Improve rename detection performance in merge recursive

2017-11-10 Thread Elijah Newren
This series depends on BOTH my rename-limits and directory-detection
series
(https://public-inbox.org/git/20171110173956.25105-1-new...@gmail.com/
and
https://public-inbox.org/git/20171110190550.27059-1-new...@gmail.com/).
This series could be modified to be submitted with no dependencies on
those series, but it'd end up causing lots of merging conflicts, and
having this series depend on the other two seemed most logical to me.

This patch series tries to improve performance rename detection
performance in merge recursive where possible.  In particular, I was
guided by a specific usecase of cherry-picking a small change (by
which I mean 7 files with small modifications and one new file) in a
repo that has thousands of renames, due to some high-level directories
having been renamed.  Some of the changes should help rename detection
performance in general, but the greatest benefit will be found when
one side of history only makes a small number of changes.  For my
usecase, I dropped the time needed for the cherry-pick from over 9
minutes, down to about 16 seconds.

RFC because:

  * I believe the patch with the biggest performance improvement will
break directory rename detection in specific, limited cases (which
are not yet represented in the testsuite).  Should be fixable; I
just need to implement the fix.  (The fix may reduce the
performance improvement slightly).

  * This series includes changes to conflict handling for conflict
types that involve two colliding files.  I think the new behavior
is strictly better, but this is the kind of thing I really need to
make sure others agree with; comments very welcome.  (Patches 2--6)

  * 16 seconds is still annoyingly slow; we should be able to do better.
I have one or two ideas.  But since others are asking about the
performance of small cherry-picks in large repos with lots of renames,
I figure it's worth posting what I have so far.

Elijah Newren (9):
  diffcore-rename: No point trying to find a match better than exact
  merge-recursive: Avoid unnecessary string list lookups
  merge-recursive: New function for better colliding conflict
resolutions
  Add testcases for improved file collision conflict handling
  merge-recursive: Fix rename/add conflict handling
  merge-recursive: Improve handling for rename/rename(2to1) conflicts
  merge-recursive: Improve handling for add/add conflicts
  merge-recursive: Accelerate rename detection
  diffcore-rename: Filter rename_src list when possible

 diff.c   |   1 +
 diff.h   |   7 +
 diffcore-rename.c|  85 ++-
 merge-recursive.c| 452 ---
 t/t2023-checkout-m.sh|   2 +-
 t/t3418-rebase-continue.sh   |  27 ++-
 t/t3504-cherry-pick-rerere.sh|  19 +-
 t/t4200-rerere.sh|  12 +-
 t/t6020-merge-df.sh  |   4 +-
 t/t6024-recursive-merge.sh   |  35 +--
 t/t6025-merge-symlinks.sh|   9 +-
 t/t6031-merge-filemode.sh|   4 +-
 t/t6036-recursive-corner-cases.sh|  19 +-
 t/t6042-merge-rename-corner-cases.sh | 212 +++-
 t/t6043-merge-rename-directories.sh  |  13 +-
 t/t7060-wtstatus.sh  |   1 +
 t/t7064-wtstatus-pv2.sh  |   4 +-
 t/t7506-status-submodule.sh  |  11 +-
 t/t7610-mergetool.sh |  28 +--
 19 files changed, 722 insertions(+), 223 deletions(-)

-- 
2.15.0.46.g41dca04efb



[RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups

2017-11-10 Thread Elijah Newren
Since we're taking entries from active_cache, which is already in sorted
order with same-named entries adjacent, we can skip a lookup.  Also, we can
just use append instead of insert (avoiding the need to find where to put
the new item) and still end up with a sorted list.

Signed-off-by: Elijah Newren 
---
Assumed negligible performance change; I didn't even bother measuring it.
But I just happened to be looking around at this code while trying to figure
out some of the performance and figured it was at least speeding it up a
tiny bit.

 merge-recursive.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 6ef1d52f0a..d54466649e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -451,22 +451,28 @@ static struct stage_data *insert_stage_data(const char 
*path,
 static struct string_list *get_unmerged(void)
 {
struct string_list *unmerged = xcalloc(1, sizeof(struct string_list));
+   struct string_list_item *item;
+   const char *last = NULL;
int i;
 
unmerged->strdup_strings = 1;
 
for (i = 0; i < active_nr; i++) {
-   struct string_list_item *item;
struct stage_data *e;
const struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
continue;
 
-   item = string_list_lookup(unmerged, ce->name);
-   if (!item) {
-   item = string_list_insert(unmerged, ce->name);
+   if (last == NULL || strcmp(last, ce->name)) {
+   /*
+* active_cache is in sorted order, so we can just call
+* string_list_append instead of string_list_insert and
+* still end up with a sorted list.
+*/
+   item = string_list_append(unmerged, ce->name);
item->util = xcalloc(1, sizeof(struct stage_data));
}
+   last = ce->name;
e = item->util;
e->stages[ce_stage(ce)].mode = ce->ce_mode;
oidcpy(>stages[ce_stage(ce)].oid, >oid);
-- 
2.15.0.46.g41dca04efb



Re: [PATCH] bisect: mention "view" as an alternative to "visualize"

2017-11-10 Thread Eric Sunshine
On Fri, Nov 10, 2017 at 4:13 PM, Robert P. J. Day  wrote:
> On Fri, 10 Nov 2017, Eric Sunshine wrote:
>
>> Thanks for the patch. Some comments below...
>>
>> On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Day
>>  wrote:
>> > Tweak a number of files to mention "view" as an alternative to
>> > "visualize":
>>
>> You probably want to end this sentence with a period, not a colon.
>
>   not sure about that, what's the standard when you're introducing a
> code snippet?

It wasn't clear that you were including a snippet since it's not
common practice to include the diffstat in the commit message on this
project...

>> >  Documentation/git-bisect.txt   | 9 -
>> >  Documentation/user-manual.txt  | 3 ++-
>> >  builtin/bisect--helper.c   | 2 +-
>> >  contrib/completion/git-completion.bash | 2 +-
>> >  git-bisect.sh  | 4 ++--
>> >  5 files changed, 10 insertions(+), 10 deletions(-)
>>
>> The diffstat belongs below the "---" separator, otherwise this text
>> will (undesirably) become part of the commit message proper.
>
>   no, i actually want that as part of the commit message. my standard
> is, if there are 5 or more files that get changed, i like to include a
> diff --stat in the commit message so people viewing the log can get a
> quick idea of how much has changed. if that's not desired here, i can
> remove it.

The same information is available to anyone interested in it simply by
asking for it:

git log --stat ...

More generally, commit messages should contain the really important
information you want to convey to the reader which _isn't_ available
some other way (by, for instance, taking advantage of the tool itself
-- such as the above --stat example).

On this project, the diffstat is never included as part of the commit
message, and it's likely that the patch won't be accepted by Junio
like that (or he'll just edit it out if he does accept it).

Thanks.


Re: [PATCH] bisect: mention "view" as an alternative to "visualize"

2017-11-10 Thread Robert P. J. Day
On Fri, 10 Nov 2017, Eric Sunshine wrote:

> Thanks for the patch. Some comments below...
>
> On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Day
>  wrote:
> > Tweak a number of files to mention "view" as an alternative to
> > "visualize":
>
> You probably want to end this sentence with a period, not a colon.

  not sure about that, what's the standard when you're introducing a
code snippet?

> >  Documentation/git-bisect.txt   | 9 -
> >  Documentation/user-manual.txt  | 3 ++-
> >  builtin/bisect--helper.c   | 2 +-
> >  contrib/completion/git-completion.bash | 2 +-
> >  git-bisect.sh  | 4 ++--
> >  5 files changed, 10 insertions(+), 10 deletions(-)
>
> The diffstat belongs below the "---" separator, otherwise this text
> will (undesirably) become part of the commit message proper.

  no, i actually want that as part of the commit message. my standard
is, if there are 5 or more files that get changed, i like to include a
diff --stat in the commit message so people viewing the log can get a
quick idea of how much has changed. if that's not desired here, i can
remove it.

> > + git bisect visualize|view
>
> I think you need parentheses around these terms (see "git bisect
> skip", for example):
>
> git bisect (visualize | view)

  ah, quite so.

> However, in this case, it might be easier for readers if each is
> presented on its own line (and subsequent discussion can make it clear
> that they are synonyms).
>
> git bisect visualize
> git bisect view
>
> But, that's a matter of taste...

  given the precedent already established:

   git bisect (bad|new|) []
   git bisect (good|old|) [...]

i'll go with the parentheses and no intervening spaces. i'll let that
posting simmer for a bit longer before posting "v2".

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH] bisect: mention "view" as an alternative to "visualize"

2017-11-10 Thread Eric Sunshine
Thanks for the patch. Some comments below...

On Fri, Nov 10, 2017 at 11:32 AM, Robert P. J. Day
 wrote:
> Tweak a number of files to mention "view" as an alternative to
> "visualize":

You probably want to end this sentence with a period, not a colon.

>  Documentation/git-bisect.txt   | 9 -
>  Documentation/user-manual.txt  | 3 ++-
>  builtin/bisect--helper.c   | 2 +-
>  contrib/completion/git-completion.bash | 2 +-
>  git-bisect.sh  | 4 ++--
>  5 files changed, 10 insertions(+), 10 deletions(-)

The diffstat belongs below the "---" separator, otherwise this text
will (undesirably) become part of the commit message proper.

> Signed-off-by: Robert P. J. Day 
>
> ---
>
>   here's hoping i have the right format for this patch ... are there
> any parts of this that are inappropriate, such as extending the bash
> completion?

This is the correct place for your commentary. The diffstat should
appear below your commentary.

> diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
> @@ -23,7 +23,7 @@ on the subcommand:
>   git bisect terms [--term-good | --term-bad]
>   git bisect skip [(|)...]
>   git bisect reset []
> - git bisect visualize
> + git bisect visualize|view

I think you need parentheses around these terms (see "git bisect
skip", for example):

git bisect (visualize | view)

However, in this case, it might be easier for readers if each is
presented on its own line (and subsequent discussion can make it clear
that they are synonyms).

git bisect visualize
git bisect view

But, that's a matter of taste...

> @@ -196,15 +196,14 @@ of `git bisect good` and `git bisect bad` to mark 
> commits.
>  Bisect visualize
>  
>
> -To see the currently remaining suspects in 'gitk', issue the following
> -command during the bisection process:
> +To see the currently remaining suspects in 'gitk', issue either of the
> +following equivalent commands during the bisection process:
>
>  
>  $ git bisect visualize
> +$ git bisect view
>  
>
> -`view` may also be used as a synonym for `visualize`.

Honestly, I think the original was clearer and placed a bit less
cognitive load on the reader. Moreover, for someone scanning the
documentation without reading it too deeply, the revised example makes
it seem as if it is necessary to invoke both commands rather than one
or the other.

> diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> @@ -538,10 +538,11 @@ Note that the version which `git bisect` checks out for 
> you at each
>  point is just a suggestion, and you're free to try a different
>  version if you think it would be a good idea.  For example,
>  occasionally you may land on a commit that broke something unrelated;
> -run
> +run either of the equivalent commands
>
>  -
>  $ git bisect visualize
> +$ git bisect view
>  -

Same observation as above. This has the potential to confuse someone
quickly scanning the documentation into thinking that both commands
must be invoked. Merely stating in prose that one is the alias of the
other might be better.

>  which will run gitk and label the commit it chose with a marker that
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index fdd984d34..52f68c922 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1162,7 +1162,7 @@ _git_bisect ()
>  {
> __git_has_doubledash && return
>
> -   local subcommands="start bad good skip reset visualize replay log run"
> +   local subcommands="start bad good skip reset visualize view replay 
> log run"

People using muscle memory to type "git bisect v"  or
"...vi" might find it annoying for this to suddenly become
ambiguous. Just an observation; no strong opinion on it...

> diff --git a/git-bisect.sh b/git-bisect.sh
> @@ -20,7 +20,7 @@ git bisect next
> find next bisection to test and check it out.
>  git bisect reset []
> finish bisection search and go back to commit.
> -git bisect visualize
> +git bisect visualize|view
> show bisect status in gitk.

Again, this might be easier to read if split over two lines:

git bisect visualize
git bisect view
show bisect status in gitk.

in which case it's plenty clear that the commands are synonyms.


[PATCH v1] fsmonitor: simplify determining the git worktree under Windows

2017-11-10 Thread Ben Peart
I haven't tested the non Windows paths but the patch looks reasonable.

This inspired me to get someone more familiar with perl (thanks Johannes)
to revisit this code for the Windows side as well.  The logic for
determining the git worktree when running on Windows is more complex
than necessary.  It also spawns multiple processes (uname and cygpath)
which slows things down.

Simplify and speed up the process of finding the git worktree when
running on Windows by keeping it in perl and avoiding spawning helper
processes.

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
---

Notes:
Base Ref:
Web-Diff: https://github.com/benpeart/git/commit/20affe124b
Checkout: git fetch https://github.com/benpeart/git fsmonitor_splitindex-v1 
&& git checkout 20affe124b

 t/t7519/fsmonitor-watchman | 13 +++--
 templates/hooks--fsmonitor-watchman.sample | 13 +++--
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index 5fe72cefaf..5514edcf68 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -29,17 +29,10 @@ if ($version == 1) {
"Falling back to scanning...\n";
 }
 
-# Convert unix style paths to escaped Windows style paths when running
-# in Windows command prompt
-
-my $system = `uname -s`;
-$system =~ s/[\r\n]+//g;
 my $git_work_tree;
-
-if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
-   $git_work_tree = `cygpath -aw "\$PWD"`;
-   $git_work_tree =~ s/[\r\n]+//g;
-   $git_work_tree =~ s,\\,/,g;
+if ($^O =~ 'msys' || $^O =~ 'cygwin') {
+   $git_work_tree = Win32::GetCwd();
+   $git_work_tree =~ tr/\\/\//;
 } else {
require Cwd;
$git_work_tree = Cwd::cwd();
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index ba6d88c5f8..e673bb3980 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -28,17 +28,10 @@ if ($version == 1) {
"Falling back to scanning...\n";
 }
 
-# Convert unix style paths to escaped Windows style paths when running
-# in Windows command prompt
-
-my $system = `uname -s`;
-$system =~ s/[\r\n]+//g;
 my $git_work_tree;
-
-if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
-   $git_work_tree = `cygpath -aw "\$PWD"`;
-   $git_work_tree =~ s/[\r\n]+//g;
-   $git_work_tree =~ s,\\,/,g;
+if ($^O =~ 'msys' || $^O =~ 'cygwin') {
+   $git_work_tree = Win32::GetCwd();
+   $git_work_tree =~ tr/\\/\//;
 } else {
require Cwd;
$git_work_tree = Cwd::cwd();

base-commit: f9d9e50b62094689773dccc5f9493fa15e30d592
-- 
2.15.0.windows.1



Re: [RFC] protocol version 2

2017-11-10 Thread Jonathan Tan
On Fri, 20 Oct 2017 10:18:39 -0700
Brandon Williams  wrote:

> Some of the pain points with the current protocol spec are:

After some in-office discussion, I think that the most important pain
point is that we have to implement each protocol twice: once for
HTTP(S), and once for SSH (and friends) that support bidirectional byte
streams.

If it weren't for this, I think that what is discussed in this document
(e.g. ls-refs, fetch-object) can be less invasively accomplished with
v1, specifying "extra parameters" (explained in this e-mail [1]) to
merely tweak the output of upload-pack instead of replacing it nearly
completely, thus acting more as optimizations than changing the mode of
operation entirely.

[1] 
https://public-inbox.org/git/20171010193956.168385-1-jonathanta...@google.com/

>   * The server's initial response is the ref advertisement.  This
> advertisement cannot be omitted and can become an issue due to the
> sheer number of refs that can be sent with large repositories.  For
> example, when contacting the internal equivalent of
> `https://android.googlesource.com/`, the server will send
> approximately 1 million refs totaling 71MB.  This is data that is
> sent during each and every fetch and is not scalable.

For me, this is not a compelling one, because we can provide a ref
whitelist as an "extra parameter" in v1.

>   * Capabilities were implemented as a hack and are hidden behind a NUL
> byte after the first ref sent from the server during the ref
> advertisement:
> 
>\0  
> 
> Since they are sent in the context of a pkt-line they are also subject
> to the same length limitations (1k bytes with old clients).  While we
> may not be close to hitting this limitation with capabilities alone, it
> has become a problem when trying to abuse capabilities for other
> purposes (e.g. 
> [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).
> 
>   * Various other technical debt (e.g. abusing capabilities to
> communicate agent and symref data, service name set using a query
> parameter).

I think these 2 are the same - I would emphasize the fact that we cannot
add more stuff here, rather than the fact that we're putting this behind
NUL.

>  Special Packets
> -
> 
> In protocol v2 these special packets will have the following semantics:
> 
>   * '' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

To address the pain point of HTTP(S) being different from the others
(mentioned above), I think the packet semantics should be further
qualified:

 - Communications must be divided up into packets terminated by a
   flush-pkt. Also, each side must be implemented without knowing
   whether packets-in-progress can or cannot be seen by the other side.
 - Each request packet must have a corresponding, possibly empty,
   response packet.
 - A request packet may be sent even if a response packet corresponding
   to a previously sent request packet is awaited. (This allows us to
   retain the existing optimization in fetch-pack wherein, during
   negotiation, the "have" request-response packet pairs are
   interleaved.)

This will allow us to more easily share code between HTTP(S) and the
others.

In summary, I think that we need a big motivation to make the jump from
v1 to v2, instead of merely making small changes to v1 (and I do think
that the proposed new commands, such as "ls-refs" and "fetch-object",
can be implemented merely by small changes). And I think that the
ability to better share code between HTTP(S) and others provides that
motivation.


Re: [RFD] Long term plan with submodule refs?

2017-11-10 Thread Stefan Beller
>
>>> Basically, a workflow where it's easier to have each submodule checked
>>> out at master, and we can still keep track of historical relationship
>>> of what commit was the submodule at some time ago, but without causing
>>> some of these headaches.
>>
>> So essentially a repo or otherwise parallel workflow just with the versioning
>> happening magically behind your back?
>
> Ideally, my developers would like to just have each submodule checked
> out at master.
>
> Ideally, I'd like to be able to checkout an old version of the parent
> project and have it recorded what version of the shared submodule was
> at at the time.

This sounds as if a "passive superproject" would work best for you, i.e.
each commit in a submodule is bubbled up into the superproject,
making a commit potentially even behind the scenes, such that the
user interaction with the superproject would be none.

However this approach also sounds careless, as there is no precondition
that e.g. the superproject builds with all the submodules as is; it is a mere
tracking of "at this time we have the submodules arranged as such",
whereas for the versioning aspect, you would want to have commit messages
in the superproject saying *why* you bumped up a specific submodule.
The user may not like to give such an explanation as they already wrote
a commit message for the individual project.

Also this approach sounds like a local approach, as it is not clear to me,
why you'd want to share the superproject history.

> Ideally, my developers don't want to have to worry about knowing that
> they shouldn't "git add -a" or "git commit -a" when they have a
> submodule checked out at a different location from the parent projects
> gitlink.
>
> Thanks,
> Jake
>


Re: [PATCH v2 0/9] sequencer: dont't fork git commit

2017-11-10 Thread Junio C Hamano
Phillip Wood  writes:

> Here's the summary from the previous version
> These patches teach the sequencer to create commits without forking
> git commit when the commit message does not need to be edited. This
> speeds up cherry picking 10 commits by 26% and picking 10 commits with
> rebase --continue by 44%. The first few patches move bits of
> builtin/commit.c to sequencer.c. The last two patches actually
> implement creating commits in sequencer.c.

Thanks.  The changes since the initial iteration seems quite small
and I didn't find much objectionable.

Here are some style fixes I needed to add on top to make the output
of "diff master HEAD" checkpatch.pl-clean.  I think 3/9 and 9/9 are
the culprits.

diff --git a/sequencer.c b/sequencer.c
index 1f65e82696..a989588ee5 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -592,7 +592,7 @@ static int read_env_script(struct argv_array *env)
return 0;
 }
 
-static char *get_author(const char* message)
+static char *get_author(const char *message)
 {
size_t len;
const char *a;
@@ -1104,7 +1104,7 @@ static int try_to_commit(struct strbuf *msg, const char 
*author,
}
 
if (update_head_with_reflog(current_head, oid,
-   getenv("GIT_REFLOG_ACTION"), msg, )){
+   getenv("GIT_REFLOG_ACTION"), msg, )) {
res = error("%s", err.buf);
goto out;
}
@@ -1121,7 +1121,7 @@ static int try_to_commit(struct strbuf *msg, const char 
*author,
return res;
 }
 
-static int do_commit(const char *msg_file, const char* author,
+static int do_commit(const char *msg_file, const char *author,
 struct replay_opts *opts, unsigned int flags)
 {
int res = 1;
@@ -1521,7 +1521,7 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
strbuf_addstr(, oid_to_hex(>object.oid));
strbuf_addstr(, ")\n");
}
-   if (!is_fixup (command))
+   if (!is_fixup(command))
author = get_author(msg.message);
}
 
diff --git a/sequencer.h b/sequencer.h
index 27f34be400..e0be354301 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -72,7 +72,7 @@ int template_untouched(const struct strbuf *sb, const char 
*template_file,
   enum commit_msg_cleanup_mode cleanup_mode);
 int update_head_with_reflog(const struct commit *old_head,
const struct object_id *new_head,
-   const char* action, const struct strbuf *msg,
+   const char *action, const struct strbuf *msg,
struct strbuf *err);
 void commit_post_rewrite(const struct commit *current_head,
 const struct object_id *new_head);




[PATCH 21/30] merge-recursive: Add get_directory_renames()

2017-11-10 Thread Elijah Newren
This populates a list of directory renames for us.  The list of
directory renames is not yet used, but will be in subsequent commits.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 146 ++
 1 file changed, 146 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 89a9b32635..b5770d3d7f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1376,6 +1376,124 @@ static struct diff_queue_struct *get_diffpairs(struct 
merge_options *o,
return ret;
 }
 
+static void get_renamed_dir_portion(const char *old_path, const char *new_path,
+   char **old_dir, char **new_dir) {
+   *old_dir = NULL;
+   *new_dir = NULL;
+
+   /* For
+*"a/b/c/d/foo.c" -> "a/b/something-else/d/foo.c"
+* the "d/foo.c" part is the same, we just want to know that
+*"a/b/c" was renamed to "a/b/something-else"
+* so, for this example, this function returns "a/b/c" in
+* *old_dir and "a/b/something-else" in *new_dir.
+*
+* Also, if the basename of the file changed, we don't care.  We
+* want to know which portion of the directory, if any, changed.
+*/
+   char *end_of_old = strrchr(old_path, '/');
+   char *end_of_new = strrchr(new_path, '/');
+   if (end_of_old == NULL || end_of_new == NULL)
+   return;
+   while (*--end_of_new == *--end_of_old &&
+  end_of_old != old_path &&
+  end_of_new != new_path)
+   ; // Do nothing; all in the while loop
+   /*
+* We've found the first non-matching character in the directory
+* paths.  That means the current directory we were comparing
+* represents the rename.  Move end_of_old and end_of_new back
+* to the full directory name.
+*/
+   if (*end_of_old == '/')
+   end_of_old++;
+   if (*end_of_old != '/')
+   end_of_new++;
+   end_of_old = strchr(end_of_old, '/');
+   end_of_new = strchr(end_of_new, '/');
+
+   /*
+* It may have been the case that old_path and new_path were the same
+* directory all along.  Don't claim a rename if they're the same.
+*/
+   int old_len = end_of_old - old_path;
+   int new_len = end_of_new - new_path;
+
+   if (old_len != new_len || strncmp(old_path, new_path, old_len)) {
+   *old_dir = strndup(old_path, old_len);
+   *new_dir = strndup(new_path, new_len);
+   }
+}
+
+static struct hashmap *get_directory_renames(struct diff_queue_struct *pairs,
+struct tree *tree) {
+   struct hashmap *dir_renames;
+   struct hashmap_iter iter;
+   struct dir_rename_entry *entry;
+   int i;
+
+   dir_renames = malloc(sizeof(struct hashmap));
+   dir_rename_init(dir_renames);
+   for (i = 0; i < pairs->nr; ++i) {
+   struct string_list_item *item;
+   int *count;
+   struct diff_filepair *pair = pairs->queue[i];
+
+   char *old_dir, *new_dir;
+   get_renamed_dir_portion(pair->one->path, pair->two->path,
+   _dir,_dir);
+   if (!old_dir)
+   // Directory didn't change at all; ignore this one.
+   continue;
+
+   entry = dir_rename_find_entry(dir_renames, old_dir);
+   if (!entry) {
+   entry = xcalloc(1, sizeof(struct dir_rename_entry));
+   hashmap_entry_init(entry, strhash(old_dir));
+   hashmap_put(dir_renames, entry);
+   entry->dir = old_dir;
+   } else {
+   free(old_dir);
+   }
+   item = string_list_lookup(>possible_new_dirs, new_dir);
+   if (!item) {
+   item = string_list_insert(>possible_new_dirs, 
new_dir);
+   item->util = xcalloc(1, sizeof(int));
+   } else {
+   free(new_dir);
+   }
+   count = item->util;
+   *count += 1;
+   }
+
+   hashmap_iter_init(dir_renames, );
+   while ((entry = hashmap_iter_next())) {
+   int max = 0;
+   int bad_max = 0;
+   char *best = NULL;
+   for (i = 0; i < entry->possible_new_dirs.nr; i++) {
+   int *count = entry->possible_new_dirs.items[i].util;
+   if (*count == max)
+   bad_max = max;
+   else if (*count > max) {
+   max = *count;
+   best = entry->possible_new_dirs.items[i].string;
+   }
+   }
+   if (bad_max == max)
+   

[PATCH 13/30] directory rename detection: tests for handling overwriting untracked files

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 314 
 1 file changed, 314 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index bb179b16c8..7af8962512 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2559,4 +2559,318 @@ test_expect_failure '9g-check: Renamed directory that 
only contained immediate s
 #   side of history for any implicit directory renames.
 ###
 
+###
+# SECTION 10: Handling untracked files
+#
+# unpack_trees(), upon which the recursive merge algorithm is based, aborts
+# the operation if untracked or dirty files would be deleted or overwritten
+# by the merge.  Unfortunately, unpack_trees() does not understand renames,
+# and if it doesn't abort, then it muddies up the working directory before
+# we even get to the point of detecting renames, so we need some special
+# handling, at least in the case of directory renames.
+###
+
+# Testcase 10a, Overwrite untracked: normal rename/delete
+#   Commit A: z/{b,c_1}
+#   Commit B: z/b + untracked z/c + untracked z/d
+#   Commit C: z/{b,d_1}
+#   Expected: Aborted Merge +
+#   ERROR_MSG(untracked working tree files would be overwritten by merge)
+
+test_expect_success '10a-setup: Overwrite untracked with normal rename/delete' 
'
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git rm z/c &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z/c z/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '10a-check: Overwrite untracked with normal rename/delete' 
'
+   git checkout B^0 &&
+   echo very >z/c &&
+   echo important >z/d &&
+
+   test_must_fail git merge -s recursive C^0 >out 2>err &&
+   test_i18ngrep "The following untracked working tree files would be 
overwritten by merge" err &&
+
+   test 1 -eq $(git ls-files -s | wc -l) &&
+   test 4 -eq $(git ls-files -o | wc -l) &&
+
+   test "very" = "$(cat z/c)" &&
+   test "important" = "$(cat z/d)" &&
+   test $(git rev-parse HEAD:z/b) = $(git rev-parse A:z/b)
+'
+
+# Testcase 10b, Overwrite untracked: dir rename + delete
+#   Commit A: z/{b,c_1}
+#   Commit B: y/b + untracked y/{c,d,e}
+#   Commit C: z/{b,d_1,e}
+#   Expected: Failed Merge; y/b + untracked y/c + untracked y/d on disk +
+# z/c_1 -> z/d_1 rename recorded at stage 3 for y/d +
+#   ERROR_MSG(refusing to lose untracked file at 'y/d')
+
+test_expect_success '10b-setup: Overwrite untracked with dir rename + delete' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git rm z/c &&
+   git mv z/ y/ &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z/c z/d &&
+   echo e >z/e &&
+   git add z/e &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' '
+   git checkout B^0 &&
+   echo very >y/c &&
+   echo important >y/d &&
+   echo contents >y/e &&
+
+   test_must_fail git merge -s recursive C^0 >out 2>err &&
+   test_i18ngrep "CONFLICT (rename/delete).*Version C^0 of y/d left in 
tree at y/d~C^0" out &&
+   test_i18ngrep "Error: Refusing to lose untracked file at y/e; writing 
to y/e~C^0 instead" out &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+   test 2 -eq $(git ls-files -u | wc -l) &&
+   test 5 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test "very" = "$(cat y/c)" &&
+
+   test "important" = "$(cat y/d)" &&
+   test "important" != "$(git rev-parse :3:y/d)" &&
+   test $(git rev-parse :3:y/d) = $(git rev-parse A:z/c) &&
+
+   test "contents" = "$(cat y/e)" &&
+   test "contents" != "$(git rev-parse :3:y/e)" &&
+   test $(git rev-parse :3:y/e) = $(git rev-parse C:z/e)
+'
+
+# Testcase 10c, Overwrite untracked: dir rename/rename(1to2)
+#   Commit A: z/{a,b}, x/{c,d}
+#   Commit B: y/{a,b}, w/c, x/d + different untracked y/c
+#   Commit C: z/{a,b,c}, x/d
+#   Expected: Failed 

[PATCH 19/30] merge-recursive: Split out code for determining diff_filepairs

2017-11-10 Thread Elijah Newren
Create a new function, get_diffpairs() to compute the diff_filepairs
between two trees.  While these are currently only used in
get_renames(), I want them to be available to some new functions.  No
actual logic changes yet.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 81 ---
 1 file changed, 60 insertions(+), 21 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index f40c70990c..8c9543d85c 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1315,24 +1315,15 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
 }
 
 /*
- * Get information of all renames which occurred between 'o_tree' and
- * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
- * 'b_tree') to be able to associate the correct cache entries with
- * the rename information. 'tree' is always equal to either a_tree or b_tree.
+ * Get the diff_filepairs changed between o_tree and tree.
  */
-static struct string_list *get_renames(struct merge_options *o,
-  struct tree *tree,
-  struct tree *o_tree,
-  struct tree *a_tree,
-  struct tree *b_tree,
-  struct string_list *entries)
+static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
+  struct tree *o_tree,
+  struct tree *tree)
 {
-   int i;
-   struct string_list *renames;
+   struct diff_queue_struct *ret;
struct diff_options opts;
 
-   renames = xcalloc(1, sizeof(struct string_list));
-
diff_setup();
DIFF_OPT_SET(, RECURSIVE);
DIFF_OPT_CLR(, RENAME_EMPTY);
@@ -1348,10 +1339,43 @@ static struct string_list *get_renames(struct 
merge_options *o,
diffcore_std();
if (opts.needed_rename_limit > o->needed_rename_limit)
o->needed_rename_limit = opts.needed_rename_limit;
-   for (i = 0; i < diff_queued_diff.nr; ++i) {
+
+   ret = malloc(sizeof(struct diff_queue_struct));
+   ret->queue = diff_queued_diff.queue;
+   ret->nr = diff_queued_diff.nr;
+   // Ignore diff_queued_diff.alloc; we won't be changing the size at all
+
+   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+   diff_queued_diff.nr = 0;
+   diff_queued_diff.queue = NULL;
+   diff_flush();
+   return ret;
+}
+
+/*
+ * Get information of all renames which occurred in 'pairs', making use of
+ * any implicit directory renames inferred from the other side of history.
+ * We need the three trees in the merge ('o_tree', 'a_tree' and 'b_tree')
+ * to be able to associate the correct cache entries with the rename
+ * information; tree is always equal to either a_tree or b_tree.
+ */
+static struct string_list *get_renames(struct merge_options *o,
+  struct diff_queue_struct *pairs,
+  struct tree *tree,
+  struct tree *o_tree,
+  struct tree *a_tree,
+  struct tree *b_tree,
+  struct string_list *entries)
+{
+   int i;
+   struct string_list *renames;
+
+   renames = xcalloc(1, sizeof(struct string_list));
+
+   for (i = 0; i < pairs->nr; ++i) {
struct string_list_item *item;
struct rename *re;
-   struct diff_filepair *pair = diff_queued_diff.queue[i];
+   struct diff_filepair *pair = pairs->queue[i];
if (pair->status != 'R') {
diff_free_filepair(pair);
continue;
@@ -1375,9 +1399,6 @@ static struct string_list *get_renames(struct 
merge_options *o,
item = string_list_insert(renames, pair->one->path);
item->util = re;
}
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_queued_diff.nr = 0;
-   diff_flush();
return renames;
 }
 
@@ -1649,15 +1670,33 @@ static struct rename_info *handle_renames(struct 
merge_options *o,
  int *clean)
 {
struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));
+   struct diff_queue_struct *head_pairs, *merge_pairs;
 
*clean = 1;
if (!o->detect_rename)
return NULL;
 
-   rei->head_renames  = get_renames(o, head, common, head, merge, entries);
-   rei->merge_renames = get_renames(o, merge, common, head, merge, 
entries);
+   head_pairs = get_diffpairs(o, common, head);
+   merge_pairs = get_diffpairs(o, common, merge);
+
+   rei->head_renames  = get_renames(o, head_pairs, head,
+common, head, merge, entries);
+   

[PATCH 27/30] merge-recursive: Apply necessary modifications for directory renames

2017-11-10 Thread Elijah Newren
This commit hooks together all the directory rename logic by making the
necessary changes to the rename struct, it's dst_entry, and the
diff_filepair under consideration.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 195 +++-
 t/t6043-merge-rename-directories.sh |  50 -
 2 files changed, 219 insertions(+), 26 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 2a7258f6bb..838bfd32ec 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -166,6 +166,7 @@ static int oid_eq(const struct object_id *a, const struct 
object_id *b)
 
 enum rename_type {
RENAME_NORMAL = 0,
+   RENAME_DIR,
RENAME_DELETE,
RENAME_ONE_FILE_TO_ONE,
RENAME_ONE_FILE_TO_TWO,
@@ -599,6 +600,7 @@ struct rename {
 */
struct stage_data *src_entry;
struct stage_data *dst_entry;
+   unsigned add_turned_into_rename:1;
unsigned processed:1;
 };
 
@@ -633,6 +635,26 @@ static int update_stages(struct merge_options *opt, const 
char *path,
return 0;
 }
 
+static int update_stages_for_stage_data(struct merge_options *opt,
+   const char *path,
+   const struct stage_data *stage_data)
+{
+   struct diff_filespec o, a, b;
+   o.mode = stage_data->stages[1].mode;
+   oidcpy(, _data->stages[1].oid);
+
+   a.mode = stage_data->stages[2].mode;
+   oidcpy(, _data->stages[2].oid);
+
+   b.mode = stage_data->stages[3].mode;
+   oidcpy(, _data->stages[3].oid);
+
+   return update_stages(opt, path,
+is_null_sha1(o.oid.hash) ? NULL : ,
+is_null_sha1(a.oid.hash) ? NULL : ,
+is_null_sha1(b.oid.hash) ? NULL : );
+}
+
 static void update_entry(struct stage_data *entry,
 struct diff_filespec *o,
 struct diff_filespec *a,
@@ -1100,6 +1122,18 @@ static int merge_file_one(struct merge_options *o,
return merge_file_1(o, , , , branch1, branch2, mfi);
 }
 
+static int conflict_rename_dir(struct merge_options *o,
+  struct diff_filepair *pair,
+  const char *rename_branch,
+  const char *other_branch)
+{
+   const struct diff_filespec *dest = pair->two;
+
+   if (update_file(o, 1, >oid, dest->mode, dest->path))
+   return -1;
+   return 0;
+}
+
 static int handle_change_delete(struct merge_options *o,
 const char *path, const char *old_path,
 const struct object_id *o_oid, int o_mode,
@@ -1369,6 +1403,24 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
if (!ret)
ret = update_file(o, 0, _c2.oid, mfi_c2.mode,
  new_path2);
+   /*
+* unpack_trees() actually populates the index for us for
+* "normal" rename/rename(2to1) situtations so that the
+* correct entries are at the higher stages, which would
+* make the call below to update_stages_for_stage_data
+* unnecessary.  However, if either of the renames came
+* from a directory rename, then unpack_trees() will not
+* have gotten the right data loaded into the index, so we
+* need to do so now.  (While it'd be tempting to move this
+* call to update_stages_for_stage_data() to
+* apply_directory_rename_modifications(), that would break
+* our intermediate calls to would_lose_untracked() since
+* those rely on the current in-memory index.  See also the
+* big "NOTE" in update_stages()).
+*/
+   if (update_stages_for_stage_data(o, path, ci->dst_entry1))
+   ret = -1;
+
free(new_path2);
free(new_path1);
}
@@ -1888,6 +1940,120 @@ static char *check_for_directory_rename(struct 
merge_options *o,
return new_path;
 }
 
+static void apply_directory_rename_modifications(struct merge_options *o,
+struct diff_filepair *pair,
+char *new_path,
+struct rename *re,
+struct tree *tree,
+struct tree *o_tree,
+struct tree *a_tree,
+struct tree *b_tree,
+struct string_list *entries,
+int *clean)
+{
+   struct 

[PATCH 02/30] merge-recursive: Fix logic ordering issue

2017-11-10 Thread Elijah Newren
merge_trees() did a variety of work, including:
  * Calling get_unmerged() to get unmerged entries
  * Calling record_df_conflict_files() with all unmerged entries to
do some work to ensure we could handle D/F conflicts correctly
  * Calling get_renames() to check for renames.

An easily overlooked issue is that get_renames() can create more
unmerged entries and add them to the list, which have the possibility of
being involved in D/F conflicts.  So the call to
record_df_conflict_files() should really be moved after all the rename
detection.  I didn't come up with any testcases demonstrating any bugs
with the old ordering, but I suspect there were some for both normal
renames and for directory renames.  Fix the ordering.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1d3f8f0d22..52521faf09 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1981,10 +1981,10 @@ int merge_trees(struct merge_options *o,
get_files_dirs(o, merge);
 
entries = get_unmerged();
-   record_df_conflict_files(o, entries);
re_head  = get_renames(o, head, common, head, merge, entries);
re_merge = get_renames(o, merge, common, head, merge, entries);
clean = process_renames(o, re_head, re_merge);
+   record_df_conflict_files(o, entries);
if (clean < 0)
goto cleanup;
for (i = entries->nr-1; 0 <= i; i--) {
-- 
2.15.0.5.g9567be9905



[PATCH 03/30] merge-recursive: Add explanation for src_entry and dst_entry

2017-11-10 Thread Elijah Newren
If I have to walk through the debugger and inspect the values found in
here in order to figure out their meaning, despite having known these
things inside and out some years back, then they probably need a comment
for the casual reader to explain their purpose.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 52521faf09..3526c8d0b8 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -513,6 +513,28 @@ static void record_df_conflict_files(struct merge_options 
*o,
 
 struct rename {
struct diff_filepair *pair;
+   /*
+* Because I keep forgetting every few years what src_entry and
+* dst_entry are and have to walk through a debugger and puzzle
+* through it to remind myself...
+*
+* If 'before' is renamed to 'after' then src_entry will contain
+* the versions of 'before' from the merge_base, HEAD, and MERGE in
+* stages 1, 2, and 3; dst_entry will contain the versions of
+* 'after' from the merge_base, HEAD, and MERGE in stages 1, 2, and
+* 3.  Thus, we have a total of six modes and oids, though some
+* will be null.  (Stage 0 is ignored; we're interested in handling
+* conflicts.)
+*
+* Since we don't turn on break-rewrites by default, neither
+* src_entry nor dst_entry can have all three of their stages have
+* non-null oids, meaning at most four of the six will be non-null.
+* Also, since this is a rename, both src_entry and dst_entry will
+* have at least one non-null oid, meaning at least two will be
+* non-null.  Of the six oids, a typical rename will have three be
+* non-null.  Only two implies a rename/delete, and four implies a
+* rename/add.
+*/
struct stage_data *src_entry;
struct stage_data *dst_entry;
unsigned processed:1;
-- 
2.15.0.5.g9567be9905



[PATCH 04/30] directory rename detection: basic testcases

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 391 
 1 file changed, 391 insertions(+)
 create mode 100755 t/t6043-merge-rename-directories.sh

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
new file mode 100755
index 00..b737b0a105
--- /dev/null
+++ b/t/t6043-merge-rename-directories.sh
@@ -0,0 +1,391 @@
+#!/bin/sh
+
+test_description="recursive merge with directory renames"
+# includes checking of many corner cases, with a similar methodology to:
+#   t6042: corner cases with renames but not criss-cross merges
+#   t6036: corner cases with both renames and criss-cross merges
+#
+# The setup for all of them, pictorially, is:
+#
+#  B
+#  o
+# / \
+#  A o   ?
+# \ /
+#  o
+#  C
+#
+# To help make it easier to follow the flow of tests, they have been
+# divided into sections and each test will start with a quick explanation
+# of what commits A, B, and C contain.
+#
+# Notation:
+#z/{b,c}   means  files z/b and z/c both exist
+#x/d_1 means  file x/d exists with content d1.  (Purpose of the
+# underscore notation is to differentiate different
+# files that might be renamed into each other's paths.)
+
+. ./test-lib.sh
+
+
+###
+# SECTION 1: Basic cases we should be able to handle
+###
+
+# Testcase 1a, Basic directory rename.
+#   Commit A: z/{b,c}
+#   Commit B: y/{b,c}
+#   Commit C: z/{b,c,d,e/f}
+#   Expected: y/{b,c,d,e/f}
+
+test_expect_success '1a-setup: Simple directory rename detection' '
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   echo d >z/d &&
+   mkdir z/e &&
+   echo f >z/e/f &&
+   git add z/d z/e/f &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '1a-check: Simple directory rename detection' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 4 -eq $(git ls-files -s | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:y/d) = $(git rev-parse C:z/d) &&
+   test "$(git hash-object y/d)" = $(git rev-parse C:z/d) &&
+   test $(git rev-parse HEAD:y/e/f) = $(git rev-parse C:z/e/f) &&
+   test_must_fail git rev-parse HEAD:z/d &&
+   test_must_fail git rev-parse HEAD:z/e/f &&
+   test ! -d z/d &&
+   test ! -d z/e/f
+'
+
+# Testcase 1b, Merge a directory with another
+#   Commit A: z/{b,c},   y/d
+#   Commit B: z/{b,c,e}, y/d
+#   Commit C: y/{b,c,d}
+#   Expected: y/{b,c,d,e}
+
+test_expect_success '1b-setup: Merge a directory with another' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir y &&
+   echo d >y/d &&
+   git add z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   echo e >z/e &&
+   git add z/e &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z/b y &&
+   git mv z/c y &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '1b-check: Merge a directory with another' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 4 -eq $(git ls-files -s | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:y/d) = $(git rev-parse A:y/d) &&
+   test $(git rev-parse HEAD:y/e) = $(git rev-parse B:z/e) &&
+   test_must_fail git rev-parse HEAD:z/e
+'
+
+# Testcase 1c, Transitive renaming
+#   (Related to testcases 3a and 6d -- when should a transitive rename apply?)
+#   (Related to testcases 9c and 9d -- can transitivity repeat?)
+#   Commit A: z/{b,c},   x/d
+#   Commit B: y/{b,c},   x/d
+#   Commit C: z/{b,c,d}
+#   Expected: y/{b,c,d}  (because x/d -> z/d -> y/d)
+
+test_expect_success '1c-setup: Transitive renaming' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir x &&
+   echo d >x/d &&
+   git add z x &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   

[PATCH 28/30] merge-recursive: Avoid clobbering untracked files with directory renames

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 39 +++--
 t/t6043-merge-rename-directories.sh |  6 +++---
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 838bfd32ec..1b3ee5b9fb 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1129,6 +1129,25 @@ static int conflict_rename_dir(struct merge_options *o,
 {
const struct diff_filespec *dest = pair->two;
 
+   if (!o->call_depth && would_lose_untracked(dest->path)) {
+   char *alt_path = unique_path(o, dest->path, rename_branch);
+   output(o, 1, _("Error: Refusing to lose untracked file at %s; "
+  "writing to %s instead."),
+  dest->path, alt_path);
+   /*
+* Write the file in worktree at alt_path, but not in the
+* index.  Instead, write to dest->path for the index but
+* only at the higher appropriate stage.
+*/
+   if (update_file(o, 0, >oid, dest->mode, alt_path))
+   return -1;
+   free(alt_path);
+   return update_stages(o, dest->path, NULL,
+rename_branch == o->branch1 ? dest : NULL,
+rename_branch == o->branch1 ? NULL : dest);
+   }
+
+   /* Update dest->path both in index and in worktree */
if (update_file(o, 1, >oid, dest->mode, dest->path))
return -1;
return 0;
@@ -1147,7 +1166,8 @@ static int handle_change_delete(struct merge_options *o,
const char *update_path = path;
int ret = 0;
 
-   if (dir_in_way(path, !o->call_depth, 0)) {
+   if (dir_in_way(path, !o->call_depth, 0) ||
+   (!o->call_depth && would_lose_untracked(path))) {
update_path = alt_path = unique_path(o, path, change_branch);
}
 
@@ -1273,6 +1293,10 @@ static int handle_file(struct merge_options *o,
dst_name = unique_path(o, rename->path, cur_branch);
output(o, 1, _("%s is a directory in %s adding as %s 
instead"),
   rename->path, other_branch, dst_name);
+   } else if (!o->call_depth && 
would_lose_untracked(rename->path)) {
+   dst_name = unique_path(o, rename->path, cur_branch);
+   output(o, 1, _("Refusing to lose untracked file at %s; 
adding as %s instead"),
+  rename->path, dst_name);
}
}
if ((ret = update_file(o, 0, >oid, rename->mode, dst_name)))
@@ -1398,7 +1422,18 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
char *new_path2 = unique_path(o, path, ci->branch2);
output(o, 1, _("Renaming %s to %s and %s to %s instead"),
   a->path, new_path1, b->path, new_path2);
-   remove_file(o, 0, path, 0);
+   if (would_lose_untracked(path))
+   /*
+* Only way we get here is if both renames were from
+* a directory rename AND user had an untracked file
+* at the location where both files end up after the
+* two directory renames.  See testcase 10d of t6043.
+*/
+   output(o, 1, _("Refusing to lose untracked file at "
+  "%s, even though it's in the way."),
+  path);
+   else
+   remove_file(o, 0, path, 0);
ret = update_file(o, 0, _c1.oid, mfi_c1.mode, new_path1);
if (!ret)
ret = update_file(o, 0, _c2.oid, mfi_c2.mode,
diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index e737bad2c5..6db764a1b6 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2660,7 +2660,7 @@ test_expect_success '10b-setup: Overwrite untracked with 
dir rename + delete' '
git commit -m "C"
 '
 
-test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' '
+test_expect_success '10b-check: Overwrite untracked with dir rename + delete' '
git checkout B^0 &&
echo very >y/c &&
echo important >y/d &&
@@ -2727,7 +2727,7 @@ test_expect_success '10c-setup: Overwrite untracked with 
dir rename/rename(1to2)
git commit -m "C"
 '
 
-test_expect_failure '10c-check: Overwrite untracked with dir 
rename/rename(1to2)' '
+test_expect_success '10c-check: Overwrite untracked with dir 
rename/rename(1to2)' '
git checkout B^0 &&
echo important >y/c &&
 
@@ -2793,7 +2793,7 @@ test_expect_success '10d-setup: Delete untracked with dir 
rename/rename(2to1)' '
git commit -m 

[PATCH 20/30] merge-recursive: Add a new hashmap for storing directory renames

2017-11-10 Thread Elijah Newren
This just adds dir_rename_entry and the associated functions; code using
these will be added in subsequent commits.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 24 
 merge-recursive.h |  8 
 2 files changed, 32 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 8c9543d85c..89a9b32635 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -49,6 +49,30 @@ static unsigned int path_hash(const char *path)
return ignore_case ? strihash(path) : strhash(path);
 }
 
+static struct dir_rename_entry *dir_rename_find_entry(struct hashmap *hashmap, 
char *dir)
+{
+   struct dir_rename_entry key;
+
+   if (dir == NULL)
+   return NULL;
+   hashmap_entry_init(, strhash(dir));
+   key.dir = dir;
+   return hashmap_get(hashmap, , NULL);
+}
+
+static int dir_rename_cmp(void *unused_cmp_data,
+ const struct dir_rename_entry *e1,
+ const struct dir_rename_entry *e2,
+ const void *unused_keydata)
+{
+   return strcmp(e1->dir, e2->dir);
+}
+
+static void dir_rename_init(struct hashmap *map)
+{
+   hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0);
+}
+
 static void flush_output(struct merge_options *o)
 {
if (o->buffer_output < 2 && o->obuf.len) {
diff --git a/merge-recursive.h b/merge-recursive.h
index 80d69d1401..a024949739 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -29,6 +29,14 @@ struct merge_options {
struct string_list df_conflict_file_set;
 };
 
+struct dir_rename_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   char *dir;
+   unsigned non_unique_new_dir:1;
+   char *new_dir;
+   struct string_list possible_new_dirs;
+};
+
 /* merge_trees() but with recursive ancestor consolidation */
 int merge_recursive(struct merge_options *o,
struct commit *h1,
-- 
2.15.0.5.g9567be9905



[PATCH 30/30] merge-recursive: Fix remaining directory rename + dirty overwrite cases

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 26 +++---
 t/t6043-merge-rename-directories.sh |  8 
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 86ddb89727..6ef1d52f0a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1296,11 +1296,23 @@ static int handle_file(struct merge_options *o,
 
add = filespec_from_entry(, dst_entry, stage ^ 1);
if (add) {
+   int ren_src_was_dirty = was_dirty(o, rename->path);
char *add_name = unique_path(o, rename->path, other_branch);
if (update_file(o, 0, >oid, add->mode, add_name))
return -1;
 
-   remove_file(o, 0, rename->path, 0);
+   if (ren_src_was_dirty) {
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  rename->path);
+   }
+   /*
+* Stupid double negatives in remove_file; it somehow manages
+* to repeatedly mess me up.  So, just for myself:
+*1) update_wd iff !ren_src_was_dirty.
+*2) no_wd iff !update_wd
+*3) so, no_wd == !!ren_src_was_dirty == ren_src_was_dirty
+*/
+   remove_file(o, 0, rename->path, ren_src_was_dirty);
dst_name = unique_path(o, rename->path, cur_branch);
} else {
if (dir_in_way(rename->path, !o->call_depth, 0)) {
@@ -1436,7 +1448,10 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
char *new_path2 = unique_path(o, path, ci->branch2);
output(o, 1, _("Renaming %s to %s and %s to %s instead"),
   a->path, new_path1, b->path, new_path2);
-   if (would_lose_untracked(path))
+   if (was_dirty(o, path))
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  path);
+   else if (would_lose_untracked(path))
/*
 * Only way we get here is if both renames were from
 * a directory rename AND user had an untracked file
@@ -2002,6 +2017,7 @@ static void apply_directory_rename_modifications(struct 
merge_options *o,
 {
struct string_list_item *item;
int stage = (tree == a_tree ? 2 : 3);
+   int update_wd;
 
/*
 * In all cases where we can do directory rename detection,
@@ -2012,7 +2028,11 @@ static void apply_directory_rename_modifications(struct 
merge_options *o,
 * saying the file would have been overwritten), but it might
 * be dirty, though.
 */
-   remove_file(o, 1, pair->two->path, 0 /* no_wd */);
+   update_wd = !was_dirty(o, pair->two->path);
+   if (!update_wd)
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  pair->two->path);
+   remove_file(o, 1, pair->two->path, !update_wd);
 
/* Find or create a new re->dst_entry */
item = string_list_lookup(entries, new_path);
diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 02c97c9823..676e72e9e6 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2985,7 +2985,7 @@ test_expect_success '11b-setup: Avoid losing dirty file 
involved in directory re
git commit -m "C"
 '
 
-test_expect_failure '11b-check: Avoid losing dirty file involved in directory 
rename' '
+test_expect_success '11b-check: Avoid losing dirty file involved in directory 
rename' '
git checkout B^0 &&
echo stuff >>z/c &&
 
@@ -3109,7 +3109,7 @@ test_expect_success '11d-setup: Avoid losing not-uptodate 
with rename + D/F conf
git commit -m "C"
 '
 
-test_expect_failure '11d-check: Avoid losing not-uptodate with rename + D/F 
conflict' '
+test_expect_success '11d-check: Avoid losing not-uptodate with rename + D/F 
conflict' '
git checkout B^0 &&
echo stuff >>z/c &&
 
@@ -3178,7 +3178,7 @@ test_expect_success '11e-setup: Avoid deleting 
not-uptodate with dir rename/rena
git commit -m "C"
 '
 
-test_expect_failure '11e-check: Avoid deleting not-uptodate with dir 
rename/rename(1to2)/add' '
+test_expect_success '11e-check: Avoid deleting not-uptodate with dir 
rename/rename(1to2)/add' '
git checkout B^0 &&
echo mods >>y/c &&
 
@@ -3247,7 +3247,7 @@ test_expect_success '11f-setup: Avoid deleting 
not-uptodate with dir rename/rena
git commit -m "C"
 '
 
-test_expect_failure '11f-check: Avoid deleting not-uptodate with dir 
rename/rename(2to1)' '
+test_expect_success '11f-check: Avoid deleting not-uptodate with dir 
rename/rename(2to1)' '
git checkout B^0 &&
echo important >>y/wham &&
 
-- 
2.15.0.5.g9567be9905



[PATCH 18/30] merge-recursive: Make !o->detect_rename codepath more obvious

2017-11-10 Thread Elijah Newren
Previously, if !o->detect_rename then get_renames() would return an
empty string_list, and then process_renames() would have nothing to
iterate over.  It seems more straightforward to simply avoid calling
either function in that case.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 7a3402e50c..f40c70990c 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1332,8 +1332,6 @@ static struct string_list *get_renames(struct 
merge_options *o,
struct diff_options opts;
 
renames = xcalloc(1, sizeof(struct string_list));
-   if (!o->detect_rename)
-   return renames;
 
diff_setup();
DIFF_OPT_SET(, RECURSIVE);
@@ -1652,6 +1650,10 @@ static struct rename_info *handle_renames(struct 
merge_options *o,
 {
struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));
 
+   *clean = 1;
+   if (!o->detect_rename)
+   return NULL;
+
rei->head_renames  = get_renames(o, head, common, head, merge, entries);
rei->merge_renames = get_renames(o, merge, common, head, merge, 
entries);
*clean = process_renames(o, rei->head_renames, rei->merge_renames);
@@ -1664,6 +1666,9 @@ static void cleanup_renames(struct rename_info *re_info)
const struct rename *re;
int i;
 
+   if (!re_info)
+   return;
+
for (i = 0; i < re_info->head_renames->nr; i++) {
re = re_info->head_renames->items[i].util;
diff_free_filepair(re->pair);
-- 
2.15.0.5.g9567be9905



[PATCH 25/30] merge-recursive: Check for file level conflicts then get new name

2017-11-10 Thread Elijah Newren
Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
file level either.  If there aren't any, then get the new name from
any directory renames.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 184 ++--
 t/t6043-merge-rename-directories.sh |   2 +-
 2 files changed, 176 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 251c4cc7fa..7c2c29bb51 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1481,6 +1481,102 @@ static void get_renamed_dir_portion(const char 
*old_path, const char *new_path,
}
 }
 
+/*
+ * Write:
+ *   element1, element2, element3, ..., elementN
+ * to str.  If only one element, just write "element1" to str.
+ */
+static void comma_separated_list(char *str, struct string_list *slist) {
+   int i;
+   for (i=0; inr; i++) {
+   str += sprintf(str, "%s", slist->items[i].string);
+   if (i < slist->nr-1)
+   str += sprintf(str, ", ");
+   }
+}
+
+/*
+ * See if there is a directory rename for path, and if there are any file
+ * level conflicts for the renamed location.  If there is a rename and
+ * there are no conflicts, return the new name.  Otherwise, return NULL.
+ */
+static char* handle_path_level_conflicts(struct merge_options *o,
+const char *path,
+struct dir_rename_entry *entry,
+struct hashmap *collisions,
+struct tree *tree)
+{
+   char *new_path = NULL;
+   struct collision_entry *collision_ent;
+   int clean = 1;
+
+   /*
+* entry has the mapping of old directory name to new directory name
+* that we want to apply to path.
+*/
+   new_path = apply_dir_rename(entry, path);
+
+   if (!new_path) {
+   /* This should only happen when entry->non_unique_new_dir set */
+   assert(entry->non_unique_new_dir);
+   output(o, 1, _("CONFLICT (directory rename split): "
+  "Unclear where to place %s because directory "
+  "%s was renamed to multiple other directories, "
+  "with no destination getting a majority of the "
+  "files."),
+  path, entry->dir);
+   clean = 0;
+   return NULL;
+   }
+
+   /*
+* The caller needs to have ensured that it has pre-populated
+* collisions with all paths that map to new_path.  Do a quick check
+* to ensure that's the case.
+ */
+   collision_ent = collision_find_entry(collisions, new_path);
+   assert(collision_ent != NULL);
+
+   /*
+* Check for one-sided add/add/.../add conflicts, i.e.
+* where implicit renames from the other side doing
+* directory rename(s) can affect this side of history
+* to put multiple paths into the same location.  Warn
+* and bail on directory renames for such paths.
+*/
+   char collision_paths[(PATH_MAX+2) * collision_ent->source_files.nr];
+   if (collision_ent->reported_already) {
+   clean = 0;
+   } else if (tree_has_path(tree, new_path)) {
+   collision_ent->reported_already = 1;
+   comma_separated_list(collision_paths,
+_ent->source_files);
+   output(o, 1, _("CONFLICT (implicit dir rename): Existing "
+  "file/dir at %s in the way of implicit "
+  "directory rename(s) putting the following "
+  "path(s) there: %s."),
+  new_path, collision_paths);
+   clean = 0;
+   } else if (collision_ent->source_files.nr > 1) {
+   collision_ent->reported_already = 1;
+   comma_separated_list(collision_paths,
+_ent->source_files);
+   output(o, 1, _("CONFLICT (implicit dir rename): Cannot map "
+  "more than one path to %s; implicit directory "
+  "renames tried to put these paths there: %s"),
+  new_path, collision_paths);
+   clean = 0;
+   }
+
+   /* Free memory we no longer need */
+   if (!clean && new_path) {
+   free(new_path);
+   return NULL;
+   }
+
+   return new_path;
+}
+
 /*
  * There are a couple things we want to do at the directory level:
  *   1. Check for both sides renaming to the same thing, in order to avoid
@@ -1725,6 +1821,58 @@ static void compute_collisions(struct hashmap 
*collisions,
}
 }
 
+static char 

[PATCH 17/30] merge-recursive: Fix leaks of allocated renames and diff_filepairs

2017-11-10 Thread Elijah Newren
get_renames() has always zero'ed out diff_queued_diff.nr while only
manually free'ing diff_filepairs that did not correspond to renames.
Further, it allocated struct renames that were tucked away in the
return string_list.  Make sure all of these are deallocated when we
are done with them.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 49710c0964..7a3402e50c 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1661,10 +1661,21 @@ static struct rename_info *handle_renames(struct 
merge_options *o,
 
 static void cleanup_renames(struct rename_info *re_info)
 {
-   string_list_clear(re_info->head_renames, 0);
-   string_list_clear(re_info->merge_renames, 0);
+   const struct rename *re;
+   int i;
 
+   for (i = 0; i < re_info->head_renames->nr; i++) {
+   re = re_info->head_renames->items[i].util;
+   diff_free_filepair(re->pair);
+   }
+   string_list_clear(re_info->head_renames, 1);
free(re_info->head_renames);
+
+   for (i = 0; i < re_info->merge_renames->nr; i++) {
+   re = re_info->merge_renames->items[i].util;
+   diff_free_filepair(re->pair);
+   }
+   string_list_clear(re_info->merge_renames, 1);
free(re_info->merge_renames);
 
free(re_info);
-- 
2.15.0.5.g9567be9905



[PATCH 22/30] merge-recursive: Check for directory level conflicts

2017-11-10 Thread Elijah Newren
Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
directory level.  There will be additional checks at the individual
file level too, which will be added later.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 112 ++
 1 file changed, 112 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index b5770d3d7f..3633be0123 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1376,6 +1376,15 @@ static struct diff_queue_struct *get_diffpairs(struct 
merge_options *o,
return ret;
 }
 
+static int tree_has_path(struct tree *tree, const char *path)
+{
+   unsigned char hashy[20];
+   unsigned mode_o;
+
+   return !get_tree_entry(tree->object.oid.hash, path,
+  hashy, _o);
+}
+
 static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir) {
*old_dir = NULL;
@@ -1425,6 +1434,105 @@ static void get_renamed_dir_portion(const char 
*old_path, const char *new_path,
}
 }
 
+/*
+ * There are a couple things we want to do at the directory level:
+ *   1. Check for both sides renaming to the same thing, in order to avoid
+ *  implicit renaming of files that should be left in place.  (See
+ *  testcase 6b in t6043 for details.)
+ *   2. Prune directory renames if there are still files left in the
+ *  the original directory.  These represent a partial directory rename,
+ *  i.e. a rename where only some of the files within the directory
+ *  were renamed elsewhere.  (Technically, this could be done earlier
+ *  in get_directory_renames(), except that would prevent us from
+ *  doing the previous check and thus failing testcase 6b.)
+ *   3. Check for rename/rename(1to2) conflicts (at the directory level).
+ *  In the future, we could potentially record this info as well and
+ *  omit reporting rename/rename(1to2) conflicts for each path within
+ *  the affected directories, thus cleaning up the merge output.
+ *   NOTE: We do NOT check for rename/rename(2to1) conflicts at the
+ * directory level, because merging directories is fine.  If it
+ * causes conflicts for files within those merged directories, then
+ * that should be detected at the individual path level.
+ */
+static void handle_directory_level_conflicts(struct merge_options *o,
+struct hashmap *dir_re_head,
+struct tree *head,
+struct hashmap *dir_re_merge,
+struct tree *merge)
+{
+   struct hashmap_iter iter;
+   struct dir_rename_entry *head_ent;
+   struct dir_rename_entry *merge_ent;
+   int i;
+
+   struct string_list remove_from_head = STRING_LIST_INIT_NODUP;
+   struct string_list remove_from_merge = STRING_LIST_INIT_NODUP;
+
+   hashmap_iter_init(dir_re_head, );
+   while ((head_ent = hashmap_iter_next())) {
+   merge_ent = dir_rename_find_entry(dir_re_merge, head_ent->dir);
+   if (merge_ent &&
+   !head_ent->non_unique_new_dir &&
+   !merge_ent->non_unique_new_dir &&
+   !strcmp(head_ent->new_dir, merge_ent->new_dir)) {
+   /* 1. Renamed identically; remove it from both sides */
+   string_list_append(_from_head,
+  head_ent->dir)->util = head_ent;
+   free(head_ent->new_dir);
+   string_list_append(_from_merge,
+  merge_ent->dir)->util = merge_ent;
+   free(merge_ent->new_dir);
+   } else if (tree_has_path(head, head_ent->dir)) {
+   /* 2. This wasn't a directory rename after all */
+   string_list_append(_from_head,
+  head_ent->dir)->util = head_ent;
+   free(head_ent->new_dir);
+   }
+   }
+
+   hashmap_iter_init(dir_re_merge, );
+   while ((merge_ent = hashmap_iter_next())) {
+   head_ent = dir_rename_find_entry(dir_re_head, merge_ent->dir);
+   if (tree_has_path(merge, merge_ent->dir)) {
+   /* 2. This wasn't a directory rename after all */
+   string_list_append(_from_merge,
+  merge_ent->dir)->util = merge_ent;
+   } else if (head_ent &&
+  !head_ent->non_unique_new_dir &&
+  !merge_ent->non_unique_new_dir) {
+   /* 3. rename/rename(1to2) */
+   /* We can assume it's not 

[PATCH 15/30] merge-recursive: Move the get_renames() function

2017-11-10 Thread Elijah Newren
I want to re-use some other functions in the file without moving those other
functions or dealing with a handful of annoying split function declarations
and definitions.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 138 +++---
 1 file changed, 69 insertions(+), 69 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 3526c8d0b8..382016508b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -540,75 +540,6 @@ struct rename {
unsigned processed:1;
 };
 
-/*
- * Get information of all renames which occurred between 'o_tree' and
- * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
- * 'b_tree') to be able to associate the correct cache entries with
- * the rename information. 'tree' is always equal to either a_tree or b_tree.
- */
-static struct string_list *get_renames(struct merge_options *o,
-  struct tree *tree,
-  struct tree *o_tree,
-  struct tree *a_tree,
-  struct tree *b_tree,
-  struct string_list *entries)
-{
-   int i;
-   struct string_list *renames;
-   struct diff_options opts;
-
-   renames = xcalloc(1, sizeof(struct string_list));
-   if (!o->detect_rename)
-   return renames;
-
-   diff_setup();
-   DIFF_OPT_SET(, RECURSIVE);
-   DIFF_OPT_CLR(, RENAME_EMPTY);
-   opts.detect_rename = DIFF_DETECT_RENAME;
-   opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
-   o->diff_rename_limit >= 0 ? o->diff_rename_limit :
-   1000;
-   opts.rename_score = o->rename_score;
-   opts.show_rename_progress = o->show_rename_progress;
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_setup_done();
-   diff_tree_oid(_tree->object.oid, >object.oid, "", );
-   diffcore_std();
-   if (opts.needed_rename_limit > o->needed_rename_limit)
-   o->needed_rename_limit = opts.needed_rename_limit;
-   for (i = 0; i < diff_queued_diff.nr; ++i) {
-   struct string_list_item *item;
-   struct rename *re;
-   struct diff_filepair *pair = diff_queued_diff.queue[i];
-   if (pair->status != 'R') {
-   diff_free_filepair(pair);
-   continue;
-   }
-   re = xmalloc(sizeof(*re));
-   re->processed = 0;
-   re->pair = pair;
-   item = string_list_lookup(entries, re->pair->one->path);
-   if (!item)
-   re->src_entry = insert_stage_data(re->pair->one->path,
-   o_tree, a_tree, b_tree, entries);
-   else
-   re->src_entry = item->util;
-
-   item = string_list_lookup(entries, re->pair->two->path);
-   if (!item)
-   re->dst_entry = insert_stage_data(re->pair->two->path,
-   o_tree, a_tree, b_tree, entries);
-   else
-   re->dst_entry = item->util;
-   item = string_list_insert(renames, pair->one->path);
-   item->util = re;
-   }
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_queued_diff.nr = 0;
-   diff_flush();
-   return renames;
-}
-
 static int update_stages(struct merge_options *opt, const char *path,
 const struct diff_filespec *o,
 const struct diff_filespec *a,
@@ -1383,6 +1314,75 @@ static int conflict_rename_rename_2to1(struct 
merge_options *o,
return ret;
 }
 
+/*
+ * Get information of all renames which occurred between 'o_tree' and
+ * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
+ * 'b_tree') to be able to associate the correct cache entries with
+ * the rename information. 'tree' is always equal to either a_tree or b_tree.
+ */
+static struct string_list *get_renames(struct merge_options *o,
+  struct tree *tree,
+  struct tree *o_tree,
+  struct tree *a_tree,
+  struct tree *b_tree,
+  struct string_list *entries)
+{
+   int i;
+   struct string_list *renames;
+   struct diff_options opts;
+
+   renames = xcalloc(1, sizeof(struct string_list));
+   if (!o->detect_rename)
+   return renames;
+
+   diff_setup();
+   DIFF_OPT_SET(, RECURSIVE);
+   DIFF_OPT_CLR(, RENAME_EMPTY);
+   opts.detect_rename = DIFF_DETECT_RENAME;
+   opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
+   

[PATCH 10/30] directory rename detection: more involved edge/corner testcases

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 347 
 1 file changed, 347 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 157299105f..115d0d2622 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1336,4 +1336,351 @@ test_expect_success '6e-check: Add/add from one side' '
test $(git rev-parse HEAD:z/d) = $(git rev-parse C:z/d)
 '
 
+
+###
+# SECTION 7: More involved Edge/Corner cases
+#
+# The ruleset we have generated in the above sections seems to provide
+# well-defined merges.  But can we find edge/corner cases that either (a)
+# are harder for users to understand, or (b) have a resolution that is
+# non-intuitive or suboptimal?
+#
+# The testcases in this section dive into cases that I've tried to craft in
+# a way to find some that might be surprising to users or difficult for
+# them to understand (the next section will look at non-intuitive or
+# suboptimal merge results).  Some of the testcases are similar to ones
+# from past sections, but have been simplified to try to highlight error
+# messages using a "modified" path (due to the directory rename).  Are
+# users okay with these?
+#
+# In my opinion, testcases that are difficult to understand from this
+# section is due to difficulty in the testcase rather than the directory
+# renaming (similar to how t6042 and t6036 have difficult resolutions due
+# to the problem setup itself being complex).  And I don't think the
+# error messages are a problem.
+#
+# On the other hand, the testcases in section 8 worry me slightly more...
+###
+
+# Testcase 7a, rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file
+#   Commit A: z/{b,c}
+#   Commit B: y/{b,c}
+#   Commit C: w/b, x/c, z/d
+#   Expected: y/d, CONFLICT(rename/rename for both z/b and z/c)
+#   NOTE: There's a rename of z/ here, y/ has more renames, so z/d -> y/d.
+
+test_expect_success '7a-setup: rename-dir vs. rename-dir (NOT split evenly) 
PLUS add-other-file' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   mkdir w &&
+   mkdir x &&
+   git mv z/b w/ &&
+   git mv z/c x/ &&
+   echo d > z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '7a-check: rename-dir vs. rename-dir (NOT split evenly) 
PLUS add-other-file' '
+   git checkout B^0 &&
+
+   test_must_fail git merge -s recursive C^0 >out &&
+   test_i18ngrep "CONFLICT (rename/rename).*z/b.*y/b.*w/b" out &&
+   test_i18ngrep "CONFLICT (rename/rename).*z/c.*y/c.*x/c" out &&
+
+   test 7 -eq $(git ls-files -s | wc -l) &&
+   test 6 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/d) = $(git rev-parse C:z/d) &&
+
+   test $(git rev-parse :1:z/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :2:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :3:w/b) = $(git rev-parse A:z/b) &&
+
+   test $(git rev-parse :1:z/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse :2:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse :3:x/c) = $(git rev-parse A:z/c) &&
+
+   test $(git hash-object y/b) = $(git rev-parse A:z/b) &&
+   test $(git hash-object w/b) = $(git rev-parse A:z/b) &&
+   test $(git hash-object y/c) = $(git rev-parse A:z/c) &&
+   test $(git hash-object x/c) = $(git rev-parse A:z/c)
+'
+
+# Testcase 7b, rename/rename(2to1), but only due to transitive rename
+#   (Related to testcase 1d)
+#   Commit A: z/{b,c}, x/d_1, w/d_2
+#   Commit B: y/{b,c,d_2}, x/d_1
+#   Commit C: z/{b,c,d_1},w/d_2
+#   Expected: y/{b,c}, CONFLICT(rename/rename(2to1): x/d_1, w/d_2 -> y_d)
+
+test_expect_success '7b-setup: rename/rename(2to1), but only due to transitive 
rename' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   mkdir x &&
+   mkdir w &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d1 > x/d &&
+   echo d2 > w/d &&
+   git add z x w &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z y &&
+   git mv w/d y/ &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git 

[PATCH 14/30] directory rename detection: tests for handling overwriting dirty files

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 401 
 1 file changed, 401 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 7af8962512..4066b08767 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2873,4 +2873,405 @@ test_expect_failure '10e-check: Does git complain about 
untracked file that is n
test "random" = "$(cat z/c)"
 '
 
+###
+# SECTION 11: Handling dirty (not up-to-date) files
+#
+# unpack_trees(), upon which the recursive merge algorithm is based, aborts
+# the operation if untracked or dirty files would be deleted or overwritten
+# by the merge.  Unfortunately, unpack_trees() does not understand renames,
+# and if it doesn't abort, then it muddies up the working directory before
+# we even get to the point of detecting renames, so we need some special
+# handling.  This was true even of normal renames, but there are additional
+# codepaths that need special handling with directory renames.  Add
+# testcases for both renamed-by-directory-rename-detection and standard
+# rename cases.
+###
+
+# Testcase 11a, Avoid losing dirty contents with simple rename
+#   Commit A: z/{a,b_v1},
+#   Commit B: z/{a,c_v1}, and z/c_v1 has uncommitted mods
+#   Commit C: z/{a,b_v2}
+#   Expected: ERROR_MSG(Refusing to lose dirty file at z/c) +
+# z/a, staged version of z/c has sha1sum matching C:z/b_v2,
+# z/c~HEAD with contents of C:z/b_v2,
+# z/c with uncommitted mods on top of B:z/c_v1
+
+test_expect_success '11a-setup: Avoid losing dirty contents with simple 
rename' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo a >z/a &&
+   test_seq 1 10 >z/b &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z/b z/c &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   echo 11 >>z/b &&
+   git add z/b &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '11a-check: Avoid losing dirty contents with simple 
rename' '
+   git checkout B^0 &&
+   echo stuff >>z/c &&
+
+   test_must_fail git merge -s recursive C^0 >out 2>err &&
+   test_i18ngrep "Refusing to lose dirty file at z/c" out &&
+
+   test_seq 1 10 >expected &&
+   echo stuff >>expected &&
+
+   test 2 -eq $(git ls-files -s | wc -l) &&
+   test 1 -eq $(git ls-files -u | wc -l) &&
+   test 4 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:z/a) = $(git rev-parse A:z/a) &&
+   test $(git rev-parse :2:z/c) = $(git rev-parse C:z/b) &&
+
+   test "$(git hash-object z/c~HEAD)" = $(git rev-parse C:z/b) &&
+   test_cmp expected z/c
+'
+
+# Testcase 11b, Avoid losing dirty file involved in directory rename
+#   Commit A: z/a, x/{b,c_v1}
+#   Commit B: z/{a,c_v1},  x/b,   and z/c_v1 has uncommitted mods
+#   Commit C: y/a, x/{b,c_v2}
+#   Expected: y/{a,c_v2}, x/b, z/c_v1 with uncommited mods untracked,
+# ERROR_MSG(Refusing to lose dirty file at z/c)
+
+
+test_expect_success '11b-setup: Avoid losing dirty file involved in directory 
rename' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z x &&
+   echo a >z/a &&
+   echo b >x/b &&
+   test_seq 1 10 >x/c &&
+   git add z x &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv x/c z/c &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z y &&
+   echo 11 >>x/c &&
+   git add x/c &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '11b-check: Avoid losing dirty file involved in directory 
rename' '
+   git checkout B^0 &&
+   echo stuff >>z/c &&
+
+   git merge -s recursive C^0 >out 2>err &&
+   test_i18ngrep "Refusing to lose dirty file at z/c" out &&
+
+   grep -q stuff */* &&
+   test_seq 1 10 >expected &&
+   echo stuff >>expected &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 0 -eq $(git ls-files -m | wc -l) &&
+   test 4 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:x/b) = $(git rev-parse A:x/b) &&
+   test $(git rev-parse :0:y/a) = $(git rev-parse A:z/a) &&
+   test $(git rev-parse :0:y/c) = $(git rev-parse C:x/c) &&
+
+   test "$(git hash-object y/c)" = $(git rev-parse C:x/c) 

[RFC PATCH 29/30] merge-recursive: Fix overwriting dirty files involved in renames

2017-11-10 Thread Elijah Newren
This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection.  Additional codepaths that only affect
overwriting of directy files that are involved in directory rename
detection will be added in a subsequent commit.

Signed-off-by: Elijah Newren 
---
Seems kinda hacky, in multiple different ways.  It seems like there
should be a better way to handle lots of different things about this
patch, though I have a hard time seeing how without doing a bigger
rewrite of the whole interface between unpack_trees and
merge-recursive (possibly rewriting them so there isn't an interface
but some piece of code that does both functions).  Alternate simpler
suggestions?


 merge-recursive.c   | 81 -
 merge-recursive.h   |  2 +
 t/t3501-revert-cherry-pick.sh   |  2 +-
 t/t6043-merge-rename-directories.sh |  2 +-
 t/t7607-merge-overwrite.sh  |  2 +-
 unpack-trees.c  |  4 +-
 unpack-trees.h  |  4 ++
 7 files changed, 73 insertions(+), 24 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1b3ee5b9fb..86ddb89727 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -323,32 +323,32 @@ static void init_tree_desc_from_tree(struct tree_desc 
*desc, struct tree *tree)
init_tree_desc(desc, tree->buffer, tree->size);
 }
 
-static int git_merge_trees(int index_only,
+static int git_merge_trees(struct merge_options *o,
   struct tree *common,
   struct tree *head,
   struct tree *merge)
 {
int rc;
struct tree_desc t[3];
-   struct unpack_trees_options opts;
 
-   memset(, 0, sizeof(opts));
-   if (index_only)
-   opts.index_only = 1;
+   memset(>unpack_opts, 0, sizeof(o->unpack_opts));
+   if (o->call_depth)
+   o->unpack_opts.index_only = 1;
else
-   opts.update = 1;
-   opts.merge = 1;
-   opts.head_idx = 2;
-   opts.fn = threeway_merge;
-   opts.src_index = _index;
-   opts.dst_index = _index;
-   setup_unpack_trees_porcelain(, "merge");
+   o->unpack_opts.update = 1;
+   o->unpack_opts.merge = 1;
+   o->unpack_opts.head_idx = 2;
+   o->unpack_opts.fn = threeway_merge;
+   o->unpack_opts.src_index = _index;
+   o->unpack_opts.dst_index = _index;
+   setup_unpack_trees_porcelain(>unpack_opts, "merge");
 
init_tree_desc_from_tree(t+0, common);
init_tree_desc_from_tree(t+1, head);
init_tree_desc_from_tree(t+2, merge);
 
-   rc = unpack_trees(3, t, );
+   rc = unpack_trees(3, t, >unpack_opts);
+   o->unpack_opts.src_index = _index; // unpack_trees NULLifies 
src_index, but it's used in verify_uptodate, so set it back
cache_tree_free(_cache_tree);
return rc;
 }
@@ -783,6 +783,20 @@ static int would_lose_untracked(const char *path)
return !was_tracked(path) && file_exists(path);
 }
 
+static int was_dirty(struct merge_options *o, const char *path)
+{
+   struct cache_entry *ce;
+
+   int dirty = 1;
+   if (o->call_depth || !was_tracked(path))
+ return !dirty;
+
+   ce = cache_file_exists(path, strlen(path), ignore_case);
+   dirty = (ce->ce_stat_data.sd_mtime.sec > 0 &&
+verify_uptodate(ce, >unpack_opts) != 0);
+   return dirty;
+}
+
 static int make_room_for_path(struct merge_options *o, const char *path)
 {
int status, i;
@@ -2635,6 +2649,7 @@ static int handle_modify_delete(struct merge_options *o,
 
 static int merge_content(struct merge_options *o,
 const char *path,
+int file_in_way,
 struct object_id *o_oid, int o_mode,
 struct object_id *a_oid, int a_mode,
 struct object_id *b_oid, int b_mode,
@@ -2709,7 +2724,7 @@ static int merge_content(struct merge_options *o,
return -1;
}
 
-   if (df_conflict_remains) {
+   if (df_conflict_remains || file_in_way) {
char *new_path;
if (o->call_depth) {
remove_file_from_cache(path);
@@ -2743,6 +2758,31 @@ static int merge_content(struct merge_options *o,
return mfi.clean;
 }
 
+static int conflict_rename_normal(struct merge_options *o,
+ const char *path,
+ struct object_id *o_oid, unsigned o_mode,
+ struct object_id *a_oid, unsigned a_mode,
+ struct object_id *b_oid, unsigned b_mode,
+ struct rename_conflict_info *ci)
+{
+   int clean_merge;
+   int file_in_the_way = 0;
+
+   if (was_dirty(o, path)) {
+

[PATCH 12/30] directory rename detection: miscellaneous testcases to complete coverage

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 505 
 1 file changed, 505 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index bdfd943c88..bb179b16c8 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -265,6 +265,7 @@ test_expect_failure '1d-check: Directory renames cause a 
rename/rename(2to1) con
 '
 
 # Testcase 1e, Renamed directory, with all filenames being renamed too
+#   (Related to testcases 9f & 9g)
 #   Commit A: z/{oldb,oldc}
 #   Commit B: y/{newb,newc}
 #   Commit C: z/{oldb,oldc,d}
@@ -2054,4 +2055,508 @@ test_expect_failure '8e-check: Both sides rename, one 
side adds to original dire
test_i18ngrep CONFLICT.*rename/rename.*z/b.*y/b.*w/b out
 '
 
+###
+# SECTION 9: Other testcases
+#
+# I came up with the testcases in the first eight sections before coding up
+# the implementation.  The testcases in this section were mostly ones I
+# thought of while coding/debugging, and which I was too lazy to insert
+# into the previous sections because I didn't want to re-label with all the
+# testcase references.  :-)
+###
+
+# Testcase 9a, Inner renamed directory within outer renamed directory
+#   (Related to testcase 1f)
+#   Commit A: z/{b,c,d/{e,f,g}}
+#   Commit B: y/{b,c}, x/w/{e,f,g}
+#   Commit C: z/{b,c,d/{e,f,g,h},i}
+#   Expected: y/{b,c,i}, x/w/{e,f,g,h}
+#   NOTE: The only reason this one is interesting is because when a directory
+# is split into multiple other directories, we determine by the weight
+# of which one had the most paths going to it.  A naive implementation
+# of that could take the new file in commit C at z/i to x/w/i or x/i.
+
+test_expect_success '9a-setup: Inner renamed directory within outer renamed 
directory' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir -p z/d &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo e >z/d/e &&
+   echo f >z/d/f &&
+   echo g >z/d/g &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   mkdir x &&
+   git mv z/d x/w &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   echo h >z/d/h &&
+   echo i >z/i &&
+   git add z &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '9a-check: Inner renamed directory within outer renamed 
directory' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 7 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 0 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:y/i) = $(git rev-parse C:z/i) &&
+
+   test $(git rev-parse HEAD:x/w/e) = $(git rev-parse A:z/d/e) &&
+   test $(git rev-parse HEAD:x/w/f) = $(git rev-parse A:z/d/f) &&
+   test $(git rev-parse HEAD:x/w/g) = $(git rev-parse A:z/d/g) &&
+   test $(git rev-parse HEAD:x/w/h) = $(git rev-parse C:z/d/h)
+'
+
+# Testcase 9b, Transitive rename with content merge
+#   (Related to testcase 1c)
+#   Commit A: z/{b,c},   x/d_1
+#   Commit B: y/{b,c},   x/d_2
+#   Commit C: z/{b,c,d_3}
+#   Expected: y/{b,c,d_merged}
+
+test_expect_success '9b-setup: Transitive rename with content merge' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir x &&
+   test_seq 1 10 >x/d &&
+   git add z x &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z y &&
+   test_seq 1 11 >x/d &&
+   git add x/d &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   test_seq 0 10 >x/d &&
+   git mv x/d z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '9b-check: Transitive rename with content merge' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test_must_fail git rev-parse HEAD:x/d &&
+   test_must_fail git rev-parse HEAD:z/d &&
+   test ! -f z/d &&
+
+   test $(git rev-parse HEAD:y/d) != $(git rev-parse A:x/d) &&
+   test $(git 

[PATCH 05/30] directory rename detection: directory splitting testcases

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 125 
 1 file changed, 125 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index b737b0a105..00811f512a 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -388,4 +388,129 @@ test_expect_failure '1f-check: Split a directory into two 
other directories' '
 #   in section 2, plus testcases 3a and 4a.
 ###
 
+
+###
+# SECTION 2: Split into multiple directories, with equal number of paths
+#
+# Explore the splitting-a-directory rules a bit; what happens in the
+# edge cases?
+#
+# Note that there is a closely related case of a directory not being
+# split on either side of history, but being renamed differently on
+# each side.  See testcase 8e for that.
+###
+
+# Testcase 2a, Directory split into two on one side, with equal numbers of 
paths
+#   Commit A: z/{b,c}
+#   Commit B: y/b, w/c
+#   Commit C: z/{b,c,d}
+#   Expected: y/b, w/c, z/d, with warning about z/ -> (y/ vs. w/) conflict
+test_expect_success '2a-setup: Directory split into two on one side, with 
equal numbers of paths' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   mkdir y &&
+   mkdir w &&
+   git mv z/b y/ &&
+   git mv z/c w/ &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   echo d >z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '2a-check: Directory split into two on one side, with 
equal numbers of paths' '
+   git checkout B^0 &&
+
+   test_must_fail git merge -s recursive C^0 >out &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :0:w/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse :0:z/d) = $(git rev-parse C:z/d) &&
+   test_i18ngrep "CONFLICT.*directory rename split" out
+'
+
+# Testcase 2b, Directory split into two on one side, with equal numbers of 
paths
+#   Commit A: z/{b,c}
+#   Commit B: y/b, w/c
+#   Commit C: z/{b,c}, x/d
+#   Expected: y/b, w/c, x/d; No warning about z/ -> (y/ vs. w/) conflict
+test_expect_success '2b-setup: Directory split into two on one side, with 
equal numbers of paths' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   mkdir y &&
+   mkdir w &&
+   git mv z/b y/ &&
+   git mv z/c w/ &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   mkdir x &&
+   echo d >x/d &&
+   git add x/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '2b-check: Directory split into two on one side, with 
equal numbers of paths' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 >out &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :0:w/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse :0:x/d) = $(git rev-parse C:x/d) &&
+   ! test_i18ngrep "CONFLICT.*directory rename split" out
+'
+
+###
+# Rules suggested by section 2:
+#
+#   None; the rule was already covered in section 1.  These testcases are
+#   here just to make sure the conflict resolution and necessary warning
+#   messages are handled correctly.
+###
+
 test_done
-- 
2.15.0.5.g9567be9905



[PATCH 07/30] directory rename detection: partially renamed directory testcase/discussion

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 100 
 1 file changed, 100 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 021513ec00..ec054b210a 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -650,4 +650,104 @@ test_expect_success '3b-check: Avoid implicit rename if 
involved as source on cu
 #   of a rename on either side of a merge.
 ###
 
+
+###
+# SECTION 4: Partially renamed directory; still exists on both sides of merge
+#
+# What if we were to attempt to do directory rename detection when someone
+# "mostly" moved a directory but still left some files around, or,
+# equivalently, fully renamed a directory in one commmit and then recreated
+# that directory in a later commit adding some new files and then tried to
+# merge?
+#
+# It's hard to divine user intent in these cases, because you can make an
+# argument that, depending on the intermediate history of the side being
+# merged, that some users will want files in that directory to
+# automatically be detected and renamed, while users with a different
+# intermediate history wouldn't want that rename to happen.
+#
+# I think that it is best to simply not have directory rename detection
+# apply to such cases.  My reasoning for this is four-fold: (1) it's
+# easiest for users in general to figure out what happened if we don't
+# apply directory rename detection in any such case, (2) it's an easy rule
+# to explain ["We don't do directory rename detection if the directory
+# still exists on both sides of the merge"], (3) we can get some hairy
+# edge/corner cases that would be really confusing and possibly not even
+# representable in the index if we were to even try, and [related to 3] (4)
+# attempting to resolve this issue of divining user intent by examining
+# intermediate history goes against the spirit of three-way merges and is a
+# path towards crazy corner cases that are far more complex than what we're
+# already dealing with.
+#
+# This section contains a test for this partially-renamed-directory case.
+###
+
+# Testcase 4a, Directory split, with original directory still present
+#   (Related to testcase 1f)
+#   Commit A: z/{b,c,d,e}
+#   Commit B: y/{b,c,d}, z/e
+#   Commit C: z/{b,c,d,e,f}
+#   Expected: y/{b,c,d}, z/{e,f}
+#   NOTE: Even though most files from z moved to y, we don't want f to follow.
+
+test_expect_success '4a-setup: Directory split, with original directory still 
present' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   echo e >z/e &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   mkdir y &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d y/ &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   echo f >z/f &&
+   git add z/f &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '4a-check: Directory split, with original directory still 
present' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 5 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 0 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:y/d) = $(git rev-parse A:z/d) &&
+   test $(git rev-parse HEAD:z/e) = $(git rev-parse A:z/e) &&
+   test $(git rev-parse HEAD:z/f) = $(git rev-parse C:z/f)
+'
+
+###
+# Rules suggested by section 4:
+#
+#   Directory-rename-detection should be turned off for any directories (as
+#   a source for renames) that exist on both sides of the merge.  (The "as
+#   a source for renames" clarification is due to cases like 1c where
+#   the target directory exists on both sides and we do want the rename
+#   detection.)  But, sadly, see testcase 8b.
+###
+
 test_done
-- 
2.15.0.5.g9567be9905



[PATCH 26/30] merge-recursive: When comparing files, don't include trees

2017-11-10 Thread Elijah Newren
get_renames() would look up stage data that already existed (populated
in get_unmerged(), taken from whatever unpack_trees() created), and if
it didn't exist, would call insert_stage_data() to create the necessary
entry for the given file.  The insert_stage_data() fallback becomes
much more important for directory rename detection, because that creates
a mechanism to have a file in the resulting merge that didn't exist on
either side of history.  However, insert_stage_data(), due to calling
get_tree_entry() loaded up trees as readily as files.  We aren't
interested in comparing trees to files; the D/F conflict handling is
done elsewhere.  This code is just concerned with what entries existed
for a given path on the different sides of the merge, so create a
get_tree_entry_if_blob() helper function and use it.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 7c2c29bb51..2a7258f6bb 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -407,6 +407,21 @@ static void get_files_dirs(struct merge_options *o, struct 
tree *tree)
read_tree_recursive(tree, "", 0, 0, _all, save_files_dirs, o);
 }
 
+static int get_tree_entry_if_blob(const unsigned char *tree,
+ const char *path,
+ unsigned char *hashy,
+ unsigned *mode_o)
+{
+   int ret;
+
+   ret = get_tree_entry(tree, path, hashy, mode_o);
+   if (S_ISDIR(*mode_o)) {
+   hashcpy(hashy, null_sha1);
+   *mode_o = 0;
+   }
+   return ret;
+}
+
 /*
  * Returns an index_entry instance which doesn't have to correspond to
  * a real cache entry in Git's index.
@@ -417,12 +432,12 @@ static struct stage_data *insert_stage_data(const char 
*path,
 {
struct string_list_item *item;
struct stage_data *e = xcalloc(1, sizeof(struct stage_data));
-   get_tree_entry(o->object.oid.hash, path,
-   e->stages[1].oid.hash, >stages[1].mode);
-   get_tree_entry(a->object.oid.hash, path,
-   e->stages[2].oid.hash, >stages[2].mode);
-   get_tree_entry(b->object.oid.hash, path,
-   e->stages[3].oid.hash, >stages[3].mode);
+   get_tree_entry_if_blob(o->object.oid.hash, path,
+  e->stages[1].oid.hash, >stages[1].mode);
+   get_tree_entry_if_blob(a->object.oid.hash, path,
+  e->stages[2].oid.hash, >stages[2].mode);
+   get_tree_entry_if_blob(b->object.oid.hash, path,
+  e->stages[3].oid.hash, >stages[3].mode);
item = string_list_insert(entries, path);
item->util = e;
return e;
-- 
2.15.0.5.g9567be9905



[PATCH 24/30] merge-recursive: Add computation of collisions due to dir rename & merging

2017-11-10 Thread Elijah Newren
directory renaming and merging can cause one or more files to be moved to
where an existing file is, or to cause several files to all be moved to
the same (otherwise vacant) location.  Add checking and reporting for such
cases, falling back to no-directory-rename handling for such paths.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 118 +-
 1 file changed, 116 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1858686c35..251c4cc7fa 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1408,6 +1408,30 @@ static int tree_has_path(struct tree *tree, const char 
*path)
   hashy, _o);
 }
 
+/*
+ * Return a new string that replaces the beginning portion (which matches
+ * entry->dir), with entry->new_dir.  In perl-speak:
+ *   new_path_name = (old_path =~ s/entry->dir/entry->new_dir/);
+ */
+static char *apply_dir_rename(struct dir_rename_entry *entry,
+  const char *old_path) {
+   char *new_path;
+   int oldlen, newlen;
+
+   if (entry->non_unique_new_dir)
+   return NULL;
+
+   oldlen = strlen(entry->dir);
+   assert(strncmp(entry->dir, old_path, oldlen) == 0 &&
+  old_path[oldlen] == '/');
+   newlen = strlen(entry->new_dir) + (strlen(old_path) - oldlen) + 1;
+   new_path = malloc(newlen);
+   strcpy(new_path, entry->new_dir);
+   strcpy(_path[strlen(new_path)], _path[oldlen]);
+
+   return new_path;
+}
+
 static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir) {
*old_dir = NULL;
@@ -1625,6 +1649,82 @@ static struct hashmap *get_directory_renames(struct 
diff_queue_struct *pairs,
return dir_renames;
 }
 
+static struct dir_rename_entry *check_dir_renamed(const char *path,
+ struct hashmap *dir_renames) {
+   char temp[PATH_MAX];
+   char *end;
+   struct dir_rename_entry *entry;
+
+   strcpy(temp, path);
+   while ((end = strrchr(temp, '/'))) {
+   *end = '\0';
+   entry = dir_rename_find_entry(dir_renames, temp);
+   if (entry)
+   return entry;
+   }
+   return NULL;
+}
+
+static void compute_collisions(struct hashmap *collisions,
+  struct hashmap *dir_renames,
+  struct diff_queue_struct *pairs)
+{
+   int i;
+
+   /*
+* Multiple files can be mapped to the same path due to directory
+* renames done by the other side of history.  Since that other
+* side of history could have merged multiple directories into one,
+* if our side of history added the same file basename to each of
+* those directories, then all N of them would get implicitly
+* renamed by the directory rename detection into the same path,
+* and we'd get an add/add/.../add conflict, and all those adds
+* from *this* side of history.  This is not representable in the
+* index, and users aren't going to easily be able to make sense of
+* it.  So we need to provide a good warning about what's
+* happening, and fall back to no-directory-rename detection
+* behavior for those paths.
+*
+* See testcases 9e and all of section 5 from t6043 for examples.
+*/
+   collision_init(collisions);
+
+   for (i = 0; i < pairs->nr; ++i) {
+   struct dir_rename_entry *dir_rename_ent;
+   struct collision_entry *collision_ent;
+   char *new_path;
+   struct diff_filepair *pair = pairs->queue[i];
+
+   if (pair->status == 'D')
+   continue;
+   dir_rename_ent = check_dir_renamed(pair->two->path, 
dir_renames);
+   if (!dir_rename_ent)
+   continue;
+
+   new_path = apply_dir_rename(dir_rename_ent, pair->two->path);
+   if (!new_path)
+   /*
+* dir_rename_ent->non_unique_new_path is true, which
+* means there is no directory rename for us to use,
+* which means it won't cause us any additional
+* collisions.
+*/
+   continue;
+   collision_ent = collision_find_entry(collisions, new_path);
+   if (!collision_ent) {
+   collision_ent = xcalloc(1,
+   sizeof(struct collision_entry));
+   hashmap_entry_init(collision_ent, strhash(new_path));
+   hashmap_put(collisions, collision_ent);
+   collision_ent->target_file = new_path;
+   } else {
+  

[PATCH 23/30] merge-recursive: Add a new hashmap for storing file collisions

2017-11-10 Thread Elijah Newren
Directory renames with the ability to merge directories opens up the
possibility of add/add/add/.../add conflicts, if each of the N
directories being merged into one target directory all had a file with
the same name.  We need a way to check for and report on such
collisions; this hashmap will be used for this purpose.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 23 +++
 merge-recursive.h |  7 +++
 2 files changed, 30 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 3633be0123..1858686c35 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -73,6 +73,29 @@ static void dir_rename_init(struct hashmap *map)
hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0);
 }
 
+static struct collision_entry *collision_find_entry(struct hashmap *hashmap,
+   char *target_file)
+{
+   struct collision_entry key;
+
+   hashmap_entry_init(, strhash(target_file));
+   key.target_file = target_file;
+   return hashmap_get(hashmap, , NULL);
+}
+
+static int collision_cmp(void *unused_cmp_data,
+const struct collision_entry *e1,
+const struct collision_entry *e2,
+const void *unused_keydata)
+{
+   return strcmp(e1->target_file, e2->target_file);
+}
+
+static void collision_init(struct hashmap *map)
+{
+   hashmap_init(map, (hashmap_cmp_fn) collision_cmp, NULL, 0);
+}
+
 static void flush_output(struct merge_options *o)
 {
if (o->buffer_output < 2 && o->obuf.len) {
diff --git a/merge-recursive.h b/merge-recursive.h
index a024949739..e02c1e1243 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -37,6 +37,13 @@ struct dir_rename_entry {
struct string_list possible_new_dirs;
 };
 
+struct collision_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   char *target_file;
+   struct string_list source_files;
+   unsigned reported_already:1;
+};
+
 /* merge_trees() but with recursive ancestor consolidation */
 int merge_recursive(struct merge_options *o,
struct commit *h1,
-- 
2.15.0.5.g9567be9905



[PATCH 11/30] directory rename detection: testcases exploring possibly suboptimal merges

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 371 
 1 file changed, 371 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 115d0d2622..bdfd943c88 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1683,4 +1683,375 @@ test_expect_failure '7e-check: transitive rename in 
rename/delete AND dirs in th
test $(git hash-object y/d~C^0) = $(git rev-parse A:x/d)
 '
 
+
+###
+# SECTION 8: Suboptimal merges
+#
+# As alluded to in the last section, the ruleset we have built up for
+# detecting directory renames unfortunately has some special cases where it
+# results in slightly suboptimal or non-intuitive behavior.  This section
+# explores these cases.
+#
+# To be fair, we already had non-intuitive or suboptimal behavior for most
+# of these cases in git before introducing implicit directory rename
+# detection, but it'd be nice if there was a modified ruleset out there
+# that handled these cases a bit better.
+###
+
+# Testcase 8a, Dual-directory rename, one into the others' way
+#   Commit A. x/{a,b},   y/{c,d}
+#   Commit B. x/{a,b,e}, y/{c,d,f}
+#   Commit C. y/{a,b},   z/{c,d}
+#
+# Possible Resolutions:
+#   Previous git: y/{a,b,f},   z/{c,d},   x/e
+#   Expected: y/{a,b,e,f}, z/{c,d}
+#   Preferred:y/{a,b,e},   z/{c,d,f}
+#
+# Note: Both x and y got renamed and it'd be nice to detect both, and we do
+# better with directory rename detection than git did previously, but the
+# simple rule from section 5 prevents me from handling this as optimally as
+# we potentially could.
+
+test_expect_success '8a-setup: Dual-directory rename, one into the others way' 
'
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir x &&
+   mkdir y &&
+   echo a >x/a &&
+   echo b >x/b &&
+   echo c >y/c &&
+   echo d >y/d &&
+   git add x y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   echo e >x/e &&
+   echo f >y/f &&
+   git add x/e y/f &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv y z &&
+   git mv x y &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '8a-check: Dual-directory rename, one into the others way' 
'
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 6 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 0 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse HEAD:y/a) = $(git rev-parse A:x/a) &&
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:x/b) &&
+   test $(git rev-parse HEAD:y/e) = $(git rev-parse B:x/e) &&
+   test $(git rev-parse HEAD:y/f) = $(git rev-parse B:y/f) &&
+   test $(git rev-parse HEAD:z/c) = $(git rev-parse A:y/c) &&
+   test $(git rev-parse HEAD:z/d) = $(git rev-parse A:y/d)
+'
+
+# Testcase 8b, Dual-directory rename, one into the others' way, with 
conflicting filenames
+#   Commit A. x/{a_1,b_1}, y/{a_2,b_2}
+#   Commit B. x/{a_1,b_1,e_1}, y/{a_2,b_2,e_2}
+#   Commit C. y/{a_1,b_1}, z/{a_2,b_2}
+#
+# Possible Resolutions:
+#   Previous git: y/{a_1,b_1,e_2}, z/{a_2,b_2}, x/e_1
+#   Scary:y/{a_1,b_1}, z/{a_2,b_2}, CONFLICT(add/add, e_1 vs. e_2)
+#   Preferred:y/{a_1,b_1,e_1}, z/{a_2,b_2,e_2}
+#
+# Note: Very similar to 8a, except instead of 'e' and 'f' in directories x and
+# y, both are named 'e'.  Without directory rename detection, neither file
+# moves directories.  Implment directory rename detection suboptimally, and
+# you get an add/add conflict, but both files were added in commit B, so this
+# is an add/add conflict where one side of history added both files --
+# something we can't represent in the index.  Obviously, we'd prefer the last
+# resolution, but our previous rules are too coarse to allow it.  Using both
+# the rules from section 4 and section 5 save us from the Scary resolution,
+# making us fall back to pre-directory-rename-detection behavior for both
+# e_1 and e_2.
+
+test_expect_success '8b-setup: Dual-directory rename, one into the others way, 
with conflicting filenames' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir x &&
+   mkdir y &&
+   echo a1 >x/a &&
+   echo b1 >x/b &&
+   echo a2 >y/a &&
+   echo b2 >y/b &&
+   git add x y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   echo e1 >x/e &&
+   echo e2 >y/e 

[PATCH 09/30] directory rename detection: testcases checking which side did the rename

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 283 
 1 file changed, 283 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index d15153c652..157299105f 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1053,4 +1053,287 @@ test_expect_failure '5d-check: Directory/file/file 
conflict due to directory ren
 #   back to old handling.  But, sadly, see testcases 8a and 8b.
 ###
 
+
+###
+# SECTION 6: Same side of the merge was the one that did the rename
+#
+# It may sound obvious that you only want to apply implicit directory
+# renames to directories if the _other_ side of history did the renaming.
+# If you did make an implementation that didn't explicitly enforce this
+# rule, the majority of cases that would fall under this section would
+# also be solved by following the rules from the above sections.  But
+# there are still a few that stick out, so this section covers them just
+# to make sure we also get them right.
+###
+
+# Testcase 6a, Tricky rename/delete
+#   Commit A: z/{b,c,d}
+#   Commit B: z/b
+#   Commit C: y/{b,c}, z/d
+#   Expected: y/b, CONFLICT(rename/delete, z/c -> y/c vs. NULL)
+#   Note: We're just checking here that the rename of z/b and z/c to put
+# them under y/ doesn't accidentally catch z/d and make it look like
+# it is also involved in a rename/delete conflict.
+
+test_expect_success '6a-setup: Tricky rename/delete' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git rm z/c &&
+   git rm z/d &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   mkdir y &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '6a-check: Tricky rename/delete' '
+   git checkout B^0 &&
+
+   test_must_fail git merge -s recursive C^0 >out &&
+   test_i18ngrep "CONFLICT (rename/delete).*z/c.*y/c" out &&
+
+   test 2 -eq $(git ls-files -s | wc -l) &&
+   test 1 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :3:y/c) = $(git rev-parse A:z/c)
+'
+
+# Testcase 6b, Same rename done on both sides
+#   (Related to testcases 6c and 8e)
+#   Commit A: z/{b,c}
+#   Commit B: y/{b,c}
+#   Commit C: y/{b,c}, z/d
+#   Note: If we did directory rename detection here, we'd move z/d into y/,
+# but C did that rename and still decided to put the file into z/,
+# so we probably shouldn't apply directory rename detection for it.
+
+test_expect_success '6b-setup: Same rename done on both sides' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z y &&
+   mkdir z &&
+   echo d >z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '6b-check: Same rename done on both sides' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 0 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:z/d) = $(git rev-parse C:z/d)
+'
+
+# Testcase 6c, Rename only done on same side
+#   (Related to testcases 6b and 8e)
+#   Commit A: z/{b,c}
+#   Commit B: z/{b,c} (no change)
+#   Commit C: y/{b,c}, z/d
+#   Expected: y/{b,c}, z/d
+#   NOTE: Seems obvious, but just checking that the implementation doesn't
+# "accidentally detect a rename" and give us y/{b,c,d}.
+
+test_expect_success '6c-setup: Rename only done on same side' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+  

[PATCH 08/30] directory rename detection: files/directories in the way of some renames

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 303 
 1 file changed, 303 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index ec054b210a..d15153c652 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -750,4 +750,307 @@ test_expect_success '4a-check: Directory split, with 
original directory still pr
 #   detection.)  But, sadly, see testcase 8b.
 ###
 
+
+###
+# SECTION 5: Files/directories in the way of subset of to-be-renamed paths
+#
+# Implicitly renaming files due to a detected directory rename could run
+# into problems if there are files or directories in the way of the paths
+# we want to rename.  Explore such cases in this section.
+###
+
+# Testcase 5a, Merge directories, other side adds files to original and target
+#   Commit A: z/{b,c},   y/d
+#   Commit B: z/{b,c,e_1,f}, y/{d,e_2}
+#   Commit C: y/{b,c,d}
+#   Expected: z/e_1, y/{b,c,d,e_2,f} + CONFLICT warning
+#   NOTE: While directory rename detection is active here causing z/f to
+# become y/f, we did not apply this for z/e_1 because that would
+# give us an add/add conflict for y/e_1 vs y/e_2.  This problem with
+# this add/add, is that both versions of y/e are from the same side
+# of history, giving us no way to represent this conflict in the
+# index.
+
+test_expect_success '5a-setup: Merge directories, other side adds files to 
original and target' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir y &&
+   echo d >y/d &&
+   git add z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   echo e1 >z/e &&
+   echo f >z/f &&
+   echo e2 >y/e &&
+   git add z/e z/f y/e &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '5a-check: Merge directories, other side adds files to 
original and target' '
+   git checkout B^0 &&
+
+   test_must_fail git merge -s recursive C^0 >out &&
+
+   test 6 -eq $(git ls-files -s | wc -l) &&
+   test 0 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse :0:y/d) = $(git rev-parse A:y/d) &&
+
+   test $(git rev-parse :0:y/e) = $(git rev-parse B:y/e) &&
+   test $(git rev-parse :0:z/e) = $(git rev-parse B:z/e) &&
+
+   test $(git rev-parse :0:y/f) = $(git rev-parse B:z/f) &&
+
+   test_i18ngrep "CONFLICT.*implicit dir rename" out
+'
+
+# Testcase 5b, Rename/delete in order to get add/add/add conflict
+#   (Related to testcase 8d; these may appear slightly inconsistent to users;
+#Also related to testcases 7d and 7e)
+#   Commit A: z/{b,c,d_1}
+#   Commit B: y/{b,c,d_2}
+#   Commit C: z/{b,c,d_1,e}, y/d_3
+#   Expected: y/{b,c,e}, CONFLICT(add/add: y/d_2 vs. y/d_3)
+#   NOTE: If z/d_1 in commit C were to be involved in dir rename detection, as
+# we normaly would since z/ is being renamed to y/, then this would be
+# a rename/delete (z/d_1 -> y/d_1 vs. deleted) AND an add/add/add
+# conflict of y/d_1 vs. y/d_2 vs. y/d_3.  Add/add/add is not
+# representable in the index, so the existence of y/d_3 needs to
+# cause us to bail on directory rename detection for that path, falling
+# back to git behavior without the directory rename detection.
+
+test_expect_success '5b-setup: Rename/delete in order to get add/add/add 
conflict' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d1 >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   git rm z/d &&
+   git mv z y &&
+   echo d2 >y/d &&
+   git add y/d &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   mkdir y &&
+   echo d3 >y/d &&
+   echo e >z/e &&
+   git add y/d z/e &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_failure '5b-check: Rename/delete in order to get add/add/add 
conflict' '
+   git 

[PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic

2017-11-10 Thread Elijah Newren
The amount of logic in merge_trees() relative to renames was just a few
lines, but split it out into new handle_renames() and cleanup_renames()
functions to prepare for additional logic to be added to each.  No code
or logic changes, just a new place to put stuff for when the rename
detection gains additional checks.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 48 ++--
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 382016508b..49710c0964 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1638,6 +1638,38 @@ static int process_renames(struct merge_options *o,
return clean_merge;
 }
 
+struct rename_info {
+   struct string_list *head_renames;
+   struct string_list *merge_renames;
+};
+
+static struct rename_info *handle_renames(struct merge_options *o,
+ struct tree *common,
+ struct tree *head,
+ struct tree *merge,
+ struct string_list *entries,
+ int *clean)
+{
+   struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));
+
+   rei->head_renames  = get_renames(o, head, common, head, merge, entries);
+   rei->merge_renames = get_renames(o, merge, common, head, merge, 
entries);
+   *clean = process_renames(o, rei->head_renames, rei->merge_renames);
+
+   return rei;
+}
+
+static void cleanup_renames(struct rename_info *re_info)
+{
+   string_list_clear(re_info->head_renames, 0);
+   string_list_clear(re_info->merge_renames, 0);
+
+   free(re_info->head_renames);
+   free(re_info->merge_renames);
+
+   free(re_info);
+}
+
 static struct object_id *stage_oid(const struct object_id *oid, unsigned mode)
 {
return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid;
@@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o,
}
 
if (unmerged_cache()) {
-   struct string_list *entries, *re_head, *re_merge;
+   struct string_list *entries;
+   struct rename_info *re_info;
int i;
/*
 * Only need the hashmap while processing entries, so
@@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o,
get_files_dirs(o, merge);
 
entries = get_unmerged();
-   re_head  = get_renames(o, head, common, head, merge, entries);
-   re_merge = get_renames(o, merge, common, head, merge, entries);
-   clean = process_renames(o, re_head, re_merge);
+   re_info = handle_renames(o, common, head, merge, entries, 
);
record_df_conflict_files(o, entries);
if (clean < 0)
goto cleanup;
@@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o,
}
 
 cleanup:
-   string_list_clear(re_merge, 0);
-   string_list_clear(re_head, 0);
+   cleanup_renames(re_info);
+
string_list_clear(entries, 1);
+   free(entries);
 
hashmap_free(>current_file_dir_set, 1);
 
-   free(re_merge);
-   free(re_head);
-   free(entries);
-
if (clean < 0)
return clean;
}
-- 
2.15.0.5.g9567be9905



[PATCH 06/30] directory rename detection: testcases to avoid taking detection too far

2017-11-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 137 
 1 file changed, 137 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 00811f512a..021513ec00 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -513,4 +513,141 @@ test_expect_success '2b-check: Directory split into two 
on one side, with equal
 #   messages are handled correctly.
 ###
 
+
+###
+# SECTION 3: Path in question is the source path for some rename already
+#
+# Combining cases from Section 1 and trying to handle them could lead to
+# directory renaming detection being over-applied.  So, this section
+# provides some good testcases to check that the implementation doesn't go
+# too far.
+###
+
+# Testcase 3a, Avoid implicit rename if involved as source on other side
+#   (Related to testcases 1c and 1f)
+#   Commit A: z/{b,c,d}
+#   Commit B: z/{b,c,d} (no change)
+#   Commit C: y/{b,c}, x/d
+#   Expected: y/{b,c}, x/d
+test_expect_success '3a-setup: Avoid implicit rename if involved as source on 
other side' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   test_tick &&
+   git commit --allow-empty -m "B" &&
+
+   git checkout C &&
+   mkdir y &&
+   mkdir x &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d x/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '3a-check: Avoid implicit rename if involved as source on 
other side' '
+   git checkout B^0 &&
+
+   git merge -s recursive C^0 &&
+
+   test 3 -eq $(git ls-files -s | wc -l) &&
+
+   test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
+   test $(git rev-parse HEAD:x/d) = $(git rev-parse A:z/d)
+'
+
+# Testcase 3b, Avoid implicit rename if involved as source on other side
+#   (Related to testcases 5c and 7c, also kind of 1e and 1f)
+#   Commit A: z/{b,c,d}
+#   Commit B: y/{b,c}, x/d
+#   Commit C: z/{b,c}, w/d
+#   Expected: y/{b,c}, CONFLICT:(z/d -> x/d vs. w/d)
+#   NOTE: We're particularly checking that since z/d is already involved as
+# a source in a file rename on the same side of history, that we don't
+# get it involved in directory rename detection.  If it were, we might
+# end up with CONFLICT:(z/d -> y/d vs. x/d vs. w/d), i.e. a
+# rename/rename/rename(1to3) conflict, which is just weird.
+test_expect_success '3b-setup: Avoid implicit rename if involved as source on 
current side' '
+   git rm -rf . &&
+   git clean -fdqx &&
+   rm -rf .git &&
+   git init &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git branch A &&
+   git branch B &&
+   git branch C &&
+
+   git checkout B &&
+   mkdir y &&
+   mkdir x &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d x/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "B" &&
+
+   git checkout C &&
+   mkdir w &&
+   git mv z/d w/ &&
+   test_tick &&
+   git commit -m "C"
+'
+
+test_expect_success '3b-check: Avoid implicit rename if involved as source on 
current side' '
+   git checkout B^0 &&
+
+   test_must_fail git merge -s recursive C^0 >out &&
+
+   test 5 -eq $(git ls-files -s | wc -l) &&
+   test 3 -eq $(git ls-files -u | wc -l) &&
+   test 1 -eq $(git ls-files -o | wc -l) &&
+
+   test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
+   test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) &&
+
+   test $(git rev-parse :1:z/d) = $(git rev-parse A:z/d) &&
+   test $(git rev-parse :2:x/d) = $(git rev-parse A:z/d) &&
+   test $(git rev-parse :3:w/d) = $(git rev-parse A:z/d) &&
+   test ! -f z/d &&
+   test $(git hash-object x/d) = $(git rev-parse A:z/d) &&
+   test $(git hash-object w/d) = $(git rev-parse A:z/d) &&
+
+   test_i18ngrep CONFLICT.*rename/rename.*z/d.*x/d.*w/d out &&
+   ! test_i18ngrep CONFLICT.*rename/rename.*y/d
+'
+
+###
+# Rules suggested by section 3:
+#
+#   Avoid directory-rename-detection for a path, if that path is the source
+#   of a rename on 

[PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Elijah Newren
[This series is entirely independent of my rename detection limits series.
However, I have a separate rename detection performance series that depends
on both this series and the rename detection limits series.]

In this patchset, I introduce directory rename detection to merge-recursive,
predominantly so that when files are added to directories on one side of
history and those directories are renamed on the other side of history, the
files will end up in the proper location after a merge or cherry-pick.

However, this isn't limited to that simplistic case.  More interesting
possibilities exist, such as:

  * a file being renamed into a directory which is renamed on the other
side of history, causing the need for a transitive rename.

  * two (or three or N) directories being merged (with no conflicts so
long as files/directories within the merged directory have different
names), and the "merging" being detected as a directory rename for
each original directory.

  * not all files in a directory being renamed to the same location;
i.e. perhaps the directory was renamed, but some files within it were
renamed to a different location

  * a directory being renamed, which also contained a subdirectory that
was renamed to some entirely different location.  (And perhaps the
inner directory itself contained inner directories that were renamed
to yet other locations).

Also, I found it useful to allow all files within the directory being
renamed to themselves be renamed and still detect the directory rename.
For example, if goal/a and goal/b are renamed to priority/alpha and
priority/bravo, we can detect that goal/ was renamed to priority/, so that
if someone adds goal/c on the other side of history, after the merge we'll
end up with priority/c.  (In the absence of a readily available
libmindread.so library that I can link to, we can't rename directly from
goal/c to priority/charlie automatically, and will need to have priority/c
suffice.)

Naturally, an attempt to do all of the above brings up all kinds of
interesting edge and corner cases, some of which result in conflicts
that cannot be represented in the index, and others of which might be
considered too complex for users to understand and resolve.  For
example:

  * An add/add/add/.../add conflict, all on one side of history (see
testcase 9e in the new t6043, or any of the testcases in section 5)

  * Doubly, triply, or N-fold transitive renames (testcases 9c & 9d)

In order to prevent such problems, I introduce a couple basic rules that
limit when directory rename detection applies:

  1) If a subset of to-be-renamed files have a file or directory in the
 way (or would be in the way of each other), "turn off" the directory
 rename for those specific sub-paths and report the conflict to the
 user.

  2) If the other side of history did a directory rename to a path that
 your side of history renamed away, then ignore that particular
 rename from the other side of history for any implicit directory
 renames (but warn the user).

Further, there's a basic question about when directory rename detection
should be applied at all.  I have a simple rule:

  3) If a given directory still exists on both sides of a merge, we do
 not consider it to have been renamed.

Rule 3 may sound obvious at first, but it will probably arise as a
question for some users -- what if someone "mostly" moved a directory but
still left some files around, or, equivalently (from the perspective of the
three-way merge that merge-recursive performs), fully renamed a directory
in one commmit and then recreated that directory in a later commit adding
some new files and then tried to merge?  See the big comment in section 4
of the new t6043 for further discussion of this rule.

This set of rules seems to be reasonably easy to explain, is
self-consistent, allows all conflict cases to be represented without
changing any on-disk data structures or introducing new terminology or
commands for users, prevents excessively complex conflicts that users
might struggle to understand, and brings peace to the middle east.
Actually, maybe not that last one.

While I feel that this directory rename detection reduces the number of
suboptimal merges and cherry-picks that git performs, there are sadly
still a number of cases that remain suboptimal, or that even newly appear
to be not-quite-consistent with other cases.  The fact that one file
layout might trigger some of the rules above while another "slightly"
different file layout doesn't might occasionally cause some user
grumblings.  I've tried to explore and document these cases in section 8
of the new t6043-merge-rename-directories.sh

Finally, from an implementation perspective, there's another strong
advantage to the ruleset above: it means that any path to which we want
to apply an implicit directory rename will have a free and open spot
for us to move it into.  Thus, we can just adjust 

[PATCH 01/30] Tighten and correct a few testcases for merging and cherry-picking

2017-11-10 Thread Elijah Newren
t3501 had a testcase originally added to ensure cherry-pick wouldn't
segfault when working with a dirty file involved in a rename.  While
the segfault was fixed, there was another problem this test demonstrated:
namely, that git would overwrite a dirty file involved in a rename.
Further, the test encoded a "successful merge" and overwriting of this
file as correct behavior.  Modify the test so that it would still catch
the segfault, but to require the correct behavior.

t7607 had a test specific to looking for a merge overwriting a dirty file
involved in a rename, but it too actually encoded what I would term
incorrect behavior: it expected the merge to succeed.  Fix that, and add
a few more checks to make sure that the merge really does produce the
expected results.

Signed-off-by: Elijah Newren 
---
 t/t3501-revert-cherry-pick.sh | 7 +--
 t/t7607-merge-overwrite.sh| 5 -
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/t/t3501-revert-cherry-pick.sh b/t/t3501-revert-cherry-pick.sh
index 4f2a263b63..783bdbf59d 100755
--- a/t/t3501-revert-cherry-pick.sh
+++ b/t/t3501-revert-cherry-pick.sh
@@ -141,7 +141,7 @@ test_expect_success 'cherry-pick "-" works with arguments' '
test_cmp expect actual
 '
 
-test_expect_success 'cherry-pick works with dirty renamed file' '
+test_expect_failure 'cherry-pick works with dirty renamed file' '
test_commit to-rename &&
git checkout -b unrelated &&
test_commit unrelated &&
@@ -150,7 +150,10 @@ test_expect_success 'cherry-pick works with dirty renamed 
file' '
test_tick &&
git commit -m renamed &&
echo modified >renamed &&
-   git cherry-pick refs/heads/unrelated
+   test_must_fail git cherry-pick refs/heads/unrelated >out &&
+   test_i18ngrep "Refusing to lose dirty file at renamed" out &&
+   test $(git rev-parse :0:renamed) = $(git rev-parse HEAD^:to-rename.t) &&
+   grep -q "^modified$" renamed
 '
 
 test_done
diff --git a/t/t7607-merge-overwrite.sh b/t/t7607-merge-overwrite.sh
index 9444d6a9b9..00617dadf8 100755
--- a/t/t7607-merge-overwrite.sh
+++ b/t/t7607-merge-overwrite.sh
@@ -97,7 +97,10 @@ test_expect_failure 'will not overwrite unstaged changes in 
renamed file' '
git mv c1.c other.c &&
git commit -m rename &&
cp important other.c &&
-   git merge c1a &&
+   test_must_fail git merge c1a >out &&
+   test_i18ngrep "Refusing to lose dirty file at other.c" out &&
+   test -f other.c~HEAD &&
+   test $(git hash-object other.c~HEAD) = $(git rev-parse c1a:c1.c) &&
test_cmp important other.c
 '
 
-- 
2.15.0.5.g9567be9905



Re: [PATCH v3 0/8] Create Git/Packet.pm

2017-11-10 Thread Junio C Hamano
Christian Couder  writes:

> There are only a few small changes compared to v2:

Please go incremental as v2 is already in 'next', and small changes
are easier to reiew and understand when given as follow-up "polish
this minor glitch in the initial effort" patches.

Thanks.


Re: [PATCH v2 2/9] commit: move empty message checks to libgit

2017-11-10 Thread Ramsay Jones


On 10/11/17 11:09, Phillip Wood wrote:
> From: Phillip Wood 
> 
> Move the functions that check for empty messages from bulitin/commit.c
> to sequencer.c so they can be shared with other commands. The
> functions are refactored to take an explicit cleanup mode and template
> filename passed by the caller.
> 
> Signed-off-by: Phillip Wood 
> ---
> 
> Notes:
> changes since v1:
>  - prefix cleanup_mode enum and constants with commit_msg_
> 
>  builtin/commit.c | 99 
> +++-
>  sequencer.c  | 61 ++
>  sequencer.h  | 11 +++
>  3 files changed, 91 insertions(+), 80 deletions(-)
> 

Just an idle thought - why are these functions moving to
sequencer.[ch] rather than commit[.ch]?

Similar comments for other patches in the series which moves
code from builtin/commit.c to sequencer.[ch].

ATB,
Ramsay Jones




Re: [PATCH v2 3/9] Add a function to update HEAD after creating a commit

2017-11-10 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> Add update_head() based on the code that updates HEAD after committing
> in builtin/commit.c that can be called by 'git commit' and other
> commands.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> changes since v1:
>  - rename update_head() to update_head_with_reflog()

The above proposed log message still calls it with the old name
though.


Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread Elijah Newren
Thanks for taking a look!

On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller  wrote:

>> -   if (rename_limit <= 0 || rename_limit > 32767)
>> -   rename_limit = 32767;
>> if ((num_create <= rename_limit || num_src <= rename_limit) &&
>> -   (num_create * num_src <= rename_limit * rename_limit))
>> +   ((double)num_create * (double)num_src
>> +<= (double)rename_limit * (double)rename_limit))
>> return 0;
>
> From a technical perspective, I would think that if
> (num_create <= rename_limit || num_src <= rename_limit)
> holds true, that the double-cast condition would also be always true?
> Could we just remove that last check?

Not necessarily.  For example, if num_create = rename_limit-1 and
num_src = rename_limit+2, then the first condition will be satisfied
but the second won't.  If it was && rather than ||, then the second
condition would be superfluous.

> Or phrased differently, if we can cast to double and extend the check
> here, do we have to adapt code at other places as well?

Good point, and yes.  Perhaps I should have re-ordered my patch series
because I came back to it later and realized that the progress code
was broken due to overflow/wraparound, and a patch 3 fixed that.

Further, the later patch used uint64_t instead of double.  While
double works, perhaps I should change the double here to uint64_t for
consistency?


Re: [RFC] cover-at-tip

2017-11-10 Thread Jonathan Tan
On Fri, 10 Nov 2017 16:37:49 +0100
Nicolas Morey-Chaisemartin  wrote:

> > Hi,
> >
> > I'm starting to look into the cover-at-tip topic that I found in the 
> > leftover bits (http://www.spinics.net/lists/git/msg259573.html)

Thanks - I personally would find this very useful.

> > Here's a first draft of a patch that adds support for format-patch 
> > --cover-at-tip. It compiles and works in my nice and user firnedly test 
> > case.
> > Just wanted to make sure I was going roughly in the right direction here.
> >
> >
> > I was wondering where is the right place to put a commit_is_cover_at_tip() 
> > as the test will be needed in other place as the feature is extended to git 
> > am/merge/pull.

I think you can put this in (root)/commit.c, especially since that test
operates on a "struct commit *".

> > Feel free to comment. I know the help is not clear at this point and 
> > there's still some work to do on option handling (add a config option, 
> > probably have --cover-at-tip imply --cover-letter, etc) and
> > some testing :)

Both are good ideas. You should probably use a
--cover-letter={no,auto,yes} instead of the current boolean, so that the
config can use the same options and configuring it to "auto" (to use a
cover letter if the tip is empty and singly-parented, and not to use a
cover letter otherwise) is meaningful.

> The proposed patch for format-patch does not output any "---" to signal the 
> end of the commit log and the begining of the patch in the cover letter.
> This means that the log summary, the diffstat and the git footer ( --\n version>) is seen as part of the commit log. Which is just wrong.
> 
> Removing them would solve the issue but I feel they bring some useful info 
> (or they would not be here).
> Adding a "---" between the commit log and those added infos poses another 
> problem: git am does not see an empty patch anymore.
> I would need to add "some" level of parsing to am.c to make sure the patch 
> content is just garbage and that there are no actual hunks for that.

Could you just take the message from the commit and put that in the
cover letter? The summary and diffstat do normally have useful info, but
if the commit is specifically made to be used only for the cover letter,
I think that is no longer true.


Re: [PATCH 0/4] Fix issues with rename detection limits

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 10:13 AM, Elijah Newren  wrote:
> In a repo with around 60k files with deep directory hierarchies (due to

> Elijah Newren (4):
>   sequencer: Warn when internal merge may be suboptimal due to
> renameLimit
>   Remove silent clamp of renameLimit
>   progress: Fix progress meters when dealing with lots of work
>   sequencer: Show rename progress during cherry picks

Sorry for the duplicate send.

I noticed the cover letter didn't appear on in the email list,
double-checked https://public-inbox.org/git/, waited over half an hour
and double checked both email and public-inbox again and still didn't
see it, so I re-sent.  Just as soon as I did, it seems that the
original and the new cover letter emails suddenly showed up right
then.  *shrug*.


Re: [PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread Stefan Beller
On Fri, Nov 10, 2017 at 9:39 AM, Elijah Newren  wrote:
> In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
> the renameLimit was clamped to 32767.  This appears to have been to simply
> avoid integer overflow in the following computation:
>
>num_create * num_src <= rename_limit * rename_limit
>
> although it also could be viewed as a hardcoded bound on the amount of CPU
> time we're willing to allow users to tell git to spend on handling
> renames.  An upper bound may make sense, particularly as the computation
> is O(rename_limit^2), but only if the bound is documented and communicated
> to the user -- neither of which were true.
>
> In fact, the silent clamping of the renameLimit to a smaller value and
> lack of reporting of the needed renameLimit when it was too large made it
> appear to the user as though they had used a high enough value; however,
> git would proceed to mess up the merge or cherry-pick badly based on the
> lack of rename detection.  Some hardy folks, despite the lack of feedback
> on the correct limit to choose, were desperate enough to repeatedly retry
> their cherry-picks with increasingly larger renameLimit values (going
> orders of magnitude beyond the built-in limit of 32767), but were
> consistently met with the same failure.
>
> Although large limits can make things slow, we have users who would be
> ecstatic to have a small five file change be correctly cherry picked even
> if they have to manually specify a large limit and it took git ten minutes
> to compute it.
>
> Signed-off-by: Elijah Newren 
> ---
>  diff.c|  2 +-
>  diffcore-rename.c | 11 ---
>  2 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..c6597e3231 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int 
> needed, int degraded_cc)
> warning(_(rename_limit_warning));
> else
> return;
> -   if (0 < needed && needed < 32767)
> +   if (0 < needed)
> warning(_(rename_limit_advice), varname, needed);
>  }
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 0d8c3d2ee4..7f9a463f5a 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create,
>  * growing larger than a "rename_limit" square matrix, ie:
>  *
>  *num_create * num_src > rename_limit * rename_limit
> -*
> -* but handles the potential overflow case specially (and we
> -* assume at least 32-bit integers)
>  */
> -   if (rename_limit <= 0 || rename_limit > 32767)
> -   rename_limit = 32767;
> if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -   (num_create * num_src <= rename_limit * rename_limit))
> +   ((double)num_create * (double)num_src
> +<= (double)rename_limit * (double)rename_limit))
> return 0;

>From a technical perspective, I would think that if
(num_create <= rename_limit || num_src <= rename_limit)
holds true, that the double-cast condition would also be always true?
Could we just remove that last check?

Or phrased differently, if we can cast to double and extend the check
here, do we have to adapt code at other places as well?

>
> options->needed_rename_limit =
> @@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create,
> num_src++;
> }
> if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -   (num_create * num_src <= rename_limit * rename_limit))
> +   ((double)num_create * (double)num_src
> +<= (double)rename_limit * (double)rename_limit))
> return 2;
> return 1;
>  }
> --
> 2.15.0.5.g9567be9905
>


Re: [RFC] cover-at-tip

2017-11-10 Thread Junio C Hamano
Nicolas Morey-Chaisemartin  writes:

> I would need to add "some" level of parsing to am.c to make sure
> the patch content is just garbage and that there are no actual
> hunks for that.
>
> I did not find any public API that would allow me to do that,
> although apply_path/parse_chunk would fit the bill.  Is that the
> right way to approach this ?

I do not think you would want this non-patch cruft seen at the apply
layer at all.  Reading a mailbox, with the help of mailsplit and
mailinfo, and being the driver to create a series of commits is what
"am" is about, and it would have to notice that the non-patch cruft
at the beginning is not a patch at all and defer creation of an
empty commit with that cover material at the end.  For each of the
other messages in the series that has patches, it will need to call
apply to update the index and the working tree so that it can make a
commit, but there is NO reason whatsoever to ask help from apply, whose
sole purpose is to read a patch and make modifications to the index
and the working tree, to handle the cover material.




Re: Bug - Status - Space in Filename

2017-11-10 Thread Junio C Hamano
Jeff King  writes:

> Are there callers who don't just print the result? If not, we could just
> always emit. That's slightly more efficient since it drops the expensive
> strbuf_insert (though there are already so many copies going on in
> quote_path_relative that it hardly matters). But it also drops the need
> for the caller to know about the strbuf at all.

I am fine by that, too.  Consider that I'm still suffering from the
trauma caused by some patches (not in this area but others) that
changed output we have long been directly emiting to stdio to
instead capture to strings ;-)

This might also be a good bite-sized material for the weekend thing
;-)


[PATCH 0/4] Fix issues with rename detection limits

2017-11-10 Thread Elijah Newren
In a repo with around 60k files with deep directory hierarchies (due to
being predominantly java code) and which had a few high level directories
moved around (making it appear to git as dozens of thousands of individual
file renames), attempts to cherry-pick commits across product versions
resulted in non-sensical delete/modify conflicts (rather than
rename/modify as expected).  We had a few teams who were in the unenviable
position of having to backport hundreds or thousands of commits across
such product versions, and the result was months of pain, which could have
been alleviated were it not for a few small git bugs:

  * When you try to cherry-pick changes, unlike when you merge two
branches, git will not notify you when you need to set a higher
merge.renameLimit.

  * If you know git internals well enough, you can try to trick git into
telling you the correct renameLimit by doing a merge instead of a
cherry-pick.  If you do that, and have a renameLimit that is too
small, git will let you know the value you need.  You can then undo
the merge and proceed with the cherry-pick.  Except that...

  * If you are performing a merge and specify a large renameLimit that
isn't large enough, and the necessary renameLimit you need is greater
than 32767, then git tells you nothing, leading you to believe that
the limit you specified is high enough, but only to watch it still
mess up the merge badly.

  * If you happen to specify a merge.renameLimit that *is* high enough,
but just happens to be greater than 32767, then git will silently
pretend you specified 32767, determine that 32767 is not high enough,
not tell you anything about it's silent clamping, and then proceed to
mess up the merge or cherry-pick badly.  Not only do you get no
feedback, the clamping to 32767 isn't documented anywhere even if you
tried to read every manual page.

Folks did discover the merge.renameLimit and tried setting it to various
values, spread over the spectrum from 1000 (the default) up to 9
(or maybe more, that's just the highest number I heard), completely
unaware that most their attempts were ignored and wondering why git
couldn't handle things and couldn't explain why either.

Eventually folks wrote scripts that would run the output of format-patch
through a bunch of sed commands to pretend patches were against the
filename on the other side of history and then pipe back through git-am.
Such scripts grew as more and more rename rules were added.

I eventually learned of one of these scripts and said something close to,
"You don't need these pile of rename rules; just set merge.renameLimit to
something higher."  When they responded that merge.renameLimit didn't
work, I didn't believe them.  This patch series, along with two others
that I will be sending shortly, represent my attempt to continue to not
believe them.  :-)

Elijah Newren (4):
  sequencer: Warn when internal merge may be suboptimal due to
renameLimit
  Remove silent clamp of renameLimit
  progress: Fix progress meters when dealing with lots of work
  sequencer: Show rename progress during cherry picks

 diff.c|  2 +-
 diffcore-rename.c | 15 ++-
 progress.c| 22 +++---
 progress.h|  8 
 sequencer.c   |  2 ++
 5 files changed, 24 insertions(+), 25 deletions(-)

-- 
2.15.0.5.g9567be9905



[PATCH 0/4] Fix issues with rename detection limits

2017-11-10 Thread Elijah Newren
In a repo with around 60k files with deep directory hierarchies (due to
being predominantly java code) and which had a few high level directories
moved around (making it appear to git as dozens of thousands of individual
file renames), attempts to cherry-pick commits across product versions
resulted in non-sensical delete/modify conflicts (rather than
rename/modify as expected).  We had a few teams who were in the unenviable
position of having to backport hundreds or thousands of commits across
such product versions, and the result was months of pain, which could have
been alleviated were it not for a few small git bugs:

  * When you try to cherry-pick changes, unlike when you merge two
branches, git will not notify you when you need to set a higher
merge.renameLimit.

  * If you know git internals well enough, you can try to trick git into
telling you the correct renameLimit by doing a merge instead of a
cherry-pick.  If you do that, and have a renameLimit that is too
small, git will let you know the value you need.  You can then undo
the merge and proceed with the cherry-pick.  Except that...

  * If you are performing a merge and specify a large renameLimit that
isn't large enough, and the necessary renameLimit you need is greater
than 32767, then git tells you nothing, leading you to believe that
the limit you specified is high enough, but only to watch it still
mess up the merge badly.

  * If you happen to specify a merge.renameLimit that *is* high enough,
but just happens to be greater than 32767, then git will silently
pretend you specified 32767, determine that 32767 is not high enough,
not tell you anything about it's silent clamping, and then proceed to
mess up the merge or cherry-pick badly.  Not only do you get no
feedback, the clamping to 32767 isn't documented anywhere even if you
tried to read every manual page.

Folks did discover the merge.renameLimit and tried setting it to various
values, spread over the spectrum from 1000 (the default) up to 9
(or maybe more, that's just the highest number I heard), completely
unaware that most their attempts were ignored and wondering why git
couldn't handle things and couldn't explain why either.

Eventually folks wrote scripts that would run the output of format-patch
through a bunch of sed commands to pretend patches were against the
filename on the other side of history and then pipe back through git-am.
Such scripts grew as more and more rename rules were added.

I eventually learned of one of these scripts and said something close to,
"You don't need these pile of rename rules; just set merge.renameLimit to
something higher."  When they responded that merge.renameLimit didn't
work, I didn't believe them.  This patch series, along with two others
that I will be sending shortly, represent my attempt to continue to not
believe them.  :-)

Elijah Newren (4):
  sequencer: Warn when internal merge may be suboptimal due to
renameLimit
  Remove silent clamp of renameLimit
  progress: Fix progress meters when dealing with lots of work
  sequencer: Show rename progress during cherry picks

 diff.c|  2 +-
 diffcore-rename.c | 15 ++-
 progress.c| 22 +++---
 progress.h|  8 
 sequencer.c   |  2 ++
 5 files changed, 24 insertions(+), 25 deletions(-)

-- 
2.15.0.5.g9567be9905



Re: is there a stylistic preference for a trailing "--" on a command?

2017-11-10 Thread Stefan Beller
On Fri, Nov 10, 2017 at 5:57 AM, Robert P. J. Day  wrote:
>
>   just noticed these examples in "man git-bisect":
>
> EXAMPLES
>   $ git bisect start HEAD v1.2 --  # HEAD is bad, v1.2 is good
>   ...
>   $ git bisect start HEAD origin --# HEAD is bad, origin is good
>   ...
>   $ git bisect start HEAD HEAD~10 --   # culprit is among the last 10
>
> is there some rationale or stylistic significance to those trailing
> "--" on those commands? i assume they have no effect, just curious as
> to why they're there.

By having the -- there, it is clear that the strings are ref specs and not files
of such a name. (Who would want to store a file named HEAD~10 in their
repo?)


Re: [PATCH v1 5/8] sequencer: don't die in print_commit_summary()

2017-11-10 Thread Junio C Hamano
Phillip Wood  writes:

> On 07/11/17 15:13, Junio C Hamano wrote:
> ...
>> Another possibility perhaps is that the function is safe to reuse
>> already even without this patch, of course ;-).
>> 
> Hmm, maybe it is. Looking at pick_commits() and do_pick_commit() if the
> sequencer dies in print_commit_summary() (which can only happen when
> cherry-picking or reverting) then neither the todo list or the abort
> safety file are updated to reflect the commit that was just made.
> 
> As I understand it print_commit_summary() dies because: (i) it cannot
> resolve HEAD either because some other process is updating it (which is
> bad news in the middle of a cherry-pick); (ii) because something went
> wrong HEAD is corrupt; or (iii) log_tree_commit() cannot read some
> objects. In all those cases dying will leave the sequencer in a sane
> state for aborting - 'git cherry-pick --abort' will rewind HEAD to the
> last successful commit before there was a problem with HEAD or the
> object database. If the user somehow fixes the problem and runs 'git
> cherry-pick --continue' then the sequencer will try and pick the same
> commit again which may or may not be what the user wants depending on
> what caused print_commit_summary() to die.

The above is all good analysis---thanks for your diligence.  Perhaps
some if not all of it can go to the log message?



Re: [Query] Separate hooks for Git worktrees

2017-11-10 Thread Stefan Beller
On Thu, Nov 9, 2017 at 9:00 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> We have no worktree specific config yet, though patches for
>> this were floated on the mailing list.
>>
>> Though recent versions of git learned to conditionally include
>> config files. (look for includeIf in man git-config), which I think
>> could be used to set the option gerrit.createChangeId  depending
>> on the worktree you are in.
>>
>>> Any idea how I can get around this problem without having separate
>>> repositories for kernel and android ?
>>
>> The proposed approach above might be hacky but sounds as if
>> it should work?
>
> If you meant "conditional include" by "proposed approach above", I
> do not see which part you found possibly hacky.

Compared to a per-worktree configuration that you can setup via

git config --for-worktree=X key value

the setup using conditional includes seems hacky to me.
(I just realize that these conditional includes can be set using
regular git-config, so maybe it is not as hacky as I thought.)

>  It is to allow
> different set of configurations to apply depending on where you are
> working at, which I think was invented exactly for something like
> this.

>From a UX perspective, I can imagine a way easier workflow,
but the data model seems to make sense.

> It certainly is not any hackier than using the same repository to
> house separately manged projects even if they may be related
> projects.

Well it is the same project with different upstream workflows.
For example I would imagine that Viresh wants to cherry-pick
from one branch to another, or even send the same patch
(just with different commit messages, with or without the
ChangeId) to the different upstreams?

> Where does the aversion of "having separate repositories" primarily
> come from?  Is it bad due to disk consumption?  Is it bad because
> you cannot do "git diff android-branch kernel-branch"?  Something
> else?

Yeah, that is an interesting question!
(I suspect workflow related things, diff/cherry-pick)

> If it is purely disk consumption that is an issue, perhaps the real
> solution is to make it easier to maintain separate repositories
> while sharing as much disk space as possible.  GC may have to be
> made a lot more robust in the presense of alternate object stores,
> for example.


Re: [PATCH v4] doc/SubmittingPatches: correct subject guidance

2017-11-10 Thread Josh Triplett
On Fri, Nov 10, 2017 at 03:02:50PM +, Adam Dinwoodie wrote:
> The examples and common practice for adding markers such as "RFC" or
> "v2" to the subject of patch emails is to have them within the same
> brackets as the "PATCH" text, not after the closing bracket.  Further,
> the practice of `git format-patch` and the like, as well as what appears
> to be the more common pratice on the mailing list, is to use "[RFC
> PATCH]", not "[PATCH/RFC]".
> 
> Update the SubmittingPatches article to match and to reference the
> `format-patch` helper arguments, and also make some minor text
> clarifications in the area.
> 
> Signed-off-by: Adam Dinwoodie 
> Helped-by: Eric Sunshine 

This looks great! Thank you for updating this documentation.

Reviewed-by: Josh Triplett 

> ---
> 
> Notes:
> Changes since v3:
> - Clarified meaning of "RFC" per Eric's suggestion
> - Made the impact of --subject-prefix and friends clearer per Eric's
>   suggestion
> 
> Thank you for your nitpicking, Eric, it's useful and very much
> appreciated :)
> 
>  Documentation/SubmittingPatches | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 558d465b6..89f239071 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -184,21 +184,26 @@ lose tabs that way if you are not careful.
>  
>  It is a common convention to prefix your subject line with
>  [PATCH].  This lets people easily distinguish patches from other
> -e-mail discussions.  Use of additional markers after PATCH and
> -the closing bracket to mark the nature of the patch is also
> -encouraged.  E.g. [PATCH/RFC] is often used when the patch is
> -not ready to be applied but it is for discussion, [PATCH v2],
> -[PATCH v3] etc. are often seen when you are sending an update to
> -what you have previously sent.
> +e-mail discussions.  Use of markers in addition to PATCH within
> +the brackets to describe the nature of the patch is also
> +encouraged.  E.g. [RFC PATCH] (where RFC stands for "request for
> +comments") is often used to indicate a patch needs further
> +discussion before being accepted, [PATCH v2], [PATCH v3] etc.
> +are often seen when you are sending an update to what you have
> +previously sent.
>  
> -"git format-patch" command follows the best current practice to
> +The "git format-patch" command follows the best current practice to
>  format the body of an e-mail message.  At the beginning of the
>  patch should come your commit message, ending with the
>  Signed-off-by: lines, and a line that consists of three dashes,
>  followed by the diffstat information and the patch itself.  If
>  you are forwarding a patch from somebody else, optionally, at
>  the beginning of the e-mail message just before the commit
>  message starts, you can put a "From: " line to name that person.
> +To change the default "[PATCH]" in the subject to "[]", use
> +`git format-patch --subject-prefix=`.  As a shortcut, you
> +can use `--rfc` instead of `--subject-prefix="RFC PATCH"`, or
> +`-v ` instead of `--subject-prefix="PATCH v"`.
>  
>  You often want to add additional explanation about the patch,
>  other than the commit message itself.  Place such "cover letter"
> -- 
> 2.14.3
> 


[PATCH 4/4] sequencer: Show rename progress during cherry picks

2017-11-10 Thread Elijah Newren
When trying to cherry-pick a change that has lots of renames, it is
somewhat unsettling to wait a really long time without any feedback.

Signed-off-by: Elijah Newren 
---
 sequencer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sequencer.c b/sequencer.c
index 2b4cecb617..247d93f363 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -448,6 +448,7 @@ static int do_recursive_merge(struct commit *base, struct 
commit *next,
o.branch2 = next ? next_label : "(empty tree)";
if (is_rebase_i(opts))
o.buffer_output = 2;
+   o.show_rename_progress = 1;
 
head_tree = parse_tree_indirect(head);
next_tree = next ? next->tree : empty_tree();
-- 
2.15.0.5.g9567be9905



[PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit

2017-11-10 Thread Elijah Newren
Having renamed files silently treated as deleted/modified conflicts
instead of correctly resolving the renamed/modified case has caused lots
of pain for some users.  Eventually, some of them figured out to set
merge.renameLimit to something higher, but without feedback about what
value it should have, they were just repeatedly guessing and retrying.

Signed-off-by: Elijah Newren 
---
 sequencer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sequencer.c b/sequencer.c
index f2a10cc4f2..2b4cecb617 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -462,6 +462,7 @@ static int do_recursive_merge(struct commit *base, struct 
commit *next,
if (is_rebase_i(opts) && clean <= 0)
fputs(o.obuf.buf, stdout);
strbuf_release();
+   diff_warn_rename_limit("merge.renamelimit", o.needed_rename_limit, 0);
if (clean < 0)
return clean;
 
-- 
2.15.0.5.g9567be9905



[PATCH 3/4] progress: Fix progress meters when dealing with lots of work

2017-11-10 Thread Elijah Newren
The possibility of setting merge.renameLimit beyond 2^16 raises the
possibility that the values passed to progress can exceed 2^32.  For my
usecase of interest, I only needed to pass a value a little over 2^31.  If
I were only interested in fixing my usecase, I could have simply changed
last_value from int to unsigned, and casted each of rename_dst_nr and
rename_src_nr (in merge-recursive.c) from int to unsigned just before
multiplying them.  However, as long as we're making changes to allow
larger progress meters, we may as well make a little more room in general.
Use uint64_t, because it "ought to be enough for anybody".  :-)

Signed-off-by: Elijah Newren 
---
 diffcore-rename.c |  4 ++--
 progress.c| 22 +++---
 progress.h|  8 
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7f9a463f5a..6ba6157c61 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -531,7 +531,7 @@ void diffcore_rename(struct diff_options *options)
if (options->show_rename_progress) {
progress = start_delayed_progress(
_("Performing inexact rename detection"),
-   rename_dst_nr * rename_src_nr);
+   (uint64_t)rename_dst_nr * 
(uint64_t)rename_src_nr);
}
 
mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx));
@@ -568,7 +568,7 @@ void diffcore_rename(struct diff_options *options)
diff_free_filespec_blob(two);
}
dst_cnt++;
-   display_progress(progress, (i+1)*rename_src_nr);
+   display_progress(progress, 
(uint64_t)(i+1)*(uint64_t)rename_src_nr);
}
stop_progress();
 
diff --git a/progress.c b/progress.c
index 289678d43d..7e4a2f9532 100644
--- a/progress.c
+++ b/progress.c
@@ -30,8 +30,8 @@ struct throughput {
 
 struct progress {
const char *title;
-   int last_value;
-   unsigned total;
+   uint64_t last_value;
+   uint64_t total;
unsigned last_percent;
unsigned delay;
unsigned delayed_percent_threshold;
@@ -79,7 +79,7 @@ static int is_foreground_fd(int fd)
return tpgrp < 0 || tpgrp == getpgid(0);
 }
 
-static int display(struct progress *progress, unsigned n, const char *done)
+static int display(struct progress *progress, uint64_t n, const char *done)
 {
const char *eol, *tp;
 
@@ -106,7 +106,7 @@ static int display(struct progress *progress, unsigned n, 
const char *done)
if (percent != progress->last_percent || progress_update) {
progress->last_percent = percent;
if (is_foreground_fd(fileno(stderr)) || done) {
-   fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
+   fprintf(stderr, "%s: %3u%% (%lu/%lu)%s%s",
progress->title, percent, n,
progress->total, tp, eol);
fflush(stderr);
@@ -116,7 +116,7 @@ static int display(struct progress *progress, unsigned n, 
const char *done)
}
} else if (progress_update) {
if (is_foreground_fd(fileno(stderr)) || done) {
-   fprintf(stderr, "%s: %u%s%s",
+   fprintf(stderr, "%s: %lu%s%s",
progress->title, n, tp, eol);
fflush(stderr);
}
@@ -127,7 +127,7 @@ static int display(struct progress *progress, unsigned n, 
const char *done)
return 0;
 }
 
-static void throughput_string(struct strbuf *buf, off_t total,
+static void throughput_string(struct strbuf *buf, uint64_t total,
  unsigned int rate)
 {
strbuf_reset(buf);
@@ -138,7 +138,7 @@ static void throughput_string(struct strbuf *buf, off_t 
total,
strbuf_addstr(buf, "/s");
 }
 
-void display_throughput(struct progress *progress, off_t total)
+void display_throughput(struct progress *progress, uint64_t total)
 {
struct throughput *tp;
uint64_t now_ns;
@@ -200,12 +200,12 @@ void display_throughput(struct progress *progress, off_t 
total)
display(progress, progress->last_value, NULL);
 }
 
-int display_progress(struct progress *progress, unsigned n)
+int display_progress(struct progress *progress, uint64_t n)
 {
return progress ? display(progress, n, NULL) : 0;
 }
 
-static struct progress *start_progress_delay(const char *title, unsigned total,
+static struct progress *start_progress_delay(const char *title, uint64_t total,
 unsigned percent_threshold, 
unsigned delay)
 {
struct progress *progress = malloc(sizeof(*progress));
@@ -227,12 +227,12 @@ static struct progress *start_progress_delay(const char 
*title, unsigned 

[PATCH 2/4] Remove silent clamp of renameLimit

2017-11-10 Thread Elijah Newren
In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
the renameLimit was clamped to 32767.  This appears to have been to simply
avoid integer overflow in the following computation:

   num_create * num_src <= rename_limit * rename_limit

although it also could be viewed as a hardcoded bound on the amount of CPU
time we're willing to allow users to tell git to spend on handling
renames.  An upper bound may make sense, particularly as the computation
is O(rename_limit^2), but only if the bound is documented and communicated
to the user -- neither of which were true.

In fact, the silent clamping of the renameLimit to a smaller value and
lack of reporting of the needed renameLimit when it was too large made it
appear to the user as though they had used a high enough value; however,
git would proceed to mess up the merge or cherry-pick badly based on the
lack of rename detection.  Some hardy folks, despite the lack of feedback
on the correct limit to choose, were desperate enough to repeatedly retry
their cherry-picks with increasingly larger renameLimit values (going
orders of magnitude beyond the built-in limit of 32767), but were
consistently met with the same failure.

Although large limits can make things slow, we have users who would be
ecstatic to have a small five file change be correctly cherry picked even
if they have to manually specify a large limit and it took git ten minutes
to compute it.

Signed-off-by: Elijah Newren 
---
 diff.c|  2 +-
 diffcore-rename.c | 11 ---
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..c6597e3231 100644
--- a/diff.c
+++ b/diff.c
@@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int 
needed, int degraded_cc)
warning(_(rename_limit_warning));
else
return;
-   if (0 < needed && needed < 32767)
+   if (0 < needed)
warning(_(rename_limit_advice), varname, needed);
 }
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 0d8c3d2ee4..7f9a463f5a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create,
 * growing larger than a "rename_limit" square matrix, ie:
 *
 *num_create * num_src > rename_limit * rename_limit
-*
-* but handles the potential overflow case specially (and we
-* assume at least 32-bit integers)
 */
-   if (rename_limit <= 0 || rename_limit > 32767)
-   rename_limit = 32767;
if ((num_create <= rename_limit || num_src <= rename_limit) &&
-   (num_create * num_src <= rename_limit * rename_limit))
+   ((double)num_create * (double)num_src
+<= (double)rename_limit * (double)rename_limit))
return 0;
 
options->needed_rename_limit =
@@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create,
num_src++;
}
if ((num_create <= rename_limit || num_src <= rename_limit) &&
-   (num_create * num_src <= rename_limit * rename_limit))
+   ((double)num_create * (double)num_src
+<= (double)rename_limit * (double)rename_limit))
return 2;
return 1;
 }
-- 
2.15.0.5.g9567be9905



Re: [PATCH v1 2/2] log: add option to choose which refs to decorate

2017-11-10 Thread Junio C Hamano
Rafael Ascensão  writes:

> I agree that describe should receive the "normalize" treatment. However,
> and following the same reasoning, why should describe users adopt the
> rules imposed by --glob? I could argue they're also used to the way it
> works now.
>
> That being said, the suggestion I mentioned earlier would allow to keep
> both current behaviors consistent at the expense of the extra call to
> refs.c::ref_exists().

In any case, updating the "describe" for consistency is something we
can and should leave for later, to be done as a separate topic.

While I agree with you that the consistent behaviour between
commands is desirable, and also I agree with you that given a
pattern $X that does not have any glob char, trying to match $X when
a ref whose name exactly is $X exists and trying to match $X/*
otherwise would give us a consistent semantics without hurting any
existing uses, I do not think you need to pay any extra expense of
calling ref_exists() at all to achieve that.

That is because when $X exists, you already know $X/otherthing does
not exist.  And when $X does not exist, $X/otherthing might.  So a
naive implementation would be just to add two patterns $X and $X/*
to the filter list and be done with it.  If you exactly have
refs/heads/master, even with the naive logic may throw both
refs/heads/master and refs/heads/master/* to the filter list,
nothing will match the latter to contaminate your result (and vice
versa).

A bit more clever implementation "just throw in two items" would go
like this.  It is not all that involved:

 - In load_ref_decorations(), before running add_ref_decoration for
   each ref and head ref, iterate over the elements in the refname
   filter list.  For each element:

   - if item->string has a trailing '/', trim that.

   - store NULL in the item->util field for item whose string field
 has a glob char.

   - store something non-NULL (e.g. item->string) for item whose
 string field does not have a glob char.

 - In add_ref_decoration(), where your previous round iterates over
   filter->{include,exclude}, get rid of normalize_glob_ref() and
   use of real_pattern.  Instead do something like:

matched = 0;
if (item->util == NULL) {
if (!wildmatch(item->string, refname, 0))
matched = 1;
} else {
const char *rest;
if (skip_prefix(refname, item->string, ) &&
(!*rest || *rest == '/'))
matched = 1;
}
if (matched)
...

   Of course, you would probably want to encapsulate the logic to
   set matched = 1/0 in a helper function, e.g.

static int match_ref_pattern(const char *refname,
 const struct string_list_item *item) {
int matched = 0;
... do either wildmatch or head match with tail validation
... depending on the item->util's NULLness (see above)
return matched;
}

   and call that from the two loops for exclude and include list.

Hmm?


Re: cherry-pick very slow on big repository

2017-11-10 Thread Elijah Newren
Interesting timing.  I have some performance patches specifically
developed because rename detection during merges made a small
cherry-pick in a large repo rather slow...in my case, I dropped the
time for the cherry pick by a factor of about 30 (no guarantees you'll
see the same; it's very history-specific).  I was just about to start
sending my three series of patches, the performance one being the
third...

On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting  wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
>
>> You can remove this rename detection by running your cherry-pick with `git
>> -c diff.renameLimit=1 cherry-pick ...`
>
>
> That didn't work, actually it failed to finish with this setting in effect,
> it hangs in such a way that I can't stop it with Ctrl+C (neither when
> running from the command line, nor when running inside gdb). It didn't
> finish in the 20 minutes I gave it.
>
> I also tried with diff.renames=false, which also seemed to fail.
>
>
> --
> \\// Peter - http://www.softwolves.pp.se/


JPMorgan Chase & Co.

2017-11-10 Thread Chase Bank

JPMorgan Chase & Co.
270 Park Avenue,
Midtown Manhattan
New York City,

Attention:

We are pleased to inform you about your fund which was seized by International 
Monetary Fund (IMF) due to your failure to provide necessary credentials which 
state the legitimacy of your fund, the fund was said to be transferred from BOA 
before its seizure by IMF. Presently the fund is in JPMorgan Chase bank for 
immediate remittance to your nominated bank account, below is the account 
details we have about you kindly reconfirm if all the details are correct and 
update before we make the transfer.

NOTE: no one will ask you for transfer or any form of charge to transfer your 
fund if ever happen please report to us because it’s never from us.

Acct. name: Doreen Koehler.
Bank name: Bank of America
WIRE#026009593
ACC.898046532236

Yours sincerely
John Mour
(917) 900-0351


Urgent Message.

2017-11-10 Thread Western Union
Dear Western Union Customer,
 
You have been awarded with the sum of $50,000 USD by our office, as one of our 
customers who use Western Union in their daily business transaction. This award 
was been selected through the internet, where your e-mail address was indicated 
and notified. Please provide Mr. James Udo with the following details listed 
below so that your fund will be remited to you through Western Union.
 
1. Name:
2. Address:
3. Country:
4. Phone Number:
5. Occupation:
6. Sex:
7. Age:
 
Mr. James Udo E-mail: westerrnunion2...@outlook.com
 
As soon as these details are received and verified, your fund will be 
transferred to you.Thank you, for using western union.



RE: cherry-pick very slow on big repository

2017-11-10 Thread Kevin Willford
Since this is happening during a merge, you might need to use merge.renameLimit
or the merge strategy option of -Xno-renames.  Although the code does fallback
to use the diff.renameLimit but there is still a lot that is done before even 
checking
the rename limit so I would first try getting renames turned off.

Thanks,
Kevin

> -Original Message-
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf
> Of Peter Krefting
> Sent: Friday, November 10, 2017 7:05 AM
> To: Derrick Stolee 
> Cc: Jeff King ; Git Mailing List 
> Subject: Re: cherry-pick very slow on big repository
> 
> Derrick Stolee:
> 
> > Git is spending time detecting renames, which implies you probably
> > renamed a folder or added and deleted a large number of files. This
> > rename detection is quadratic (# adds times # deletes).
> 
> Yes, a couple of directories with a lot of template files have been
> renamed (and some removed, some added) between the current development
> branch and this old maintenance branch. I get the "Performing inexact
> rename detection" a lot when merging changes in the other direction.
> 
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
> 
> > You can remove this rename detection by running your cherry-pick
> > with `git -c diff.renameLimit=1 cherry-pick ...`
> 
> That didn't work, actually it failed to finish with this setting in
> effect, it hangs in such a way that I can't stop it with Ctrl+C
> (neither when running from the command line, nor when running inside
> gdb). It didn't finish in the 20 minutes I gave it.
> 
> I also tried with diff.renames=false, which also seemed to fail.
> 
> --
> \\// Peter -
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.softw
> olves.pp.se%2F=02%7C01%7Ckewillf%40microsoft.com%7C6b831a75739e4
> 0428d3808d52844106c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636
> 459195209466999=kJtNLAs1LSoPy%2B%2BNADJkuEBPMZVcxkSkKzOEEeIG
> VpM%3D=0


Bug: cherry-pick & submodule

2017-11-10 Thread Ефимов Василий

I faced an unexpected behaviour during cherry-picking
a commit to a branch with a submodule.

Git graph:

A -- B [master]
 \
  `- C -- D [test]

Both branches have a file with name `a_file`.
It has been added by commit A.
Commits B and C add a folder with name `a_submodule` to the respective 
branches.

Commit C does it regularly by adding a file with name `a_submodule/content`.
Commit B adds a submodule with name `a_submodule`.
Commit D only modifies `a_file`.
Note that `a_file` and `a_submodule` are not related.

[repo]
  |- a_file
  `- a_submodule

When I trying to cherry pick commit D on commit B, I got
a conflict with `a_submodule`. Changes of `a_file` are
successfully cherry-picked.

I expected, that there would be no conflicts.

A bash script reproducing the bug is listed below.

Vasily.


rm -rf a_submodule a_repo

mkdir a_submodule
cd a_submodule
git init
touch a_file_in_a_submodule
git add a_file_in_a_submodule
git commit -m "add a file in a submodule"
cd ..

mkdir a_repo
cd a_repo
git init

touch a_file
git add a_file
git commit -m "add a file"

git branch test
git checkout test

mkdir a_submodule
touch a_submodule/content
git add a_submodule/content
git commit -m "add a regular folder with name a_submodule"

echo "123" > a_file
git add a_file
git commit -m "modify a file"

git checkout master

git submodule add ../a_submodule a_submodule
git submodule update a_submodule
git commit -m "add a submodule info folder with name a_submodule"

# Trying to cherry-pick modification of a file from test branch.
git cherry-pick test

# some debug
git status



[PATCH] bisect: mention "view" as an alternative to "visualize"

2017-11-10 Thread Robert P. J. Day
Tweak a number of files to mention "view" as an alternative to
"visualize":

 Documentation/git-bisect.txt   | 9 -
 Documentation/user-manual.txt  | 3 ++-
 builtin/bisect--helper.c   | 2 +-
 contrib/completion/git-completion.bash | 2 +-
 git-bisect.sh  | 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

Signed-off-by: Robert P. J. Day 

---

  here's hoping i have the right format for this patch ... are there
any parts of this that are inappropriate, such as extending the bash
completion?

diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index 6c42abf07..89e6f9667 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -23,7 +23,7 @@ on the subcommand:
  git bisect terms [--term-good | --term-bad]
  git bisect skip [(|)...]
  git bisect reset []
- git bisect visualize
+ git bisect visualize|view
  git bisect replay 
  git bisect log
  git bisect run ...
@@ -196,15 +196,14 @@ of `git bisect good` and `git bisect bad` to mark commits.
 Bisect visualize
 

-To see the currently remaining suspects in 'gitk', issue the following
-command during the bisection process:
+To see the currently remaining suspects in 'gitk', issue either of the
+following equivalent commands during the bisection process:

 
 $ git bisect visualize
+$ git bisect view
 

-`view` may also be used as a synonym for `visualize`.
-
 If the `DISPLAY` environment variable is not set, 'git log' is used
 instead.  You can also give command-line options such as `-p` and
 `--stat`.
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 3a03e63eb..55ec58986 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -538,10 +538,11 @@ Note that the version which `git bisect` checks out for 
you at each
 point is just a suggestion, and you're free to try a different
 version if you think it would be a good idea.  For example,
 occasionally you may land on a commit that broke something unrelated;
-run
+run either of the equivalent commands

 -
 $ git bisect visualize
+$ git bisect view
 -

 which will run gitk and label the commit it chose with a marker that
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 35d2105f9..4b5fadcbe 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -46,7 +46,7 @@ static int check_term_format(const char *term, const char 
*orig_term)
return error(_("'%s' is not a valid term"), term);

if (one_of(term, "help", "start", "skip", "next", "reset",
-   "visualize", "replay", "log", "run", "terms", NULL))
+   "visualize", "view", "replay", "log", "run", "terms", 
NULL))
return error(_("can't use the builtin command '%s' as a term"), 
term);

/*
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index fdd984d34..52f68c922 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1162,7 +1162,7 @@ _git_bisect ()
 {
__git_has_doubledash && return

-   local subcommands="start bad good skip reset visualize replay log run"
+   local subcommands="start bad good skip reset visualize view replay log 
run"
local subcommand="$(__git_find_on_cmdline "$subcommands")"
if [ -z "$subcommand" ]; then
__git_find_repo_path
diff --git a/git-bisect.sh b/git-bisect.sh
index 0138a8860..e8b622a47 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -1,6 +1,6 @@
 #!/bin/sh

-USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|replay|log|run]'
+USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|view|replay|log|run]'
 LONG_USAGE='git bisect help
print this long help message.
 git bisect start [--term-{old,good}= --term-{new,bad}=]
@@ -20,7 +20,7 @@ git bisect next
find next bisection to test and check it out.
 git bisect reset []
finish bisection search and go back to commit.
-git bisect visualize
+git bisect visualize|view
show bisect status in gitk.
 git bisect replay 
replay bisection log.


-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



proper patch prefix for tweaking both git-bisect.sh and git-bisect.txt?

2017-11-10 Thread Robert P. J. Day

  digging through both git-bisect.sh and git-bisect.txt, and it seems
pretty clear they're both a bit out of date WRT documenting the newer
alternatives "old"/"new" as opposed to the older "good"/"bad" terms,
and a few other things.

  first, trivially, neither the script nor the man page mention "view"
as an alternative to "visualize", but that's easy to fix. however,
most of the inconsistency involves that good/bad/old/new stuff.

  the man page reads (in part):

  git bisect (bad|new|) []
  git bisect (good|old|) [...]

which i assume should actually read:

  git bisect (bad|new||) []
  git bisect (good|old||) [...]

unless there's some implicit assumption that isn't mentioned there.

  also from the man page, i'm guessing that:

  git bisect terms [--term-good | --term-bad]

might need to say:

  git bisect terms [--term-good | --term-bad | --term-new | --term-old]

and so on, and so on (again, unless the generality of those terms is
understood).

  so given that all of that is, technically, documentation (even the
usage message in the script), if one submits a patch to change both
files appropriately, what is the subject line prefix to use?

  anyway, maybe i'll do this in bite-size pieces to keep it
manageable. my first patch to the code base ... whoo hoo!

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Bug: cherry-pick & submodule

2017-11-10 Thread Ефимов Василий

I faced an unexpected behaviour during cherry-picking
a commit to a branch with a submodule.

Git graph:

A -- B [master]
 \
  `- C -- D [test]

Both branches have a file with name `a_file`.
It has been added by commit A.
Commits B and C add a folder with name `a_submodule` to the respective 
branches.

Commit C does it regularly by adding a file with name `a_submodule/content`.
Commit B adds a submodule with name `a_submodule`.
Commit D only modifies `a_file`.
Note that `a_file` and `a_submodule` are not related.

[repo]
  |- a_file
  `- a_submodule

When I trying to cherry pick commit D on commit B, I got
a conflict with `a_submodule`. Changes of `a_file` are
successfully cherry-picked.

I expected, that there would be no conflicts.

A bash script reproducing the bug is attached.

Vasily.


bug.sh
Description: application/shellscript


Re: [RFC] cover-at-tip

2017-11-10 Thread Nicolas Morey-Chaisemartin


Le 10/11/2017 à 11:24, Nicolas Morey-Chaisemartin a écrit :
> Hi,
>
> I'm starting to look into the cover-at-tip topic that I found in the leftover 
> bits (http://www.spinics.net/lists/git/msg259573.html)
>
> Here's a first draft of a patch that adds support for format-patch 
> --cover-at-tip. It compiles and works in my nice and user firnedly test case.
> Just wanted to make sure I was going roughly in the right direction here.
>
>
> I was wondering where is the right place to put a commit_is_cover_at_tip() as 
> the test will be needed in other place as the feature is extended to git 
> am/merge/pull.
>
> Feel free to comment. I know the help is not clear at this point and there's 
> still some work to do on option handling (add a config option, probably have 
> --cover-at-tip imply --cover-letter, etc) and
> some testing :)
>
>
> ---

Leaving some more updates and questions before the week end:

I started on git am --cover-at-tip.

The proposed patch for format-patch does not output any "---" to signal the end 
of the commit log and the begining of the patch in the cover letter.
This means that the log summary, the diffstat and the git footer ( --\n) is seen as part of the commit log. Which is just wrong.

Removing them would solve the issue but I feel they bring some useful info (or 
they would not be here).
Adding a "---" between the commit log and those added infos poses another 
problem: git am does not see an empty patch anymore.
I would need to add "some" level of parsing to am.c to make sure the patch 
content is just garbage and that there are no actual hunks for that.

I did not find any public API that would allow me to do that, although 
apply_path/parse_chunk would fit the bill.
Is that the right way to approach this ?

My branch is here if anyone want to give a look: 
https://github.com/nmorey/git/tree/dev/cover-at-tip

Nicolas






[PATCH v4] doc/SubmittingPatches: correct subject guidance

2017-11-10 Thread Adam Dinwoodie
The examples and common practice for adding markers such as "RFC" or
"v2" to the subject of patch emails is to have them within the same
brackets as the "PATCH" text, not after the closing bracket.  Further,
the practice of `git format-patch` and the like, as well as what appears
to be the more common pratice on the mailing list, is to use "[RFC
PATCH]", not "[PATCH/RFC]".

Update the SubmittingPatches article to match and to reference the
`format-patch` helper arguments, and also make some minor text
clarifications in the area.

Signed-off-by: Adam Dinwoodie 
Helped-by: Eric Sunshine 
---

Notes:
Changes since v3:
- Clarified meaning of "RFC" per Eric's suggestion
- Made the impact of --subject-prefix and friends clearer per Eric's
  suggestion

Thank you for your nitpicking, Eric, it's useful and very much
appreciated :)

 Documentation/SubmittingPatches | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 558d465b6..89f239071 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -184,21 +184,26 @@ lose tabs that way if you are not careful.
 
 It is a common convention to prefix your subject line with
 [PATCH].  This lets people easily distinguish patches from other
-e-mail discussions.  Use of additional markers after PATCH and
-the closing bracket to mark the nature of the patch is also
-encouraged.  E.g. [PATCH/RFC] is often used when the patch is
-not ready to be applied but it is for discussion, [PATCH v2],
-[PATCH v3] etc. are often seen when you are sending an update to
-what you have previously sent.
+e-mail discussions.  Use of markers in addition to PATCH within
+the brackets to describe the nature of the patch is also
+encouraged.  E.g. [RFC PATCH] (where RFC stands for "request for
+comments") is often used to indicate a patch needs further
+discussion before being accepted, [PATCH v2], [PATCH v3] etc.
+are often seen when you are sending an update to what you have
+previously sent.
 
-"git format-patch" command follows the best current practice to
+The "git format-patch" command follows the best current practice to
 format the body of an e-mail message.  At the beginning of the
 patch should come your commit message, ending with the
 Signed-off-by: lines, and a line that consists of three dashes,
 followed by the diffstat information and the patch itself.  If
 you are forwarding a patch from somebody else, optionally, at
 the beginning of the e-mail message just before the commit
 message starts, you can put a "From: " line to name that person.
+To change the default "[PATCH]" in the subject to "[]", use
+`git format-patch --subject-prefix=`.  As a shortcut, you
+can use `--rfc` instead of `--subject-prefix="RFC PATCH"`, or
+`-v ` instead of `--subject-prefix="PATCH v"`.
 
 You often want to add additional explanation about the patch,
 other than the commit message itself.  Place such "cover letter"
-- 
2.14.3



Re: [PATCH v1 5/8] sequencer: don't die in print_commit_summary()

2017-11-10 Thread Phillip Wood
On 07/11/17 15:13, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> And this step is going in the right direction, but I am not sure if
>> this made the function safe enough to be called repeatedly from the
>> rebase machinery and we are ready to unleash this to the end users
>> and tell them it is safe to use it.
> 
> Another possibility perhaps is that the function is safe to reuse
> already even without this patch, of course ;-).
> 
Hmm, maybe it is. Looking at pick_commits() and do_pick_commit() if the
sequencer dies in print_commit_summary() (which can only happen when
cherry-picking or reverting) then neither the todo list or the abort
safety file are updated to reflect the commit that was just made.

As I understand it print_commit_summary() dies because: (i) it cannot
resolve HEAD either because some other process is updating it (which is
bad news in the middle of a cherry-pick); (ii) because something went
wrong HEAD is corrupt; or (iii) log_tree_commit() cannot read some
objects. In all those cases dying will leave the sequencer in a sane
state for aborting - 'git cherry-pick --abort' will rewind HEAD to the
last successful commit before there was a problem with HEAD or the
object database. If the user somehow fixes the problem and runs 'git
cherry-pick --continue' then the sequencer will try and pick the same
commit again which may or may not be what the user wants depending on
what caused print_commit_summary() to die.




Re: cherry-pick very slow on big repository

2017-11-10 Thread Peter Krefting

Derrick Stolee:

Git is spending time detecting renames, which implies you probably 
renamed a folder or added and deleted a large number of files. This 
rename detection is quadratic (# adds times # deletes).


Yes, a couple of directories with a lot of template files have been 
renamed (and some removed, some added) between the current development 
branch and this old maintenance branch. I get the "Performing inexact 
rename detection" a lot when merging changes in the other direction.


However, none of them applies to these particular commits, which only 
touches files that are in the exact same location on both branches.


You can remove this rename detection by running your cherry-pick 
with `git -c diff.renameLimit=1 cherry-pick ...`


That didn't work, actually it failed to finish with this setting in 
effect, it hangs in such a way that I can't stop it with Ctrl+C 
(neither when running from the command line, nor when running inside 
gdb). It didn't finish in the 20 minutes I gave it.


I also tried with diff.renames=false, which also seemed to fail.

--
\\// Peter - http://www.softwolves.pp.se/


is there a stylistic preference for a trailing "--" on a command?

2017-11-10 Thread Robert P. J. Day

  just noticed these examples in "man git-bisect":

EXAMPLES
  $ git bisect start HEAD v1.2 --  # HEAD is bad, v1.2 is good
  ...
  $ git bisect start HEAD origin --# HEAD is bad, origin is good
  ...
  $ git bisect start HEAD HEAD~10 --   # culprit is among the last 10

is there some rationale or stylistic significance to those trailing
"--" on those commands? i assume they have no effect, just curious as
to why they're there.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH v1 2/2] log: add option to choose which refs to decorate

2017-11-10 Thread Rafael Ascensão
On 07/11/17 00:18, Junio C Hamano wrote:
> Jacob Keller  writes:
> 
> I would have to say that the describe's one is wrong if it does not
> match what for_each_glob_ref() does for the log family of commands'
> "--branches=" etc.  describe.c::get_name() uses positive
> and negative patterns, just like log-tree.c::add_ref_decoration()
> would with the patch we are discussing, so perhaps the items in
> these lists should get the same "normalize" treatment the patch 1/2
> of this series brings in to make things consistent?
> 

I agree that describe should receive the "normalize" treatment. However,
and following the same reasoning, why should describe users adopt the
rules imposed by --glob? I could argue they're also used to the way it
works now.

That being said, the suggestion I mentioned earlier would allow to keep
both current behaviors consistent at the expense of the extra call to
refs.c::ref_exists().

+if (!has_glob_specials(pattern) && !ref_exists(normalized_pattern->buf)) {
+/* Append implied '/' '*' if not present. */
+strbuf_complete(normalized_pattern, '/');
+/* No need to check for '*', there is none. */
+strbuf_addch(normalized_pattern, '*');
+}

But I don't have enough expertise to decide if this consistency is worth 
the extra call to refs.c::ref_exists() or if there are other side-effects
I am not considering.

>> That being said, if we think the extra glob would not cause
>> problems and generally do what people mean... I guess consistent
>> with --glob would be good... But it's definitely not what I'd
>> expect at first glance.

My position is that consistency is good, but the "first glance
expectation" is definitely something important we should take into
consideration.


[PATCH v3 8/8] Add Git/Packet.pm from parts of t0021/rot13-filter.pl

2017-11-10 Thread Christian Couder
And while at it let's simplify t0021/rot13-filter.pl by
using Git/Packet.pm.

This will make it possible to reuse packet related
functions in other test scripts.

Signed-off-by: Christian Couder 
---
 perl/Git/Packet.pm  | 169 
 perl/Makefile   |   1 +
 t/t0021/rot13-filter.pl | 141 +---
 3 files changed, 173 insertions(+), 138 deletions(-)
 create mode 100644 perl/Git/Packet.pm

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
new file mode 100644
index 00..1682403ffc
--- /dev/null
+++ b/perl/Git/Packet.pm
@@ -0,0 +1,169 @@
+package Git::Packet;
+use 5.008;
+use strict;
+use warnings;
+BEGIN {
+   require Exporter;
+   if ($] < 5.008003) {
+   *import = \::import;
+   } else {
+   # Exporter 5.57 which supports this invocation was
+   # released with perl 5.8.3
+   Exporter->import('import');
+   }
+}
+
+our @EXPORT = qw(
+   packet_compare_lists
+   packet_bin_read
+   packet_txt_read
+   packet_key_val_read
+   packet_bin_write
+   packet_txt_write
+   packet_flush
+   packet_initialize
+   packet_read_capabilities
+   packet_read_and_check_capabilities
+   packet_check_and_write_capabilities
+   );
+our @EXPORT_OK = @EXPORT;
+
+sub packet_compare_lists {
+   my ($expect, @result) = @_;
+   my $ix;
+   if (scalar @$expect != scalar @result) {
+   return undef;
+   }
+   for ($ix = 0; $ix < $#result; $ix++) {
+   if ($expect->[$ix] ne $result[$ix]) {
+   return undef;
+   }
+   }
+   return 1;
+}
+
+sub packet_bin_read {
+   my $buffer;
+   my $bytes_read = read STDIN, $buffer, 4;
+   if ( $bytes_read == 0 ) {
+   # EOF - Git stopped talking to us!
+   return ( -1, "" );
+   } elsif ( $bytes_read != 4 ) {
+   die "invalid packet: '$buffer'";
+   }
+   my $pkt_size = hex($buffer);
+   if ( $pkt_size == 0 ) {
+   return ( 1, "" );
+   } elsif ( $pkt_size > 4 ) {
+   my $content_size = $pkt_size - 4;
+   $bytes_read = read STDIN, $buffer, $content_size;
+   if ( $bytes_read != $content_size ) {
+   die "invalid packet ($content_size bytes expected; 
$bytes_read bytes read)";
+   }
+   return ( 0, $buffer );
+   } else {
+   die "invalid packet size: $pkt_size";
+   }
+}
+
+sub remove_final_lf_or_die {
+   my $buf = shift;
+   if ( $buf =~ s/\n$// ) {
+   return $buf;
+   }
+   die "A non-binary line MUST be terminated by an LF.\n"
+   . "Received: '$buf'";
+}
+
+sub packet_txt_read {
+   my ( $res, $buf ) = packet_bin_read();
+   if ( $res != -1 and $buf ne '' ) {
+   $buf = remove_final_lf_or_die($buf);
+   }
+   return ( $res, $buf );
+}
+
+# Read a text line and check that it is in the form "key=value"
+sub packet_key_val_read {
+   my ( $key ) = @_;
+   my ( $res, $buf ) = packet_txt_read();
+   if ( $res == -1 or ( $buf =~ s/^$key=// and $buf ne '' ) ) {
+   return ( $res, $buf );
+   }
+   die "bad $key: '$buf'";
+}
+
+sub packet_bin_write {
+   my $buf = shift;
+   print STDOUT sprintf( "%04x", length($buf) + 4 );
+   print STDOUT $buf;
+   STDOUT->flush();
+}
+
+sub packet_txt_write {
+   packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+   print STDOUT sprintf( "%04x", 0 );
+   STDOUT->flush();
+}
+
+sub packet_initialize {
+   my ($name, $version) = @_;
+
+   packet_compare_lists([0, $name . "-client"], packet_txt_read()) ||
+   die "bad initialize";
+   packet_compare_lists([0, "version=" . $version], packet_txt_read()) ||
+   die "bad version";
+   packet_compare_lists([1, ""], packet_bin_read()) ||
+   die "bad version end";
+
+   packet_txt_write( $name . "-server" );
+   packet_txt_write( "version=" . $version );
+   packet_flush();
+}
+
+sub packet_read_capabilities {
+   my @cap;
+   while (1) {
+   my ( $res, $buf ) = packet_bin_read();
+   if ( $res == -1 ) {
+   die "unexpected EOF when reading capabilities";
+   }
+   return ( $res, @cap ) if ( $res != 0 );
+   $buf = remove_final_lf_or_die($buf);
+   unless ( $buf =~ s/capability=// ) {
+   die "bad capability buf: '$buf'";
+   }
+   push @cap, $buf;
+   }
+}
+
+# Read remote capabilities and check them against 

  1   2   >