Re: [PATCH 00/30] Add directory rename detection to git

2017-11-13 Thread Philip Oakley

From: "Elijah Newren" 
: Friday, November 10, 2017 11:26 PM
On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakley  
wrote:

From: "Elijah Newren" 


In this patchset, I introduce directory rename detection to
merge-recursive,
predominantly so that when files are added to directories on one side of
history and those directories are renamed on the other side of history,
the
files will end up in the proper location after a merge or cherry-pick.

However, this isn't limited to that simplistic case.  More interesting
possibilities exist, such as:

 * a file being renamed into a directory which is renamed on the other
   side of history, causing the need for a transitive rename.



How does this cope with the case insensitive case preserving file systems 
on
Mac and Windows, esp when core.ignorecase is true. If it's a bigger 
problem

that the series already covers, would the likely changes be reasonably
localised?

This came up recently on GfW for `git checkout` of a branch where the 
case
changed ("Test" <-> "test"), but git didn't notice that it needed to 
rename

the directories on such an file system.
https://github.com/git-for-windows/git/issues/1333


I wasn't aware there were problems with git on case insensitive case
preserving filesystems; fixing them wasn't something I had in mind
when writing this series.


I was mainly ensuring awareness of the potential issue, as it's not easy to 
solve.



However, the particular bug you mention is
actually completely orthogonal to this series; it talks about
git-checkout without the -m/--merge option, which doesn't touch any
code path I modified in my series, so my series can't really fix or
worsen that particular issue.


That's good.


But, if there are further issues with such filesystems that also
affect merges/cherry-picks/rebases, then I don't think my series will
either help or hurt there either.  The recursive merge machinery
already has remove_file() and update_file() wrappers that it uses
whenever it needs to remove/add/update a file in the working directory
and/or index, and I have simply continued using those, so the number
of places you'd need to modify to fix issues would remain just as
localized as before.


It's when the working directory path/filename has a case change that goes 
undetected (one way or another) that can cause issues. I think that part of 
the problem (after awareness) is not having a cannonical expectation of 
which way is 'right', and what options there may be. E,g. if a project is 
wholly on a case insensitive system then the filenames in the worktree never 
matter, but aligning the path/filenames in the repository would still be a 
problem.



 Also, I continue to depend on the reading of the
index & trees that unpack_trees() does, which I haven't modified, so
again it'd be the same number of places that someone would need to
fix.  (However, the whole design to have unpack_trees() do the initial
work and then have recursive merge try to "fix it up" is really
starting to strain.


Interesting point.


 I'm starting to think, again, that merge
recursive needs a redesign, and have some arguments I wanted to float
out there...but I've dumped enough on the list for a day.)

It's possible that this series fixes one particular issue -- namely
when merging, if the merge-base contained a "Test" directory, one side
added a file to that directory, and the other side renamed "Test" to
"test", and if the presence of both "Test" and "test" directories in
the merge result is problematic, then at least with my fixes you
wouldn't end up with both directories and could thus avoid that
problem in a narrow set of cases.


I'll think on that. It may provide extra clues as to what the right 
solutions could be!


Sorry that I don't have any better news than that for you.

Elijah


Thanks
--
Philip 



Re: [PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakley  wrote:
> From: "Elijah Newren" 
>>
>> In this patchset, I introduce directory rename detection to
>> merge-recursive,
>> predominantly so that when files are added to directories on one side of
>> history and those directories are renamed on the other side of history,
>> the
>> files will end up in the proper location after a merge or cherry-pick.
>>
>> However, this isn't limited to that simplistic case.  More interesting
>> possibilities exist, such as:
>>
>>  * a file being renamed into a directory which is renamed on the other
>>side of history, causing the need for a transitive rename.
>>
>
> How does this cope with the case insensitive case preserving file systems on
> Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem
> that the series already covers, would the likely changes be reasonably
> localised?
>
> This came up recently on GfW for `git checkout` of a branch where the case
> changed ("Test" <-> "test"), but git didn't notice that it needed to rename
> the directories on such an file system.
> https://github.com/git-for-windows/git/issues/1333

I wasn't aware there were problems with git on case insensitive case
preserving filesystems; fixing them wasn't something I had in mind
when writing this series.  However, the particular bug you mention is
actually completely orthogonal to this series; it talks about
git-checkout without the -m/--merge option, which doesn't touch any
code path I modified in my series, so my series can't really fix or
worsen that particular issue.

But, if there are further issues with such filesystems that also
affect merges/cherry-picks/rebases, then I don't think my series will
either help or hurt there either.  The recursive merge machinery
already has remove_file() and update_file() wrappers that it uses
whenever it needs to remove/add/update a file in the working directory
and/or index, and I have simply continued using those, so the number
of places you'd need to modify to fix issues would remain just as
localized as before.  Also, I continue to depend on the reading of the
index & trees that unpack_trees() does, which I haven't modified, so
again it'd be the same number of places that someone would need to
fix.  (However, the whole design to have unpack_trees() do the initial
work and then have recursive merge try to "fix it up" is really
starting to strain.  I'm starting to think, again, that merge
recursive needs a redesign, and have some arguments I wanted to float
out there...but I've dumped enough on the list for a day.)

It's possible that this series fixes one particular issue -- namely
when merging, if the merge-base contained a "Test" directory, one side
added a file to that directory, and the other side renamed "Test" to
"test", and if the presence of both "Test" and "test" directories in
the merge result is problematic, then at least with my fixes you
wouldn't end up with both directories and could thus avoid that
problem in a narrow set of cases.

Sorry that I don't have any better news than that for you.

Elijah


Re: [PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Philip Oakley

From: "Elijah Newren" 

[This series is entirely independent of my rename detection limits series.
However, I have a separate rename detection performance series that 
depends

on both this series and the rename detection limits series.]

In this patchset, I introduce directory rename detection to 
merge-recursive,

predominantly so that when files are added to directories on one side of
history and those directories are renamed on the other side of history, 
the

files will end up in the proper location after a merge or cherry-pick.

However, this isn't limited to that simplistic case.  More interesting
possibilities exist, such as:

 * a file being renamed into a directory which is renamed on the other
   side of history, causing the need for a transitive rename.



How does this cope with the case insensitive case preserving file systems on 
Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem 
that the series already covers, would the likely changes be reasonably 
localised?


This came up recently on GfW for `git checkout` of a branch where the case 
changed ("Test" <-> "test"), but git didn't notice that it needed to rename 
the directories on such an file system. 
https://github.com/git-for-windows/git/issues/1333




--
Philip 



[PATCH 00/30] Add directory rename detection to git

2017-11-10 Thread Elijah Newren
[This series is entirely independent of my rename detection limits series.
However, I have a separate rename detection performance series that depends
on both this series and the rename detection limits series.]

In this patchset, I introduce directory rename detection to merge-recursive,
predominantly so that when files are added to directories on one side of
history and those directories are renamed on the other side of history, the
files will end up in the proper location after a merge or cherry-pick.

However, this isn't limited to that simplistic case.  More interesting
possibilities exist, such as:

  * a file being renamed into a directory which is renamed on the other
side of history, causing the need for a transitive rename.

  * two (or three or N) directories being merged (with no conflicts so
long as files/directories within the merged directory have different
names), and the "merging" being detected as a directory rename for
each original directory.

  * not all files in a directory being renamed to the same location;
i.e. perhaps the directory was renamed, but some files within it were
renamed to a different location

  * a directory being renamed, which also contained a subdirectory that
was renamed to some entirely different location.  (And perhaps the
inner directory itself contained inner directories that were renamed
to yet other locations).

Also, I found it useful to allow all files within the directory being
renamed to themselves be renamed and still detect the directory rename.
For example, if goal/a and goal/b are renamed to priority/alpha and
priority/bravo, we can detect that goal/ was renamed to priority/, so that
if someone adds goal/c on the other side of history, after the merge we'll
end up with priority/c.  (In the absence of a readily available
libmindread.so library that I can link to, we can't rename directly from
goal/c to priority/charlie automatically, and will need to have priority/c
suffice.)

Naturally, an attempt to do all of the above brings up all kinds of
interesting edge and corner cases, some of which result in conflicts
that cannot be represented in the index, and others of which might be
considered too complex for users to understand and resolve.  For
example:

  * An add/add/add/.../add conflict, all on one side of history (see
testcase 9e in the new t6043, or any of the testcases in section 5)

  * Doubly, triply, or N-fold transitive renames (testcases 9c & 9d)

In order to prevent such problems, I introduce a couple basic rules that
limit when directory rename detection applies:

  1) If a subset of to-be-renamed files have a file or directory in the
 way (or would be in the way of each other), "turn off" the directory
 rename for those specific sub-paths and report the conflict to the
 user.

  2) If the other side of history did a directory rename to a path that
 your side of history renamed away, then ignore that particular
 rename from the other side of history for any implicit directory
 renames (but warn the user).

Further, there's a basic question about when directory rename detection
should be applied at all.  I have a simple rule:

  3) If a given directory still exists on both sides of a merge, we do
 not consider it to have been renamed.

Rule 3 may sound obvious at first, but it will probably arise as a
question for some users -- what if someone "mostly" moved a directory but
still left some files around, or, equivalently (from the perspective of the
three-way merge that merge-recursive performs), fully renamed a directory
in one commmit and then recreated that directory in a later commit adding
some new files and then tried to merge?  See the big comment in section 4
of the new t6043 for further discussion of this rule.

This set of rules seems to be reasonably easy to explain, is
self-consistent, allows all conflict cases to be represented without
changing any on-disk data structures or introducing new terminology or
commands for users, prevents excessively complex conflicts that users
might struggle to understand, and brings peace to the middle east.
Actually, maybe not that last one.

While I feel that this directory rename detection reduces the number of
suboptimal merges and cherry-picks that git performs, there are sadly
still a number of cases that remain suboptimal, or that even newly appear
to be not-quite-consistent with other cases.  The fact that one file
layout might trigger some of the rules above while another "slightly"
different file layout doesn't might occasionally cause some user
grumblings.  I've tried to explore and document these cases in section 8
of the new t6043-merge-rename-directories.sh

Finally, from an implementation perspective, there's another strong
advantage to the ruleset above: it means that any path to which we want
to apply an implicit directory rename will have a free and open spot
for us to move it into.  Thus, we can just adjust