Re: [PATCH 00/30] Add directory rename detection to git
From: "Elijah Newren": Friday, November 10, 2017 11:26 PM On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakley wrote: From: "Elijah Newren" In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as: * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename. How does this cope with the case insensitive case preserving file systems on Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem that the series already covers, would the likely changes be reasonably localised? This came up recently on GfW for `git checkout` of a branch where the case changed ("Test" <-> "test"), but git didn't notice that it needed to rename the directories on such an file system. https://github.com/git-for-windows/git/issues/1333 I wasn't aware there were problems with git on case insensitive case preserving filesystems; fixing them wasn't something I had in mind when writing this series. I was mainly ensuring awareness of the potential issue, as it's not easy to solve. However, the particular bug you mention is actually completely orthogonal to this series; it talks about git-checkout without the -m/--merge option, which doesn't touch any code path I modified in my series, so my series can't really fix or worsen that particular issue. That's good. But, if there are further issues with such filesystems that also affect merges/cherry-picks/rebases, then I don't think my series will either help or hurt there either. The recursive merge machinery already has remove_file() and update_file() wrappers that it uses whenever it needs to remove/add/update a file in the working directory and/or index, and I have simply continued using those, so the number of places you'd need to modify to fix issues would remain just as localized as before. It's when the working directory path/filename has a case change that goes undetected (one way or another) that can cause issues. I think that part of the problem (after awareness) is not having a cannonical expectation of which way is 'right', and what options there may be. E,g. if a project is wholly on a case insensitive system then the filenames in the worktree never matter, but aligning the path/filenames in the repository would still be a problem. Also, I continue to depend on the reading of the index & trees that unpack_trees() does, which I haven't modified, so again it'd be the same number of places that someone would need to fix. (However, the whole design to have unpack_trees() do the initial work and then have recursive merge try to "fix it up" is really starting to strain. Interesting point. I'm starting to think, again, that merge recursive needs a redesign, and have some arguments I wanted to float out there...but I've dumped enough on the list for a day.) It's possible that this series fixes one particular issue -- namely when merging, if the merge-base contained a "Test" directory, one side added a file to that directory, and the other side renamed "Test" to "test", and if the presence of both "Test" and "test" directories in the merge result is problematic, then at least with my fixes you wouldn't end up with both directories and could thus avoid that problem in a narrow set of cases. I'll think on that. It may provide extra clues as to what the right solutions could be! Sorry that I don't have any better news than that for you. Elijah Thanks -- Philip
Re: [PATCH 00/30] Add directory rename detection to git
On Fri, Nov 10, 2017 at 2:27 PM, Philip Oakleywrote: > From: "Elijah Newren" >> >> In this patchset, I introduce directory rename detection to >> merge-recursive, >> predominantly so that when files are added to directories on one side of >> history and those directories are renamed on the other side of history, >> the >> files will end up in the proper location after a merge or cherry-pick. >> >> However, this isn't limited to that simplistic case. More interesting >> possibilities exist, such as: >> >> * a file being renamed into a directory which is renamed on the other >>side of history, causing the need for a transitive rename. >> > > How does this cope with the case insensitive case preserving file systems on > Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem > that the series already covers, would the likely changes be reasonably > localised? > > This came up recently on GfW for `git checkout` of a branch where the case > changed ("Test" <-> "test"), but git didn't notice that it needed to rename > the directories on such an file system. > https://github.com/git-for-windows/git/issues/1333 I wasn't aware there were problems with git on case insensitive case preserving filesystems; fixing them wasn't something I had in mind when writing this series. However, the particular bug you mention is actually completely orthogonal to this series; it talks about git-checkout without the -m/--merge option, which doesn't touch any code path I modified in my series, so my series can't really fix or worsen that particular issue. But, if there are further issues with such filesystems that also affect merges/cherry-picks/rebases, then I don't think my series will either help or hurt there either. The recursive merge machinery already has remove_file() and update_file() wrappers that it uses whenever it needs to remove/add/update a file in the working directory and/or index, and I have simply continued using those, so the number of places you'd need to modify to fix issues would remain just as localized as before. Also, I continue to depend on the reading of the index & trees that unpack_trees() does, which I haven't modified, so again it'd be the same number of places that someone would need to fix. (However, the whole design to have unpack_trees() do the initial work and then have recursive merge try to "fix it up" is really starting to strain. I'm starting to think, again, that merge recursive needs a redesign, and have some arguments I wanted to float out there...but I've dumped enough on the list for a day.) It's possible that this series fixes one particular issue -- namely when merging, if the merge-base contained a "Test" directory, one side added a file to that directory, and the other side renamed "Test" to "test", and if the presence of both "Test" and "test" directories in the merge result is problematic, then at least with my fixes you wouldn't end up with both directories and could thus avoid that problem in a narrow set of cases. Sorry that I don't have any better news than that for you. Elijah
Re: [PATCH 00/30] Add directory rename detection to git
From: "Elijah Newren"[This series is entirely independent of my rename detection limits series. However, I have a separate rename detection performance series that depends on both this series and the rename detection limits series.] In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as: * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename. How does this cope with the case insensitive case preserving file systems on Mac and Windows, esp when core.ignorecase is true. If it's a bigger problem that the series already covers, would the likely changes be reasonably localised? This came up recently on GfW for `git checkout` of a branch where the case changed ("Test" <-> "test"), but git didn't notice that it needed to rename the directories on such an file system. https://github.com/git-for-windows/git/issues/1333 -- Philip
[PATCH 00/30] Add directory rename detection to git
[This series is entirely independent of my rename detection limits series. However, I have a separate rename detection performance series that depends on both this series and the rename detection limits series.] In this patchset, I introduce directory rename detection to merge-recursive, predominantly so that when files are added to directories on one side of history and those directories are renamed on the other side of history, the files will end up in the proper location after a merge or cherry-pick. However, this isn't limited to that simplistic case. More interesting possibilities exist, such as: * a file being renamed into a directory which is renamed on the other side of history, causing the need for a transitive rename. * two (or three or N) directories being merged (with no conflicts so long as files/directories within the merged directory have different names), and the "merging" being detected as a directory rename for each original directory. * not all files in a directory being renamed to the same location; i.e. perhaps the directory was renamed, but some files within it were renamed to a different location * a directory being renamed, which also contained a subdirectory that was renamed to some entirely different location. (And perhaps the inner directory itself contained inner directories that were renamed to yet other locations). Also, I found it useful to allow all files within the directory being renamed to themselves be renamed and still detect the directory rename. For example, if goal/a and goal/b are renamed to priority/alpha and priority/bravo, we can detect that goal/ was renamed to priority/, so that if someone adds goal/c on the other side of history, after the merge we'll end up with priority/c. (In the absence of a readily available libmindread.so library that I can link to, we can't rename directly from goal/c to priority/charlie automatically, and will need to have priority/c suffice.) Naturally, an attempt to do all of the above brings up all kinds of interesting edge and corner cases, some of which result in conflicts that cannot be represented in the index, and others of which might be considered too complex for users to understand and resolve. For example: * An add/add/add/.../add conflict, all on one side of history (see testcase 9e in the new t6043, or any of the testcases in section 5) * Doubly, triply, or N-fold transitive renames (testcases 9c & 9d) In order to prevent such problems, I introduce a couple basic rules that limit when directory rename detection applies: 1) If a subset of to-be-renamed files have a file or directory in the way (or would be in the way of each other), "turn off" the directory rename for those specific sub-paths and report the conflict to the user. 2) If the other side of history did a directory rename to a path that your side of history renamed away, then ignore that particular rename from the other side of history for any implicit directory renames (but warn the user). Further, there's a basic question about when directory rename detection should be applied at all. I have a simple rule: 3) If a given directory still exists on both sides of a merge, we do not consider it to have been renamed. Rule 3 may sound obvious at first, but it will probably arise as a question for some users -- what if someone "mostly" moved a directory but still left some files around, or, equivalently (from the perspective of the three-way merge that merge-recursive performs), fully renamed a directory in one commmit and then recreated that directory in a later commit adding some new files and then tried to merge? See the big comment in section 4 of the new t6043 for further discussion of this rule. This set of rules seems to be reasonably easy to explain, is self-consistent, allows all conflict cases to be represented without changing any on-disk data structures or introducing new terminology or commands for users, prevents excessively complex conflicts that users might struggle to understand, and brings peace to the middle east. Actually, maybe not that last one. While I feel that this directory rename detection reduces the number of suboptimal merges and cherry-picks that git performs, there are sadly still a number of cases that remain suboptimal, or that even newly appear to be not-quite-consistent with other cases. The fact that one file layout might trigger some of the rules above while another "slightly" different file layout doesn't might occasionally cause some user grumblings. I've tried to explore and document these cases in section 8 of the new t6043-merge-rename-directories.sh Finally, from an implementation perspective, there's another strong advantage to the ruleset above: it means that any path to which we want to apply an implicit directory rename will have a free and open spot for us to move it into. Thus, we can just adjust