Re: [PATCH v10 33/36] merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-20 Thread Elijah Newren
On Fri, Apr 20, 2018 at 5:23 AM, SZEDER Gábor  wrote:
> This patch causes memory corruption when the split index feature is in
> use, making several tests fail.  Now, while the split index feature
> sure has its own set of problems, AFAIK those are not that bad to
> cause memory corruption, they "only" tend to cause transient test
> failures due to a variant of the classic racy git issue [1].
>
> Here is a test failure:
>
>   $ GIT_TEST_SPLIT_INDEX=DareISayYes ./t3030-merge-recursive.sh

Running under valgrind shows that merge-recursive.c's add_cacheinfo
(which calls add_cache_entry()) results in data used by o->orig_index
getting free()'d.  That means that anything trying to use that memory
(whether a later call to discard_index, or just a call to was_dirty()
or was_tracked()) will be access'ing free'd memory.  (The exact same
tests run valgrind clean when GIT_TEST_SPLIT_INDEX is not turned on.)

The fact that add_cacheinfo() frees data used by o->orig_index
surprises me.  add_cacheinfo is only supposed to modify the_index.
Are o->orig_index and the_index sharing data somehow?  Did I do
something wrong or incomplete for the split index case when swapping
indexes?  My swapping logic, as shown in this patch was:

/*
 * Update the_index to match the new results, AFTER saving a copy
 * in o->orig_index.  Update src_index to point to the saved copy.
 * (verify_uptodate() checks src_index, and the original index is
 * the one that had the necessary modification timestamps.)
 */
o->orig_index = the_index;
the_index = tmp_index;
o->unpack_opts.src_index = >orig_index;

Do I need to do more?


Re: [PATCH v10 33/36] merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-20 Thread SZEDER Gábor
> In commit aacb82de3ff8 ("merge-recursive: Split was_tracked() out of
> would_lose_untracked()", 2011-08-11), was_tracked() was split out of
> would_lose_untracked() with the intent to provide a function that could
> answer whether a path was tracked in the index before the merge.  Sadly,
> it instead returned whether the path was in the working tree due to having
> been tracked in the index before the merge OR having been written there by
> unpack_trees().  The distinction is important when renames are involved,
> e.g. for a merge where:
> 
>HEAD:  modifies path b
>other: renames b->c
> 
> In this case, c was not tracked in the index before the merge, but would
> have been added to the index at stage 0 and written to the working tree by
> unpack_trees().  would_lose_untracked() is more interested in the
> in-working-copy-for-either-reason behavior, while all other uses of
> was_tracked() want just was-it-tracked-in-index-before-merge behavior.
> 
> Unsplit would_lose_untracked() and write a new was_tracked() function
> which answers whether a path was tracked in the index before the merge
> started.
> 
> This will also affect was_dirty(), helping it to return better results
> since it can base answers off the original index rather than an index that
> possibly only copied over some of the stat information.  However,
> was_dirty() will need an additional change that will be made in a
> subsequent patch.
> 
> Signed-off-by: Elijah Newren 
> ---

This patch causes memory corruption when the split index feature is in
use, making several tests fail.  Now, while the split index feature
sure has its own set of problems, AFAIK those are not that bad to
cause memory corruption, they "only" tend to cause transient test
failures due to a variant of the classic racy git issue [1].

Here is a test failure:

  $ GIT_TEST_SPLIT_INDEX=DareISayYes ./t3030-merge-recursive.sh
  <...>
  ok 31 - merge-recursive simple w/submodule result
  *** Error in `/home/szeder/src/git/git': free(): invalid pointer: 
0x01f646d0 ***
  === Backtrace: =
  /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f84e0c5b7e5]
  /lib/x86_64-linux-gnu/libc.so.6(+0x7f72a)[0x7f84e0c6372a]
  /lib/x86_64-linux-gnu/libc.so.6(cfree+0xf7)[0x7f84e0c685e7]
  /home/szeder/src/git/git[0x5181ee]
  /home/szeder/src/git/git[0x4f1e82]
  /home/szeder/src/git/git[0x4f394b]
  /home/szeder/src/git/git[0x44a37f]
  /home/szeder/src/git/git[0x44afa9]
  /home/szeder/src/git/git[0x406640]
  /home/szeder/src/git/git[0x4070f0]
  /home/szeder/src/git/git[0x4062a7]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f84e0c04830]
  /home/szeder/src/git/git[0x4062f9]
  === Memory map: 
  0040-00616000 r-xp  08:06 2255502
/home/szeder/src/git/git
  00815000-00816000 r--p 00215000 08:06 2255502
/home/szeder/src/git/git
  00816000-00823000 rw-p 00216000 08:06 2255502
/home/szeder/src/git/git
  00823000-00866000 rw-p  00:00 0 
  01f63000-01fa6000 rw-p  00:00 0  
[heap]
  7f84e09ce000-7f84e09e4000 r-xp  08:06 921674 
/lib/x86_64-linux-gnu/libgcc_s.so.1
  7f84e09e4000-7f84e0be3000 ---p 00016000 08:06 921674 
/lib/x86_64-linux-gnu/libgcc_s.so.1
  7f84e0be3000-7f84e0be4000 rw-p 00015000 08:06 921674 
/lib/x86_64-linux-gnu/libgcc_s.so.1
  7f84e0be4000-7f84e0da4000 r-xp  08:06 917791 
/lib/x86_64-linux-gnu/libc-2.23.so
  7f84e0da4000-7f84e0fa4000 ---p 001c 08:06 917791 
/lib/x86_64-linux-gnu/libc-2.23.so
  7f84e0fa4000-7f84e0fa8000 r--p 001c 08:06 917791 
/lib/x86_64-linux-gnu/libc-2.23.so
  7f84e0fa8000-7f84e0faa000 rw-p 001c4000 08:06 917791 
/lib/x86_64-linux-gnu/libc-2.23.so
  7f84e0faa000-7f84e0fae000 rw-p  00:00 0 
  7f84e0fae000-7f84e0fb5000 r-xp  08:06 917825 
/lib/x86_64-linux-gnu/librt-2.23.so
  7f84e0fb5000-7f84e11b4000 ---p 7000 08:06 917825 
/lib/x86_64-linux-gnu/librt-2.23.so
  7f84e11b4000-7f84e11b5000 r--p 6000 08:06 917825 
/lib/x86_64-linux-gnu/librt-2.23.so
  7f84e11b5000-7f84e11b6000 rw-p 7000 08:06 917825 
/lib/x86_64-linux-gnu/librt-2.23.so
  7f84e11b6000-7f84e11ce000 r-xp  08:06 917789 
/lib/x86_64-linux-gnu/libpthread-2.23.so
  7f84e11ce000-7f84e13cd000 ---p 00018000 08:06 917789 
/lib/x86_64-linux-gnu/libpthread-2.23.so
  7f84e13cd000-7f84e13ce000 r--p 00017000 08:06 917789 
/lib/x86_64-linux-gnu/libpthread-2.23.so
  7f84e13ce000-7f84e13cf000 rw-p 00018000 08:06 917789 
/lib/x86_64-linux-gnu/libpthread-2.23.so
  7f84e13cf000-7f84e13d3000 rw-p  00:00 0 
  7f84e13d3000-7f84e13ec000 r-xp  08:06 

Re: [PATCH v10 33/36] merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-19 Thread Elijah Newren
On Thu, Apr 19, 2018 at 1:39 PM, Martin Ågren  wrote:
> On 19 April 2018 at 19:58, Elijah Newren  wrote:
>> +   /* Free the extra index left from git_merge_trees() */
>> +   /*
>> +* FIXME: Need to also data allocated by 
>> setup_unpack_trees_porcelain()
>> +* tucked away in o->unpack_opts.msgs, but the problem is that only
>> +* half of it refers to dynamically allocated data, while the other
>> +* half points at static strings.
>> +*/
>
> Timing. I've been preparing a patch that provides
> `clear_unpack_trees_porcelain()` and fixes all such leaks. (About 10% of
> all the leaks that are reported when I run the test-suite!) My patch

Nice!

> conflicts with this series for obvious reasons. Figuring out the
> conflict resolution might be non-trivial, and I suspect it would even be
> an evil merge. I'll be holding off on that patch until this has landed.
>
> BTW: s/also data/also free data/. But since I'm promising to get rid of
> this TODO quite soon after this is merged... ;-)

Oops, good catch.  I can fix it up since I need to fix the issues
SZEDER found, but yeah if you're just going to implement the fix and
rip this comment out then it's not that critical.


Re: [PATCH v10 33/36] merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-19 Thread Martin Ågren
On 19 April 2018 at 19:58, Elijah Newren  wrote:
> +   /* Free the extra index left from git_merge_trees() */
> +   /*
> +* FIXME: Need to also data allocated by 
> setup_unpack_trees_porcelain()
> +* tucked away in o->unpack_opts.msgs, but the problem is that only
> +* half of it refers to dynamically allocated data, while the other
> +* half points at static strings.
> +*/

Timing. I've been preparing a patch that provides
`clear_unpack_trees_porcelain()` and fixes all such leaks. (About 10% of
all the leaks that are reported when I run the test-suite!) My patch
conflicts with this series for obvious reasons. Figuring out the
conflict resolution might be non-trivial, and I suspect it would even be
an evil merge. I'll be holding off on that patch until this has landed.

BTW: s/also data/also free data/. But since I'm promising to get rid of
this TODO quite soon after this is merged... ;-)

Martin


[PATCH v10 33/36] merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-19 Thread Elijah Newren
In commit aacb82de3ff8 ("merge-recursive: Split was_tracked() out of
would_lose_untracked()", 2011-08-11), was_tracked() was split out of
would_lose_untracked() with the intent to provide a function that could
answer whether a path was tracked in the index before the merge.  Sadly,
it instead returned whether the path was in the working tree due to having
been tracked in the index before the merge OR having been written there by
unpack_trees().  The distinction is important when renames are involved,
e.g. for a merge where:

   HEAD:  modifies path b
   other: renames b->c

In this case, c was not tracked in the index before the merge, but would
have been added to the index at stage 0 and written to the working tree by
unpack_trees().  would_lose_untracked() is more interested in the
in-working-copy-for-either-reason behavior, while all other uses of
was_tracked() want just was-it-tracked-in-index-before-merge behavior.

Unsplit would_lose_untracked() and write a new was_tracked() function
which answers whether a path was tracked in the index before the merge
started.

This will also affect was_dirty(), helping it to return better results
since it can base answers off the original index rather than an index that
possibly only copied over some of the stat information.  However,
was_dirty() will need an additional change that will be made in a
subsequent patch.

Signed-off-by: Elijah Newren 
---

This patch is nearly identical to one I sent out as an RFC and which
was previously reviewed by Junio at

  
https://public-inbox.org/git/CABPp-BFPTJsTUVoPxxN=2u5jeqn1ngddvmnhp+vlzktgzau...@mail.gmail.com/

It is not clear whether my responses in that thread were sufficient, but
I did make the two changes I mentioned there:
  - Fix the broken comment in git_merge_trees()
  - Add a note to the comment in would_lose_untracked() about the
annoying worktree-first-then-index requirement

 merge-recursive.c | 91 ++-
 merge-recursive.h |  1 +
 2 files changed, 68 insertions(+), 24 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b32e8d817a..097de7e5a7 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -344,6 +344,7 @@ static int git_merge_trees(struct merge_options *o,
 {
int rc;
struct tree_desc t[3];
+   struct index_state tmp_index = { NULL };
 
memset(>unpack_opts, 0, sizeof(o->unpack_opts));
if (o->call_depth)
@@ -354,7 +355,7 @@ static int git_merge_trees(struct merge_options *o,
o->unpack_opts.head_idx = 2;
o->unpack_opts.fn = threeway_merge;
o->unpack_opts.src_index = _index;
-   o->unpack_opts.dst_index = _index;
+   o->unpack_opts.dst_index = _index;
setup_unpack_trees_porcelain(>unpack_opts, "merge");
 
init_tree_desc_from_tree(t+0, common);
@@ -362,13 +363,18 @@ static int git_merge_trees(struct merge_options *o,
init_tree_desc_from_tree(t+2, merge);
 
rc = unpack_trees(3, t, >unpack_opts);
+   cache_tree_free(_cache_tree);
+
/*
-* unpack_trees NULLifies src_index, but it's used in verify_uptodate,
-* so set to the new index which will usually have modification
-* timestamp info copied over.
+* Update the_index to match the new results, AFTER saving a copy
+* in o->orig_index.  Update src_index to point to the saved copy.
+* (verify_uptodate() checks src_index, and the original index is
+* the one that had the necessary modification timestamps.)
 */
-   o->unpack_opts.src_index = _index;
-   cache_tree_free(_cache_tree);
+   o->orig_index = the_index;
+   the_index = tmp_index;
+   o->unpack_opts.src_index = >orig_index;
+
return rc;
 }
 
@@ -773,31 +779,59 @@ static int dir_in_way(const char *path, int 
check_working_copy, int empty_ok)
!(empty_ok && is_empty_dir(path));
 }
 
-static int was_tracked(const char *path)
+/*
+ * Returns whether path was tracked in the index before the merge started
+ */
+static int was_tracked(struct merge_options *o, const char *path)
 {
-   int pos = cache_name_pos(path, strlen(path));
+   int pos = index_name_pos(>orig_index, path, strlen(path));
 
if (0 <= pos)
-   /* we have been tracking this path */
+   /* we were tracking this path before the merge */
return 1;
 
-   /*
-* Look for an unmerged entry for the path,
-* specifically stage #2, which would indicate
-* that "our" side before the merge started
-* had the path tracked (and resulted in a conflict).
-*/
-   for (pos = -1 - pos;
-pos < active_nr && !strcmp(path, active_cache[pos]->name);
-pos++)
-   if (ce_stage(active_cache[pos]) == 2)
-   return 1;
return 0;
 }
 
 static int would_lose_untracked(const char *path)
 {
-   return