Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 6:19 PM, Elijah Newren  wrote:
> Hi Duy,
>
> On Mon, Apr 30, 2018 at 7:45 AM, Duy Nguyen  wrote:
>> On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen  wrote:
>>> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
>>>  wrote:
> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
> > *t, struct unpack_trees_options
> >   WRITE_TREE_SILENT |
> >   WRITE_TREE_REPAIR);
> > }
> > -   move_index_extensions(>result, o->dst_index);
> > +   move_index_extensions(>result, o->src_index);
>
> While this looks like the right thing to do on paper, I believe it's
> actually broken for a specific case of untracked cache. In short,
> please do not touch this line. I will send a patch to revert
> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
> which essentially deletes this line, with proper explanation and
> perhaps a test if I could come up with one.
>
> When we update the index, we depend on the fact that all updates must
> invalidate the right untracked cache correctly. In this unpack
> operations, we start copying entries over from src to result. Since
> 'result' (at least from the beginning) does not have an untracked
> cache, it has nothing to invalidate when we copy entries over. By the
> time we have done preparing 'result', what's recorded in src's (or
> dst's for that matter) untracked cache may or may not apply to
> 'result'  index anymore. This copying only leads to more problems when
> untracked cache is used.

 Is there really no way to invalidate just individual entries?
>>>
>>> Grr the short answer is the current code (i.e. without Elijah's
>>> changes) works but in a twisted way. So you get to keep untracked
>>> cache in the end.
>>
>> GAAAHH.. it works _with_ Elijah's changes (since he made the change
>> from dst to src) not without (and no performance regression).
>
> So...is that an Acked-by for the patch

Yes, Acked-by: me.

> or does the "two wrong make a
> right, I guess" comment suggest that we should still drop the
> move_index_extensions change (essentially reverting to v1 of the PATCH
> as found at 20180421193736.12722-1-new...@gmail.com), and you'll fix
> things up further in a separate series?

I think I'll stay away from this file for a while. When I gather
enough courage, I'll need to read it through since it sounds like a
mine field.
-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Elijah Newren
Hi Duy,

On Mon, Apr 30, 2018 at 7:45 AM, Duy Nguyen  wrote:
> On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen  wrote:
>> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
>>  wrote:
 > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
 > *t, struct unpack_trees_options
 >   WRITE_TREE_SILENT |
 >   WRITE_TREE_REPAIR);
 > }
 > -   move_index_extensions(>result, o->dst_index);
 > +   move_index_extensions(>result, o->src_index);

 While this looks like the right thing to do on paper, I believe it's
 actually broken for a specific case of untracked cache. In short,
 please do not touch this line. I will send a patch to revert
 edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
 which essentially deletes this line, with proper explanation and
 perhaps a test if I could come up with one.

 When we update the index, we depend on the fact that all updates must
 invalidate the right untracked cache correctly. In this unpack
 operations, we start copying entries over from src to result. Since
 'result' (at least from the beginning) does not have an untracked
 cache, it has nothing to invalidate when we copy entries over. By the
 time we have done preparing 'result', what's recorded in src's (or
 dst's for that matter) untracked cache may or may not apply to
 'result'  index anymore. This copying only leads to more problems when
 untracked cache is used.
>>>
>>> Is there really no way to invalidate just individual entries?
>>
>> Grr the short answer is the current code (i.e. without Elijah's
>> changes) works but in a twisted way. So you get to keep untracked
>> cache in the end.
>
> GAAAHH.. it works _with_ Elijah's changes (since he made the change
> from dst to src) not without (and no performance regression).

So...is that an Acked-by for the patch, or does the "two wrong make a
right, I guess" comment suggest that we should still drop the
move_index_extensions change (essentially reverting to v1 of the PATCH
as found at 20180421193736.12722-1-new...@gmail.com), and you'll fix
things up further in a separate series?

> This file really messes my brain up.

I'm glad I'm not the only one.  :-)


Elijah


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 4:42 PM, Duy Nguyen  wrote:
> On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
>  wrote:
>>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
>>> > *t, struct unpack_trees_options
>>> >   WRITE_TREE_SILENT |
>>> >   WRITE_TREE_REPAIR);
>>> > }
>>> > -   move_index_extensions(>result, o->dst_index);
>>> > +   move_index_extensions(>result, o->src_index);
>>>
>>> While this looks like the right thing to do on paper, I believe it's
>>> actually broken for a specific case of untracked cache. In short,
>>> please do not touch this line. I will send a patch to revert
>>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
>>> which essentially deletes this line, with proper explanation and
>>> perhaps a test if I could come up with one.
>>>
>>> When we update the index, we depend on the fact that all updates must
>>> invalidate the right untracked cache correctly. In this unpack
>>> operations, we start copying entries over from src to result. Since
>>> 'result' (at least from the beginning) does not have an untracked
>>> cache, it has nothing to invalidate when we copy entries over. By the
>>> time we have done preparing 'result', what's recorded in src's (or
>>> dst's for that matter) untracked cache may or may not apply to
>>> 'result'  index anymore. This copying only leads to more problems when
>>> untracked cache is used.
>>
>> Is there really no way to invalidate just individual entries?
>
> Grr the short answer is the current code (i.e. without Elijah's
> changes) works but in a twisted way. So you get to keep untracked
> cache in the end.

GAAAHH.. it works _with_ Elijah's changes (since he made the change
from dst to src) not without (and no performance regression). This
file really messes my brain up.
-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-30 Thread Duy Nguyen
On Sun, Apr 29, 2018 at 10:53 PM, Johannes Schindelin
 wrote:
>> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc 
>> > *t, struct unpack_trees_options
>> >   WRITE_TREE_SILENT |
>> >   WRITE_TREE_REPAIR);
>> > }
>> > -   move_index_extensions(>result, o->dst_index);
>> > +   move_index_extensions(>result, o->src_index);
>>
>> While this looks like the right thing to do on paper, I believe it's
>> actually broken for a specific case of untracked cache. In short,
>> please do not touch this line. I will send a patch to revert
>> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
>> which essentially deletes this line, with proper explanation and
>> perhaps a test if I could come up with one.
>>
>> When we update the index, we depend on the fact that all updates must
>> invalidate the right untracked cache correctly. In this unpack
>> operations, we start copying entries over from src to result. Since
>> 'result' (at least from the beginning) does not have an untracked
>> cache, it has nothing to invalidate when we copy entries over. By the
>> time we have done preparing 'result', what's recorded in src's (or
>> dst's for that matter) untracked cache may or may not apply to
>> 'result'  index anymore. This copying only leads to more problems when
>> untracked cache is used.
>
> Is there really no way to invalidate just individual entries?

Grr the short answer is the current code (i.e. without Elijah's
changes) works but in a twisted way. So you get to keep untracked
cache in the end.

I was right about the invalidation stuff. I knew about
invalidate_ce_path() in this file. What I didn't remember was this
function actually invalidates entries from the _source_ index, not the
result one. What kind of logic is that? You copy/move entries from
source to result than you go invalidate the source. Since the original
move_index_extensions() call moves extensions from the source, these
are already properly invalidated (both untracked cache and cache
tree), it it looks like it does the right thing. Two wrongs make a
right, I guess.

Sorry for venting. I was not happy with what I found. And sorry for
wasting your time making this move_index.. change then remove it.

> I have a couple of worktrees which are *huge*. And edf3b90553 really
> helped relieve the pain a bit when running `git status`. Now you say that
> even a `git checkout -b new-branch` would blow the untracked cache away
> again?
>
> It would be *really* nice if we could prevent that performance regression
> somehow.
>
> Ciao,
> Dscho



-- 
Duy


Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-29 Thread Johannes Schindelin
Hi Duy,

On Sun, 29 Apr 2018, Duy Nguyen wrote:

> On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren  wrote:
> > Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> > The code in unpack_trees() does not correctly handle them being different.
> > There are two separate issues:
> >
> > First, there is the possibility of memory corruption.  Since
> > unpack_trees() creates a temporary index in o->result and then discards
> > o->dst_index and overwrites it with o->result, in the special case that
> > o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> > split_index for o->result.  However, when src and dst are different,
> > reusing o->src_index's split_index for o->result will cause the
> > split_index to be shared.  If either index then has entries replaced or
> > removed, it will result in the other index referring to free()'d memory.
> >
> > Second, we can drop the index extensions.  Previously, we were moving
> > index extensions from o->dst_index to o->result.  Since o->src_index is
> > the one that will have the necessary extensions (o->dst_index is likely to
> > be a new index temporary index created to store the results), we should be
> > moving the index extensions from there.
> >
> > Signed-off-by: Elijah Newren 
> > ---
> >
> > Differences from v2:
> >   - Don't NULLify src_index until we're done using it
> >   - Actually built and tested[1]
> >
> > But it now passes the testsuite on both linux and mac[2], and I even 
> > re-merged
> > all 53288 merge commits in linux.git (with a merge of this patch together 
> > with
> > the directory rename detection series) for good measure.  [Only 7 commits
> > showed a difference, all due to directory rename detection kicking in.]
> >
> > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> > parallelization are great until you realize that your new setup omitted a
> > critical step, leaving you running a slightly stale version of git 
> > instead...
> > :-(
> >
> > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> > with unicode normalization tests, but those two tests fail before my changes
> > too.  All the other tests pass.
> >
> >  unpack-trees.c | 19 +++
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index e73745051e..49526d70aa 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> > o->result.timestamp.sec = o->src_index->timestamp.sec;
> > o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> > o->result.version = o->src_index->version;
> > -   o->result.split_index = o->src_index->split_index;
> > -   if (o->result.split_index)
> > +   if (!o->src_index->split_index) {
> > +   o->result.split_index = NULL;
> > +   } else if (o->src_index == o->dst_index) {
> > +   /*
> > +* o->dst_index (and thus o->src_index) will be discarded
> > +* and overwritten with o->result at the end of this 
> > function,
> > +* so just use src_index's split_index to avoid having to
> > +* create a new one.
> > +*/
> > +   o->result.split_index = o->src_index->split_index;
> > o->result.split_index->refcount++;
> > +   } else {
> > +   o->result.split_index = init_split_index(>result);
> > +   }
> > hashcpy(o->result.sha1, o->src_index->sha1);
> > o->merge_size = len;
> > mark_all_ce_unused(o->src_index);
> > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> > }
> > }
> >
> > -   o->src_index = NULL;
> > ret = check_updates(o) ? (-2) : 0;
> > if (o->dst_index) {
> > if (!ret) {
> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> >   WRITE_TREE_SILENT |
> >   WRITE_TREE_REPAIR);
> > }
> > -   move_index_extensions(>result, o->dst_index);
> > +   move_index_extensions(>result, o->src_index);
> 
> While this looks like the right thing to do on paper, I believe it's
> actually broken for a specific case of untracked cache. In short,
> please do not touch this line. I will send a patch to revert
> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
> which essentially deletes this line, with proper explanation and
> perhaps a test if I could come up with one.
> 
> When we update the index, we depend on the fact that all updates must
> invalidate the right untracked cache correctly. In this unpack
> operations, we 

Re: [PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-29 Thread Duy Nguyen
On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren  wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> The code in unpack_trees() does not correctly handle them being different.
> There are two separate issues:
>
> First, there is the possibility of memory corruption.  Since
> unpack_trees() creates a temporary index in o->result and then discards
> o->dst_index and overwrites it with o->result, in the special case that
> o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> split_index for o->result.  However, when src and dst are different,
> reusing o->src_index's split_index for o->result will cause the
> split_index to be shared.  If either index then has entries replaced or
> removed, it will result in the other index referring to free()'d memory.
>
> Second, we can drop the index extensions.  Previously, we were moving
> index extensions from o->dst_index to o->result.  Since o->src_index is
> the one that will have the necessary extensions (o->dst_index is likely to
> be a new index temporary index created to store the results), we should be
> moving the index extensions from there.
>
> Signed-off-by: Elijah Newren 
> ---
>
> Differences from v2:
>   - Don't NULLify src_index until we're done using it
>   - Actually built and tested[1]
>
> But it now passes the testsuite on both linux and mac[2], and I even re-merged
> all 53288 merge commits in linux.git (with a merge of this patch together with
> the directory rename detection series) for good measure.  [Only 7 commits
> showed a difference, all due to directory rename detection kicking in.]
>
> [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> parallelization are great until you realize that your new setup omitted a
> critical step, leaving you running a slightly stale version of git instead...
> :-(
>
> [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> with unicode normalization tests, but those two tests fail before my changes
> too.  All the other tests pass.
>
>  unpack-trees.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index e73745051e..49526d70aa 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> o->result.timestamp.sec = o->src_index->timestamp.sec;
> o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> o->result.version = o->src_index->version;
> -   o->result.split_index = o->src_index->split_index;
> -   if (o->result.split_index)
> +   if (!o->src_index->split_index) {
> +   o->result.split_index = NULL;
> +   } else if (o->src_index == o->dst_index) {
> +   /*
> +* o->dst_index (and thus o->src_index) will be discarded
> +* and overwritten with o->result at the end of this function,
> +* so just use src_index's split_index to avoid having to
> +* create a new one.
> +*/
> +   o->result.split_index = o->src_index->split_index;
> o->result.split_index->refcount++;
> +   } else {
> +   o->result.split_index = init_split_index(>result);
> +   }
> hashcpy(o->result.sha1, o->src_index->sha1);
> o->merge_size = len;
> mark_all_ce_unused(o->src_index);
> @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
> }
> }
>
> -   o->src_index = NULL;
> ret = check_updates(o) ? (-2) : 0;
> if (o->dst_index) {
> if (!ret) {
> @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
>   WRITE_TREE_SILENT |
>   WRITE_TREE_REPAIR);
> }
> -   move_index_extensions(>result, o->dst_index);
> +   move_index_extensions(>result, o->src_index);

While this looks like the right thing to do on paper, I believe it's
actually broken for a specific case of untracked cache. In short,
please do not touch this line. I will send a patch to revert
edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
which essentially deletes this line, with proper explanation and
perhaps a test if I could come up with one.

When we update the index, we depend on the fact that all updates must
invalidate the right untracked cache correctly. In this unpack
operations, we start copying entries over from src to result. Since
'result' (at least from the beginning) does not have an untracked
cache, it has nothing to invalidate when we copy entries over. By the
time we have done preparing 'result', what's recorded in src's (or
dst's for 

[PATCH v3] unpack_trees: fix breakage when o->src_index != o->dst_index

2018-04-24 Thread Elijah Newren
Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
The code in unpack_trees() does not correctly handle them being different.
There are two separate issues:

First, there is the possibility of memory corruption.  Since
unpack_trees() creates a temporary index in o->result and then discards
o->dst_index and overwrites it with o->result, in the special case that
o->src_index == o->dst_index, it is safe to just reuse o->src_index's
split_index for o->result.  However, when src and dst are different,
reusing o->src_index's split_index for o->result will cause the
split_index to be shared.  If either index then has entries replaced or
removed, it will result in the other index referring to free()'d memory.

Second, we can drop the index extensions.  Previously, we were moving
index extensions from o->dst_index to o->result.  Since o->src_index is
the one that will have the necessary extensions (o->dst_index is likely to
be a new index temporary index created to store the results), we should be
moving the index extensions from there.

Signed-off-by: Elijah Newren 
---

Differences from v2:
  - Don't NULLify src_index until we're done using it
  - Actually built and tested[1]

But it now passes the testsuite on both linux and mac[2], and I even re-merged
all 53288 merge commits in linux.git (with a merge of this patch together with
the directory rename detection series) for good measure.  [Only 7 commits
showed a difference, all due to directory rename detection kicking in.]

[1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
parallelization are great until you realize that your new setup omitted a
critical step, leaving you running a slightly stale version of git instead...
:-(

[2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
with unicode normalization tests, but those two tests fail before my changes
too.  All the other tests pass.

 unpack-trees.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index e73745051e..49526d70aa 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
o->result.timestamp.sec = o->src_index->timestamp.sec;
o->result.timestamp.nsec = o->src_index->timestamp.nsec;
o->result.version = o->src_index->version;
-   o->result.split_index = o->src_index->split_index;
-   if (o->result.split_index)
+   if (!o->src_index->split_index) {
+   o->result.split_index = NULL;
+   } else if (o->src_index == o->dst_index) {
+   /*
+* o->dst_index (and thus o->src_index) will be discarded
+* and overwritten with o->result at the end of this function,
+* so just use src_index's split_index to avoid having to
+* create a new one.
+*/
+   o->result.split_index = o->src_index->split_index;
o->result.split_index->refcount++;
+   } else {
+   o->result.split_index = init_split_index(>result);
+   }
hashcpy(o->result.sha1, o->src_index->sha1);
o->merge_size = len;
mark_all_ce_unused(o->src_index);
@@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
}
}
 
-   o->src_index = NULL;
ret = check_updates(o) ? (-2) : 0;
if (o->dst_index) {
if (!ret) {
@@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
  WRITE_TREE_SILENT |
  WRITE_TREE_REPAIR);
}
-   move_index_extensions(>result, o->dst_index);
+   move_index_extensions(>result, o->src_index);
discard_index(o->dst_index);
*o->dst_index = o->result;
} else {
discard_index(>result);
}
+   o->src_index = NULL;
 
 done:
clear_exclude_list();
-- 
2.17.0.253.g32393f1d0a