Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-04 Thread Brandon Williams
On 06/02, Elijah Newren wrote:
> On Fri, Jun 1, 2018 at 10:03 PM, Duy Nguyen  wrote:
> > On Fri, Jun 1, 2018 at 8:34 PM, Elijah Newren  wrote:
> >> On Fri, Jun 1, 2018 at 9:11 AM, Nguyễn Thái Ngọc Duy  
> >> wrote:
> >>> This is more of a bug report than an actual fix because I'm not sure
> >>> if "o->src_index" is always the correct answer instead of "the_index"
> >>> here. But this is very similar to 7db118303a (unpack_trees: fix
> >>> breakage when o->src_index != o->dst_index - 2018-04-23) and could
> >>> potentially break things again...
> 
> I'm pretty sure your patch is correct.  Adding Brandon Williams to the
> cc for comment since his patches came up in the analysis below...
> 
> >> Actually, I don't think the patch will break anything in the current
> >> code.  Currently, all callers of unpack_trees() (even merge recursive
> >> which uses different src_index and dst_index now) set src_index =
> >> _index.  So, these changes should continue to work as before (with
> >> a minor possible exception of merge-recursive calling into other
> >> functions from unpack-trees.c after unpack_trees() has finished..).
> >> That's not to say that your patch is bug free, just that I think any
> >> bugs shouldn't be triggerable from the current codebase.
> >
> > Ah.. I thought merge-recursive would do fancier things and used some
> > temporary index. Good to know.
> 
> Well, it does does use a temporary index, but for dst_index rather
> than src_index.  It then does some fancier things, but not until the
> call to unpack_trees() is over.  In particular, at that point, it
> swaps the_index and tmp_index, reversing their roles so that now
> tmp_index is the original index and the_index becomes the result after
> unpack_trees() is run.  That's done because I later want to use the
> original index for calling verify_uptodate().  verify_uptodate() is
> then used for was_tracked_and_matches() and was_tracked().
> 
> Anyway, the whole upshot of this is:
>   * merge-recursive uses src_index == _index for the unpack_trees() call.
>   * merge-recursive uses src_index == o->orig_index for subsequent
> calls to verify_uptodate(), but verify_uptodate() doesn't call into
> any of the sites you have modified.
> 
> Further:
>   * Every other existing caller of unpack-trees in the code sets
> src_index == _index, so this won't break any of them.
>   * There are only two callers in the codebase that set dst_index to
> something other than _index -- diff-lib.c (which sets it to NULL)
> and merge-recursive (which does the stuff described above).
> 
> So, having done that analysis, I am now pretty convinced your patch
> won't break anything.  That's one half...
> 
> >> Also, if any of the changes you made are wrong, what was there before
> >> was also clearly wrong.  So I think we're at least no worse off.
> >>
> >> But, I agree, it's not easy to verify what the correct thing should be
> >> in all cases.  I'll try to take a closer look in the next couple days.
> >
> > Thanks. I will also stare at this code some more in the next couple
> > days trying to remember what these functions do.
> 
> Your patch has two divisible parts:
> 
> 1) Your modifications to
>   * clear_ce_flags_1()
>   * clear_ce_flags_dir()
>   * clear_ce_flags()
>   * mark_new_skip_worktree()
> The clear_ce_flags*() functions are only called by each other and by
> mark_new_skip_worktree(), which in turn is only called from
> unpack_trees().  Also, in all of these, your change ends up only
> modifying what index_state is passed to is_excluded_from_list().
> 
> 2) Your modifications to
>   * verify_clean_subdirectory()
>   * check_ok_to_remove()
> In this case, the former is only called by the latter, and the latter
> ends up only being called (via verify_absent_1()) from verify_absent()
> and verify_absent_sparse().
> 
> I'll address each, in reverse order.
> 
> 2) The stuff that affects verify_absent()
> 
> While verify_absent() is not called from merge-recursive right now, it
> was something I wanted to use in the future for very similar reasons
> that verify_uptodate() started being called directly from
> merge-recursive.  In particular, if the rewrite of merge-recursive[A]
> I want to do sets index_only when calling unpack_trees(), then does
> the whole merge without touching the worktree, then at the end goes to
> update the working tree, it will need to do extra checks.
> verify_absent() will come in handy as one of those extra checks.  For
> that case, using the_index (the new index just created with lots of
> changes) would be wrong in all the same ways that using the_index
> caused massive problems for was_tracked() in merge-recursive (e.g. the
> blow up of when Junio merged the original directory rename detection
> series into master and subsequently reverted it); we'd instead want
> src_index (which was the index that existed when merge was called)
> instead.  So, with this patch you've fixed some important bugs that I
> would have hit later.
> 
> [A] 

Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-02 Thread Elijah Newren
On Fri, Jun 1, 2018 at 10:03 PM, Duy Nguyen  wrote:
> On Fri, Jun 1, 2018 at 8:34 PM, Elijah Newren  wrote:
>> On Fri, Jun 1, 2018 at 9:11 AM, Nguyễn Thái Ngọc Duy  
>> wrote:
>>> This is more of a bug report than an actual fix because I'm not sure
>>> if "o->src_index" is always the correct answer instead of "the_index"
>>> here. But this is very similar to 7db118303a (unpack_trees: fix
>>> breakage when o->src_index != o->dst_index - 2018-04-23) and could
>>> potentially break things again...

I'm pretty sure your patch is correct.  Adding Brandon Williams to the
cc for comment since his patches came up in the analysis below...

>> Actually, I don't think the patch will break anything in the current
>> code.  Currently, all callers of unpack_trees() (even merge recursive
>> which uses different src_index and dst_index now) set src_index =
>> _index.  So, these changes should continue to work as before (with
>> a minor possible exception of merge-recursive calling into other
>> functions from unpack-trees.c after unpack_trees() has finished..).
>> That's not to say that your patch is bug free, just that I think any
>> bugs shouldn't be triggerable from the current codebase.
>
> Ah.. I thought merge-recursive would do fancier things and used some
> temporary index. Good to know.

Well, it does does use a temporary index, but for dst_index rather
than src_index.  It then does some fancier things, but not until the
call to unpack_trees() is over.  In particular, at that point, it
swaps the_index and tmp_index, reversing their roles so that now
tmp_index is the original index and the_index becomes the result after
unpack_trees() is run.  That's done because I later want to use the
original index for calling verify_uptodate().  verify_uptodate() is
then used for was_tracked_and_matches() and was_tracked().

Anyway, the whole upshot of this is:
  * merge-recursive uses src_index == _index for the unpack_trees() call.
  * merge-recursive uses src_index == o->orig_index for subsequent
calls to verify_uptodate(), but verify_uptodate() doesn't call into
any of the sites you have modified.

Further:
  * Every other existing caller of unpack-trees in the code sets
src_index == _index, so this won't break any of them.
  * There are only two callers in the codebase that set dst_index to
something other than _index -- diff-lib.c (which sets it to NULL)
and merge-recursive (which does the stuff described above).

So, having done that analysis, I am now pretty convinced your patch
won't break anything.  That's one half...

>> Also, if any of the changes you made are wrong, what was there before
>> was also clearly wrong.  So I think we're at least no worse off.
>>
>> But, I agree, it's not easy to verify what the correct thing should be
>> in all cases.  I'll try to take a closer look in the next couple days.
>
> Thanks. I will also stare at this code some more in the next couple
> days trying to remember what these functions do.

Your patch has two divisible parts:

1) Your modifications to
  * clear_ce_flags_1()
  * clear_ce_flags_dir()
  * clear_ce_flags()
  * mark_new_skip_worktree()
The clear_ce_flags*() functions are only called by each other and by
mark_new_skip_worktree(), which in turn is only called from
unpack_trees().  Also, in all of these, your change ends up only
modifying what index_state is passed to is_excluded_from_list().

2) Your modifications to
  * verify_clean_subdirectory()
  * check_ok_to_remove()
In this case, the former is only called by the latter, and the latter
ends up only being called (via verify_absent_1()) from verify_absent()
and verify_absent_sparse().

I'll address each, in reverse order.

2) The stuff that affects verify_absent()

While verify_absent() is not called from merge-recursive right now, it
was something I wanted to use in the future for very similar reasons
that verify_uptodate() started being called directly from
merge-recursive.  In particular, if the rewrite of merge-recursive[A]
I want to do sets index_only when calling unpack_trees(), then does
the whole merge without touching the worktree, then at the end goes to
update the working tree, it will need to do extra checks.
verify_absent() will come in handy as one of those extra checks.  For
that case, using the_index (the new index just created with lots of
changes) would be wrong in all the same ways that using the_index
caused massive problems for was_tracked() in merge-recursive (e.g. the
blow up of when Junio merged the original directory rename detection
series into master and subsequently reverted it); we'd instead want
src_index (which was the index that existed when merge was called)
instead.  So, with this patch you've fixed some important bugs that I
would have hit later.

[A] sidenote: see
https://public-inbox.org/git/xmqqk1ydkbx0@gitster.mtv.corp.google.com/
for more details

1) mark_new_skip_worktree() ... is_excluded_from_list().

Sadly, I'm not very familiar with the skip_worktree and sparse

Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-01 Thread Duy Nguyen
On Fri, Jun 1, 2018 at 8:34 PM, Elijah Newren  wrote:
> Hi,
>
> On Fri, Jun 1, 2018 at 9:11 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>> unpack-trees code works on multiple indexes specified in
>> unpack_trees_options. Although they normally all refer to the_index at
>> the call site, that is the caller's business. unpack-trees.c should
>> not make any assumption about that and should use the correct index
>> field in unpack_trees_options.
>>
>> This patch is actually confusing because sometimes the function
>> parameter is also named "the_index" while some other times "the_index"
>> is the global variable because the function just does not have a
>> parameter of the same name! The only subtle difference is that the
>> function parameter is a pointer while the global one is not.
>>
>> This is more of a bug report than an actual fix because I'm not sure
>> if "o->src_index" is always the correct answer instead of "the_index"
>> here. But this is very similar to 7db118303a (unpack_trees: fix
>> breakage when o->src_index != o->dst_index - 2018-04-23) and could
>> potentially break things again...
>
> Actually, I don't think the patch will break anything in the current
> code.  Currently, all callers of unpack_trees() (even merge recursive
> which uses different src_index and dst_index now) set src_index =
> _index.  So, these changes should continue to work as before (with
> a minor possible exception of merge-recursive calling into other
> functions from unpack-trees.c after unpack_trees() has finished..).
> That's not to say that your patch is bug free, just that I think any
> bugs shouldn't be triggerable from the current codebase.

Ah.. I thought merge-recursive would do fancier things and used some
temporary index. Good to know.

> Also, if any of the changes you made are wrong, what was there before
> was also clearly wrong.  So I think we're at least no worse off.
>
> But, I agree, it's not easy to verify what the correct thing should be
> in all cases.  I'll try to take a closer look in the next couple days.

Thanks. I will also stare at this code some more in the next couple
days trying to remember what these functions do.
-- 
Duy


Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-01 Thread Duy Nguyen
On Fri, Jun 1, 2018 at 8:51 PM, Stefan Beller  wrote:
> On Fri, Jun 1, 2018 at 11:34 AM, Elijah Newren  wrote:
>
>>> +/* Do not use the_index here, you probably want o->src_index */
>>> +#define the_index the_index_should_not_be_used here
>>
>> Good call.
>
> Is the space instead of the underscore between the last two words intentional?

Unintentional. I wanted gcc to show this phrase when a developer
accidentally uses the_index here (it's basically
"the_index_should_not_be_used_here is not declared blah blah"). Two
words would cause a different syntax error.
-- 
Duy


Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-01 Thread Stefan Beller
On Fri, Jun 1, 2018 at 11:34 AM, Elijah Newren  wrote:

>> +/* Do not use the_index here, you probably want o->src_index */
>> +#define the_index the_index_should_not_be_used here
>
> Good call.

Is the space instead of the underscore between the last two words intentional?


Re: [PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-01 Thread Elijah Newren
Hi,

On Fri, Jun 1, 2018 at 9:11 AM, Nguyễn Thái Ngọc Duy  wrote:
> unpack-trees code works on multiple indexes specified in
> unpack_trees_options. Although they normally all refer to the_index at
> the call site, that is the caller's business. unpack-trees.c should
> not make any assumption about that and should use the correct index
> field in unpack_trees_options.
>
> This patch is actually confusing because sometimes the function
> parameter is also named "the_index" while some other times "the_index"
> is the global variable because the function just does not have a
> parameter of the same name! The only subtle difference is that the
> function parameter is a pointer while the global one is not.
>
> This is more of a bug report than an actual fix because I'm not sure
> if "o->src_index" is always the correct answer instead of "the_index"
> here. But this is very similar to 7db118303a (unpack_trees: fix
> breakage when o->src_index != o->dst_index - 2018-04-23) and could
> potentially break things again...

Actually, I don't think the patch will break anything in the current
code.  Currently, all callers of unpack_trees() (even merge recursive
which uses different src_index and dst_index now) set src_index =
_index.  So, these changes should continue to work as before (with
a minor possible exception of merge-recursive calling into other
functions from unpack-trees.c after unpack_trees() has finished..).
That's not to say that your patch is bug free, just that I think any
bugs shouldn't be triggerable from the current codebase.

Also, if any of the changes you made are wrong, what was there before
was also clearly wrong.  So I think we're at least no worse off.

But, I agree, it's not easy to verify what the correct thing should be
in all cases.  I'll try to take a closer look in the next couple days.

>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  unpack-trees.c | 45 ++---
>  1 file changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3a85a02a77..114496cfc2 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -18,6 +18,9 @@
>  #include "fsmonitor.h"
>  #include "fetch-object.h"
>
> +/* Do not use the_index here, you probably want o->src_index */
> +#define the_index the_index_should_not_be_used here

Good call.


[PATCH/RFC/BUG] unpack-trees.c: do not use "the_index"

2018-06-01 Thread Nguyễn Thái Ngọc Duy
unpack-trees code works on multiple indexes specified in
unpack_trees_options. Although they normally all refer to the_index at
the call site, that is the caller's business. unpack-trees.c should
not make any assumption about that and should use the correct index
field in unpack_trees_options.

This patch is actually confusing because sometimes the function
parameter is also named "the_index" while some other times "the_index"
is the global variable because the function just does not have a
parameter of the same name! The only subtle difference is that the
function parameter is a pointer while the global one is not.

This is more of a bug report than an actual fix because I'm not sure
if "o->src_index" is always the correct answer instead of "the_index"
here. But this is very similar to 7db118303a (unpack_trees: fix
breakage when o->src_index != o->dst_index - 2018-04-23) and could
potentially break things again...

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 45 ++---
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a85a02a77..114496cfc2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -18,6 +18,9 @@
 #include "fsmonitor.h"
 #include "fetch-object.h"
 
+/* Do not use the_index here, you probably want o->src_index */
+#define the_index the_index_should_not_be_used here
+
 /*
  * Error messages expected by scripts out of plumbing commands such as
  * read-tree.  Non-scripted Porcelain is not required to use these messages
@@ -1085,13 +1088,15 @@ static int unpack_callback(int n, unsigned long mask, 
unsigned long dirmask, str
return mask;
 }
 
-static int clear_ce_flags_1(struct cache_entry **cache, int nr,
+static int clear_ce_flags_1(struct index_state *istate,
+   struct cache_entry **cache, int nr,
struct strbuf *prefix,
int select_mask, int clear_mask,
struct exclude_list *el, int defval);
 
 /* Whole directory matching */
-static int clear_ce_flags_dir(struct cache_entry **cache, int nr,
+static int clear_ce_flags_dir(struct index_state *istate,
+ struct cache_entry **cache, int nr,
  struct strbuf *prefix,
  char *basename,
  int select_mask, int clear_mask,
@@ -1100,7 +1105,7 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
struct cache_entry **cache_end;
int dtype = DT_DIR;
int ret = is_excluded_from_list(prefix->buf, prefix->len,
-   basename, , el, _index);
+   basename, , el, istate);
int rc;
 
strbuf_addch(prefix, '/');
@@ -1122,7 +1127,7 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
 * calling clear_ce_flags_1(). That function will call
 * the expensive is_excluded_from_list() on every entry.
 */
-   rc = clear_ce_flags_1(cache, cache_end - cache,
+   rc = clear_ce_flags_1(istate, cache, cache_end - cache,
  prefix,
  select_mask, clear_mask,
  el, ret);
@@ -1145,7 +1150,8 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
  *   cache[0]->name[0..(prefix_len-1)]
  * Top level path has prefix_len zero.
  */
-static int clear_ce_flags_1(struct cache_entry **cache, int nr,
+static int clear_ce_flags_1(struct index_state *istate,
+   struct cache_entry **cache, int nr,
struct strbuf *prefix,
int select_mask, int clear_mask,
struct exclude_list *el, int defval)
@@ -1179,7 +1185,7 @@ static int clear_ce_flags_1(struct cache_entry **cache, 
int nr,
len = slash - name;
strbuf_add(prefix, name, len);
 
-   processed = clear_ce_flags_dir(cache, cache_end - cache,
+   processed = clear_ce_flags_dir(istate, cache, cache_end 
- cache,
   prefix,
   prefix->buf + 
prefix->len - len,
   select_mask, clear_mask,
@@ -1193,7 +1199,7 @@ static int clear_ce_flags_1(struct cache_entry **cache, 
int nr,
}
 
strbuf_addch(prefix, '/');
-   cache += clear_ce_flags_1(cache, cache_end - cache,
+   cache += clear_ce_flags_1(istate, cache, cache_end - 
cache,
  prefix,
  select_mask, clear_mask, el, 
defval);
strbuf_setlen(prefix, prefix->len - len -