Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-25 Thread Junio C Hamano
Duy Nguyen  writes:

> I think it's clear that you need to revert that commit. I didn't see
> this at all when I made the commit.

I didn't either, and no other reviewers did. But now we know it was
not sufficient, so let's see...

>> Perhaps a good and safe way forward to resurrect what d95d728a
>> wanted to do is to first add an option to tell run_diff_index() and
>> run_diff_files() which behaviour the caller wants to see, add that
>> only to the caller in wt-status.c?  Then incrementally pass that
>> option from more callsites that we are absolutely certain that want
>> this different worldview with respect to i-t-a?
>
> Agreed.

OK.  Perhaps then first I should do that revert and we'll
incrementally rebuild on top.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-25 Thread Duy Nguyen
On Thu, Jun 25, 2015 at 12:05 AM, Junio C Hamano  wrote:
> Internal "diff-index --cached" is used for another reason you did
> not mention (and scripted Porcelains literally use that externally
> for the same reason).  When we start a mergy operation, we say it is
> OK if the working tree has local changes relative to the index, but
> we require the index does not have any modifications since the HEAD
> was read.
>
> Side note: some codepaths insist that "diff-index --cached"
> and "diff-files" are both clean, so d95d728a is harmless;
> the former may say "clean" on i-t-a entries more than
> before, but the latter will still catch the difference
> between the index and the working tree and stop the caller
> from going forward.
>
> With d95d728a (diff-lib.c: adjust position of i-t-a entries in diff,
> 2015-03-16)'s world view, an empty output from "diff-index --cached"
> no longer means that.  Entries added with any "git add -N" are not
> reported, so we would go ahead to record the merge result on top of
> that "half-dirty" index.
>
> Side note: a merge based on unpack-trees has an extra layer
> of safety that unpack_trees() does not ignore i-t-a entry as
> "not exist (yet)" and notices that the path does exist in
> the index but not in HEAD.  But that just shows that some
> parts of the world do need to consider that having an i-t-a
> in the index makes it different from HEAD.
>
> If the mergy operation goes without any conflict, the next thing we
> do typically is to write the index out as a tree (to record in a new
> commit, etc.) and we are OK only in that particular case, because
> i-t-a entries are ignored.  But what would happen when the mergy
> operation conflicts?  I haven't thought about that fully, but I
> doubt it is a safe thing to do in general.
>
> But that is just one example that there are also other codepaths
> that do not want to be fooled into thinking that i-t-a entries do
> not exist in the index at all.

I think it's clear that you need to revert that commit. I didn't see
this at all when I made the commit.

> All we learned from the above discussion is that unconditionally
> declaring that adding i-t-a entries to the index without doing
> anything else should keep the index compare the same to HEAD.
>
> If d95d728a were only about what wt_status.c sees (and gets reported
> in "git status" output), which was what the change wanted to address
> anyway, we didn't have to have this discussion.  Without realizing
> that two kinds of callers want different view out of "diff-index
> --cached" and "diff-files", we broke half the world by changing the
> world order unconditionally for everybody, I am afraid.
>
> Perhaps a good and safe way forward to resurrect what d95d728a
> wanted to do is to first add an option to tell run_diff_index() and
> run_diff_files() which behaviour the caller wants to see, add that
> only to the caller in wt-status.c?  Then incrementally pass that
> option from more callsites that we are absolutely certain that want
> this different worldview with respect to i-t-a?

Agreed.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-24 Thread Junio C Hamano
Duy Nguyen  writes:

> Take checkout for example, when you do "git checkout -- foo" where foo
> is i-t-a, the file foo on disk will be emptied because the SHA-1 in
> the i-t-a entry is an empty blob, mostly to help "git diff". I think
> it should behave as if foo is not i-t-a: checkout should error out
> about not matching pathspec, or at least not destroy "foo" on disk. To
> me, when "ce" is an i-t-a entry, only i-t-a flag and ce_name are
> valid, the rest of "ce" should never be accessed.
>
> blame.c's situation is close to check_preimage() where it may read
> zero from ce_mode. It may be ok for check_preimage() to take zero as
> mode, but I think this is like fixed size buffer vs strbuf again. It
> works now, but if the code is reorganized or refactored, then it may
> or may not work. Better be safe than sorry and avoid reading something
> we should not read in the first place.

All of the above say there _are_ some codepaths that want to treat
the existence of a path in the index not as 
boolean but as  tristate.  I do not
think we disagree on that.

But that is different from saying that it is always OK to treat
i-t-a entries the same way as "does not exist" non-entries.

Internal "diff-index --cached" is used for another reason you did
not mention (and scripted Porcelains literally use that externally
for the same reason).  When we start a mergy operation, we say it is
OK if the working tree has local changes relative to the index, but
we require the index does not have any modifications since the HEAD
was read.

Side note: some codepaths insist that "diff-index --cached"
and "diff-files" are both clean, so d95d728a is harmless;
the former may say "clean" on i-t-a entries more than
before, but the latter will still catch the difference
between the index and the working tree and stop the caller
from going forward.

With d95d728a (diff-lib.c: adjust position of i-t-a entries in diff,
2015-03-16)'s world view, an empty output from "diff-index --cached"
no longer means that.  Entries added with any "git add -N" are not
reported, so we would go ahead to record the merge result on top of
that "half-dirty" index.

Side note: a merge based on unpack-trees has an extra layer
of safety that unpack_trees() does not ignore i-t-a entry as
"not exist (yet)" and notices that the path does exist in
the index but not in HEAD.  But that just shows that some
parts of the world do need to consider that having an i-t-a
in the index makes it different from HEAD.

If the mergy operation goes without any conflict, the next thing we
do typically is to write the index out as a tree (to record in a new
commit, etc.) and we are OK only in that particular case, because
i-t-a entries are ignored.  But what would happen when the mergy
operation conflicts?  I haven't thought about that fully, but I
doubt it is a safe thing to do in general.

But that is just one example that there are also other codepaths
that do not want to be fooled into thinking that i-t-a entries do
not exist in the index at all.

All we learned from the above discussion is that unconditionally
declaring that adding i-t-a entries to the index without doing
anything else should keep the index compare the same to HEAD.

If d95d728a were only about what wt_status.c sees (and gets reported
in "git status" output), which was what the change wanted to address
anyway, we didn't have to have this discussion.  Without realizing
that two kinds of callers want different view out of "diff-index
--cached" and "diff-files", we broke half the world by changing the
world order unconditionally for everybody, I am afraid.

Perhaps a good and safe way forward to resurrect what d95d728a
wanted to do is to first add an option to tell run_diff_index() and
run_diff_files() which behaviour the caller wants to see, add that
only to the caller in wt-status.c?  Then incrementally pass that
option from more callsites that we are absolutely certain that want
this different worldview with respect to i-t-a?



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-24 Thread Duy Nguyen
On Tue, Jun 23, 2015 at 11:50 PM, Junio C Hamano  wrote:
>> This patch tightens the "exists in index" check, ignoring i-t-a
>> entries. For fixing the above problem, only the change in
>> check_to_create() is needed.
>
> And the first thing I noticed and found a bit disturbing was that
> this change (which I think is correct, and happens to match what I
> sent out earlier) was the only thing necessary to make your new test
> pass.  IOW, the other changes in this patch have no test coverage.

Because to be honest I don't understand these code well enough to know
how to trigger it :)

>> For other changes,
>>
>>  - load_current(), reporting "not exists in index" is better than "does
>>not match index"
>
> Is that error reporting the only side effect from this change?

The only thing that I can see. If an i-t-a entry is returned, it can't
get past verify_index_match because ce_match_stat(). Hmm.. no the
on-disk version is gone, we'll get to checkout_target() where it will
"restore" the entry with an empty file. This is related to checkout
that I will continue later below.

>>  - get_current_sha1(), or actually build_fake_ancestor(), we should not
>>add i-t-a entries to the temporary index, at least not without also
>>adding i-t-a flag back
>
> This is part of "am" three-way fallback codepath.  I do not think
> the merge-recursive three-way merge code knows and cares about, is
> capable of handling, or would even want to deal with i-t-a entries
> in the first place, so adding an entry as i-t-a bit would not help.
> What the ultimate caller wants from us in this codepath is a tree
> object, and that is written out from the temporary index---and that
> codepath ignores i-t-a entries, so it is correct to omit them from
> the temporary index in the first place.  Unlike the previous two
> changes, I think this change deserves a new test.

Will do, after I study some more about this apply.c.

>
>>  I think I'm opening a can of worms with d95d728. There's nothing
>>  wrong with that patch per se, but with this issue popping up, I need
>>  to go over all {cache,index}_name_pos call sites and see what would be
>>  the sensible behavior when i-t-a entries are involved.
>
> Yeah, I agree that d95d728 should have been a part of a larger
> series that changes the world order, instead of a single change that
> brings inconsistency to the system.
>
> I cannot offhand convince myself that "apply" is the only casualty;
> assuming it is, I think a reasonable way forward is to keep d95d728
> and adjust "apply" to the new world order.  Otherwise, i.e. if there
> are wider fallouts from d95d728, we may instead want to temporarily
> revert it off from 'master', deal with fallouts to "apply" and other
> things, before resurrecting it.
>
> Anything that internally uses "diff-index" is suspect, I think.

Yeah that's one or two more grep runs and more reading.

> What do others think?  You seem to ...
>
>>  So far blame, rm and checkout-entry and "checkout " are on my
>>  to-think-or-fix list. But this patch can get in early to fix a real
>>  regression instead of waiting for one big series. A lot more
>>  discussions will be had before that series gets in good shape.
>
> ... think that the damage could be quite extensive, so I am inclined
> to say that we first revert d95d728 before going forward.

I'm not opposed to reverting if you think it's the safest option and I
will report back soon after grepping diff-index. But those I mentioned
above have more to do with the fact that an i-t-a entry does exist in
the index in a normal way, so reverting does not help.

Take checkout for example, when you do "git checkout -- foo" where foo
is i-t-a, the file foo on disk will be emptied because the SHA-1 in
the i-t-a entry is an empty blob, mostly to help "git diff". I think
it should behave as if foo is not i-t-a: checkout should error out
about not matching pathspec, or at least not destroy "foo" on disk. To
me, when "ce" is an i-t-a entry, only i-t-a flag and ce_name are
valid, the rest of "ce" should never be accessed.

blame.c's situation is close to check_preimage() where it may read
zero from ce_mode. It may be ok for check_preimage() to take zero as
mode, but I think this is like fixed size buffer vs strbuf again. It
works now, but if the code is reorganized or refactored, then it may
or may not work. Better be safe than sorry and avoid reading something
we should not read in the first place.

>>  builtin/apply.c   |  8 
>>  cache.h   |  2 ++
>>  read-cache.c  | 12 
>>  t/t2203-add-intent.sh | 10 ++
>>  4 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/builtin/apply.c b/builtin/apply.c
>> index 146be97..4f813ac 100644
>> --- a/builtin/apply.c
>> +++ b/builtin/apply.c
>> @@ -3344,7 +3344,7 @@ static int load_current(struct image *image, struct 
>> patch *patch)
>>   if (!patch->is_new)
>>   die("BUG: patch to %s is not a creation", patch->o

Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-23 Thread Junio C Hamano
On Tue, Jun 23, 2015 at 9:50 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> - pos = cache_name_pos(name, strlen(name));
>> + pos = exists_in_cache(name, strlen(name));
>
> Something that is named as if it would return yes/no that returns a
> real value is not a very welcome idea.
>
>> +/* This is the same as index_name_pos, except that i-t-a entries are 
>> invisible */
>> +int exists_in_index(const struct index_state *istate, const char *name, int 
>> namelen)
>> +{
>> + int pos = index_name_stage_pos(istate, name, namelen, 0);
>> +
>> + if (pos < 0)
>> + return pos;
>> + if (istate->cache[pos]->ce_flags & CE_INTENT_TO_ADD)
>> + return -pos-1;
>
> This is a useless complexity.  Your callers cannot use the returned
> value like this:
>
> pos = exists_in_cache(...);
> if (pos < 0) {
> if (active_cache[-pos-1]->ce_flags & CE_INTENT_TO_ADD)
> ; /* ah it actually exists but it is i-t-a */
> else
> ; /* no it does not really exist */
> } else {
> ; /* yes it is really there at pos */
> }
>
> because they cannot tell two cases apart: (1) you do have i-t-a with
> the given name, (2) you do not have the entry but the location you
> would insert an entry with such a name is occupied by an unrelated
> entry (i.e. with a name that sorts adjacent) that happens to be
> i-t-a.

Also, the callers cannot even use that return value in the usual way they
would use the return value from index_name_pos(), either.

pos = exists_in_cache(...);
if (pos < 0) {
/* ah, it does not exist, so... */
pos = -1 - pos;
/*
 * ... it is OK to shift active_cache[pos..] by one and add our
 * entry at active_cache[pos]
 */
   } else {
/* it exists, so update in place */
;
   }

So, returning pos that smells like a return value from index_name_pos()
only has an effect of confusing callers into buggy code, I am afraid. The
callers that care need to be updated to check for ce_flags after finding the
entry with index_name_pos() the usual way if you want to avoid search in
the index_state->cache[] twice, and the callers that are only interested in
knowing if an entry "exists" are better off with an exists_in_cache() that
returns Yes/No and not a confusing and useless "pos", I think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-23 Thread Junio C Hamano
Junio C Hamano  writes:

>>  I think I'm opening a can of worms with d95d728
>
> I cannot offhand convince myself that "apply" is the only casualty;
> assuming it is, I think a reasonable way forward is to keep d95d728
> and adjust "apply" to the new world order.  Otherwise, i.e. if there
> are wider fallouts from d95d728, we may instead want to temporarily
> revert it off from 'master', deal with fallouts to "apply" and other
> things, before resurrecting it.
>
> Anything that internally uses "diff-index" is suspect, I think.
>
> What do others think?  You seem to ...
>
>>  So far blame, rm and checkout-entry and "checkout " are on my
>>  to-think-or-fix list. But this patch can get in early to fix a real
>>  regression instead of waiting for one big series. A lot more
>>  discussions will be had before that series gets in good shape.
>
> ... think that the damage could be quite extensive, so I am inclined
> to say that we first revert d95d728 before going forward.

Let's do this on 'nd/diff-i-t-a' topic and merge the result
immediately to 'master' for now.

-- >8 --
From: Junio C Hamano 
Date: Tue, 23 Jun 2015 10:27:47 -0700
Subject: [PATCH] Revert "diff-lib.c: adjust position of i-t-a entries in diff"

This reverts commit d95d728aba06a34394d15466045cbdabdada58a2.

It turns out that many other commands that need to interact with the
result of running diff-files and diff-index, e.g.  "git apply", "git
rm", etc., need to be adjusted to the new world order it brings in.
For example, it would break this sequence to correct a whitespace
breakage in the parts you changed:

git add -N file
git diff --cached file | git apply --cached --whitespace=fix
git checkout file

In the old world order, "diff" showed a patch to modify an existing
empty file by adding its full contents, and "apply" updated the
index by modifying the existing empty blob (which is what an
Intent-to-Add entry records in the index) with that patch.

In the new world order, "diff" shows a patch to create a new file
with its full contents, but because "apply" thinks that the i-t-a
entry already exists in the index, it refused to accept a creation.

Adjusting "apply" to this new world order is easy, but we need to
assess the extent of the damage to the rest of the system the new
world order brought in before going forward and adjust them all,
after which we can resurrect the commit being reverted here.
---
 builtin/add.c   |  1 -
 diff-lib.c  | 12 
 t/t2203-add-intent.sh   | 23 ---
 t/t4011-diff-symlink.sh | 10 --
 4 files changed, 8 insertions(+), 38 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ee370b0..3390933 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -63,7 +63,6 @@ static void update_callback(struct diff_queue_struct *q,
switch (fix_unmerged_status(p, data)) {
default:
die(_("unexpected diff status %c"), p->status);
-   case DIFF_STATUS_ADDED:
case DIFF_STATUS_MODIFIED:
case DIFF_STATUS_TYPE_CHANGED:
if (add_file_to_index(&the_index, path, data->flags)) {
diff --git a/diff-lib.c b/diff-lib.c
index 714501a..a85c497 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -212,11 +212,6 @@ int run_diff_files(struct rev_info *revs, unsigned int 
option)
   ce->sha1, 
!is_null_sha1(ce->sha1),
   ce->name, 0);
continue;
-   } else if (ce->ce_flags & CE_INTENT_TO_ADD) {
-   diff_addremove(&revs->diffopt, '+', ce->ce_mode,
-  EMPTY_BLOB_SHA1_BIN, 0,
-  ce->name, 0);
-   continue;
}
 
changed = match_stat_with_submodule(&revs->diffopt, ce, 
&st,
@@ -381,13 +376,6 @@ static void do_oneway_diff(struct unpack_trees_options *o,
struct rev_info *revs = o->unpack_data;
int match_missing, cached;
 
-   /* i-t-a entries do not actually exist in the index */
-   if (idx && (idx->ce_flags & CE_INTENT_TO_ADD)) {
-   idx = NULL;
-   if (!tree)
-   return; /* nothing to diff.. */
-   }
-
/* if the entry is not checked out, don't examine work tree */
cached = o->index_only ||
(idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 7c641bf..2a4a749 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -5,24 +5,10 @@ test_description='Intent to add'
 . ./test-lib.sh
 
 test_expect_success 'intent to add' '
-   test_commit 1 &&
-   git rm 1.t &&
-   echo hello >1.t &&
echo hello >file &&
echo hello >elif &&
  

Re: [PATCH] apply: fix adding new files on i-t-a entries

2015-06-23 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Since d95d728 (diff-lib.c: adjust position of i-t-a entries in diff -
> 2015-03-16), a normal "git diff" on i-t-a entries would produce a diff
> that _adds_ those files, not just adding content to existing and empty
> files like before.
>
> This is correct. Unfortunately, applying such a patch on the same
> repository that has the same i-t-a entries will fail with message
> "already exists in index" because git-apply checks, sees those i-t-a
> entries and aborts. git-apply does not realize those are for
> bookkeeping only, they do not really exist in the index.

> This patch tightens the "exists in index" check, ignoring i-t-a
> entries. For fixing the above problem, only the change in
> check_to_create() is needed.

And the first thing I noticed and found a bit disturbing was that
this change (which I think is correct, and happens to match what I
sent out earlier) was the only thing necessary to make your new test
pass.  IOW, the other changes in this patch have no test coverage.

> For other changes,
>
>  - load_current(), reporting "not exists in index" is better than "does
>not match index"

Is that error reporting the only side effect from this change?

This is only used when falling back to three-way merge while
applying a creation patch.

>  - check_preimage(), similar to load_current(), but it may also use
>ce_mode from i-t-a entry which is always zero

This is for the normal (non three-way) application and the idea is
the same as load_current() as you said above.

>  - get_current_sha1(), or actually build_fake_ancestor(), we should not
>add i-t-a entries to the temporary index, at least not without also
>adding i-t-a flag back

This is part of "am" three-way fallback codepath.  I do not think
the merge-recursive three-way merge code knows and cares about, is
capable of handling, or would even want to deal with i-t-a entries
in the first place, so adding an entry as i-t-a bit would not help.
What the ultimate caller wants from us in this codepath is a tree
object, and that is written out from the temporary index---and that
codepath ignores i-t-a entries, so it is correct to omit them from
the temporary index in the first place.  Unlike the previous two
changes, I think this change deserves a new test.

>  I think I'm opening a can of worms with d95d728. There's nothing
>  wrong with that patch per se, but with this issue popping up, I need
>  to go over all {cache,index}_name_pos call sites and see what would be
>  the sensible behavior when i-t-a entries are involved.

Yeah, I agree that d95d728 should have been a part of a larger
series that changes the world order, instead of a single change that
brings inconsistency to the system.

I cannot offhand convince myself that "apply" is the only casualty;
assuming it is, I think a reasonable way forward is to keep d95d728
and adjust "apply" to the new world order.  Otherwise, i.e. if there
are wider fallouts from d95d728, we may instead want to temporarily
revert it off from 'master', deal with fallouts to "apply" and other
things, before resurrecting it.

Anything that internally uses "diff-index" is suspect, I think.

What do others think?  You seem to ...

>  So far blame, rm and checkout-entry and "checkout " are on my
>  to-think-or-fix list. But this patch can get in early to fix a real
>  regression instead of waiting for one big series. A lot more
>  discussions will be had before that series gets in good shape.

... think that the damage could be quite extensive, so I am inclined
to say that we first revert d95d728 before going forward.

>  builtin/apply.c   |  8 
>  cache.h   |  2 ++
>  read-cache.c  | 12 
>  t/t2203-add-intent.sh | 10 ++
>  4 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 146be97..4f813ac 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -3344,7 +3344,7 @@ static int load_current(struct image *image, struct 
> patch *patch)
>   if (!patch->is_new)
>   die("BUG: patch to %s is not a creation", patch->old_name);
>  
> - pos = cache_name_pos(name, strlen(name));
> + pos = exists_in_cache(name, strlen(name));

Something that is named as if it would return yes/no that returns a
real value is not a very welcome idea.

> +/* This is the same as index_name_pos, except that i-t-a entries are 
> invisible */
> +int exists_in_index(const struct index_state *istate, const char *name, int 
> namelen)
> +{
> + int pos = index_name_stage_pos(istate, name, namelen, 0);
> +
> + if (pos < 0)
> + return pos;
> + if (istate->cache[pos]->ce_flags & CE_INTENT_TO_ADD)
> + return -pos-1;

This is a useless complexity.  Your callers cannot use the returned
value like this:

pos = exists_in_cache(...);
if (pos < 0) {
if (active_cache[-pos-1]->ce_flags & CE_INTENT_TO_ADD)