Re: [PATCH] apply: fix adding new files on i-t-a entries
Duy Nguyen writes: > I think it's clear that you need to revert that commit. I didn't see > this at all when I made the commit. I didn't either, and no other reviewers did. But now we know it was not sufficient, so let's see... >> Perhaps a good and safe way forward to resurrect what d95d728a >> wanted to do is to first add an option to tell run_diff_index() and >> run_diff_files() which behaviour the caller wants to see, add that >> only to the caller in wt-status.c? Then incrementally pass that >> option from more callsites that we are absolutely certain that want >> this different worldview with respect to i-t-a? > > Agreed. OK. Perhaps then first I should do that revert and we'll incrementally rebuild on top. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] apply: fix adding new files on i-t-a entries
On Thu, Jun 25, 2015 at 12:05 AM, Junio C Hamano wrote: > Internal "diff-index --cached" is used for another reason you did > not mention (and scripted Porcelains literally use that externally > for the same reason). When we start a mergy operation, we say it is > OK if the working tree has local changes relative to the index, but > we require the index does not have any modifications since the HEAD > was read. > > Side note: some codepaths insist that "diff-index --cached" > and "diff-files" are both clean, so d95d728a is harmless; > the former may say "clean" on i-t-a entries more than > before, but the latter will still catch the difference > between the index and the working tree and stop the caller > from going forward. > > With d95d728a (diff-lib.c: adjust position of i-t-a entries in diff, > 2015-03-16)'s world view, an empty output from "diff-index --cached" > no longer means that. Entries added with any "git add -N" are not > reported, so we would go ahead to record the merge result on top of > that "half-dirty" index. > > Side note: a merge based on unpack-trees has an extra layer > of safety that unpack_trees() does not ignore i-t-a entry as > "not exist (yet)" and notices that the path does exist in > the index but not in HEAD. But that just shows that some > parts of the world do need to consider that having an i-t-a > in the index makes it different from HEAD. > > If the mergy operation goes without any conflict, the next thing we > do typically is to write the index out as a tree (to record in a new > commit, etc.) and we are OK only in that particular case, because > i-t-a entries are ignored. But what would happen when the mergy > operation conflicts? I haven't thought about that fully, but I > doubt it is a safe thing to do in general. > > But that is just one example that there are also other codepaths > that do not want to be fooled into thinking that i-t-a entries do > not exist in the index at all. I think it's clear that you need to revert that commit. I didn't see this at all when I made the commit. > All we learned from the above discussion is that unconditionally > declaring that adding i-t-a entries to the index without doing > anything else should keep the index compare the same to HEAD. > > If d95d728a were only about what wt_status.c sees (and gets reported > in "git status" output), which was what the change wanted to address > anyway, we didn't have to have this discussion. Without realizing > that two kinds of callers want different view out of "diff-index > --cached" and "diff-files", we broke half the world by changing the > world order unconditionally for everybody, I am afraid. > > Perhaps a good and safe way forward to resurrect what d95d728a > wanted to do is to first add an option to tell run_diff_index() and > run_diff_files() which behaviour the caller wants to see, add that > only to the caller in wt-status.c? Then incrementally pass that > option from more callsites that we are absolutely certain that want > this different worldview with respect to i-t-a? Agreed. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] apply: fix adding new files on i-t-a entries
Duy Nguyen writes: > Take checkout for example, when you do "git checkout -- foo" where foo > is i-t-a, the file foo on disk will be emptied because the SHA-1 in > the i-t-a entry is an empty blob, mostly to help "git diff". I think > it should behave as if foo is not i-t-a: checkout should error out > about not matching pathspec, or at least not destroy "foo" on disk. To > me, when "ce" is an i-t-a entry, only i-t-a flag and ce_name are > valid, the rest of "ce" should never be accessed. > > blame.c's situation is close to check_preimage() where it may read > zero from ce_mode. It may be ok for check_preimage() to take zero as > mode, but I think this is like fixed size buffer vs strbuf again. It > works now, but if the code is reorganized or refactored, then it may > or may not work. Better be safe than sorry and avoid reading something > we should not read in the first place. All of the above say there _are_ some codepaths that want to treat the existence of a path in the index not as boolean but as tristate. I do not think we disagree on that. But that is different from saying that it is always OK to treat i-t-a entries the same way as "does not exist" non-entries. Internal "diff-index --cached" is used for another reason you did not mention (and scripted Porcelains literally use that externally for the same reason). When we start a mergy operation, we say it is OK if the working tree has local changes relative to the index, but we require the index does not have any modifications since the HEAD was read. Side note: some codepaths insist that "diff-index --cached" and "diff-files" are both clean, so d95d728a is harmless; the former may say "clean" on i-t-a entries more than before, but the latter will still catch the difference between the index and the working tree and stop the caller from going forward. With d95d728a (diff-lib.c: adjust position of i-t-a entries in diff, 2015-03-16)'s world view, an empty output from "diff-index --cached" no longer means that. Entries added with any "git add -N" are not reported, so we would go ahead to record the merge result on top of that "half-dirty" index. Side note: a merge based on unpack-trees has an extra layer of safety that unpack_trees() does not ignore i-t-a entry as "not exist (yet)" and notices that the path does exist in the index but not in HEAD. But that just shows that some parts of the world do need to consider that having an i-t-a in the index makes it different from HEAD. If the mergy operation goes without any conflict, the next thing we do typically is to write the index out as a tree (to record in a new commit, etc.) and we are OK only in that particular case, because i-t-a entries are ignored. But what would happen when the mergy operation conflicts? I haven't thought about that fully, but I doubt it is a safe thing to do in general. But that is just one example that there are also other codepaths that do not want to be fooled into thinking that i-t-a entries do not exist in the index at all. All we learned from the above discussion is that unconditionally declaring that adding i-t-a entries to the index without doing anything else should keep the index compare the same to HEAD. If d95d728a were only about what wt_status.c sees (and gets reported in "git status" output), which was what the change wanted to address anyway, we didn't have to have this discussion. Without realizing that two kinds of callers want different view out of "diff-index --cached" and "diff-files", we broke half the world by changing the world order unconditionally for everybody, I am afraid. Perhaps a good and safe way forward to resurrect what d95d728a wanted to do is to first add an option to tell run_diff_index() and run_diff_files() which behaviour the caller wants to see, add that only to the caller in wt-status.c? Then incrementally pass that option from more callsites that we are absolutely certain that want this different worldview with respect to i-t-a? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] apply: fix adding new files on i-t-a entries
On Tue, Jun 23, 2015 at 11:50 PM, Junio C Hamano wrote: >> This patch tightens the "exists in index" check, ignoring i-t-a >> entries. For fixing the above problem, only the change in >> check_to_create() is needed. > > And the first thing I noticed and found a bit disturbing was that > this change (which I think is correct, and happens to match what I > sent out earlier) was the only thing necessary to make your new test > pass. IOW, the other changes in this patch have no test coverage. Because to be honest I don't understand these code well enough to know how to trigger it :) >> For other changes, >> >> - load_current(), reporting "not exists in index" is better than "does >>not match index" > > Is that error reporting the only side effect from this change? The only thing that I can see. If an i-t-a entry is returned, it can't get past verify_index_match because ce_match_stat(). Hmm.. no the on-disk version is gone, we'll get to checkout_target() where it will "restore" the entry with an empty file. This is related to checkout that I will continue later below. >> - get_current_sha1(), or actually build_fake_ancestor(), we should not >>add i-t-a entries to the temporary index, at least not without also >>adding i-t-a flag back > > This is part of "am" three-way fallback codepath. I do not think > the merge-recursive three-way merge code knows and cares about, is > capable of handling, or would even want to deal with i-t-a entries > in the first place, so adding an entry as i-t-a bit would not help. > What the ultimate caller wants from us in this codepath is a tree > object, and that is written out from the temporary index---and that > codepath ignores i-t-a entries, so it is correct to omit them from > the temporary index in the first place. Unlike the previous two > changes, I think this change deserves a new test. Will do, after I study some more about this apply.c. > >> I think I'm opening a can of worms with d95d728. There's nothing >> wrong with that patch per se, but with this issue popping up, I need >> to go over all {cache,index}_name_pos call sites and see what would be >> the sensible behavior when i-t-a entries are involved. > > Yeah, I agree that d95d728 should have been a part of a larger > series that changes the world order, instead of a single change that > brings inconsistency to the system. > > I cannot offhand convince myself that "apply" is the only casualty; > assuming it is, I think a reasonable way forward is to keep d95d728 > and adjust "apply" to the new world order. Otherwise, i.e. if there > are wider fallouts from d95d728, we may instead want to temporarily > revert it off from 'master', deal with fallouts to "apply" and other > things, before resurrecting it. > > Anything that internally uses "diff-index" is suspect, I think. Yeah that's one or two more grep runs and more reading. > What do others think? You seem to ... > >> So far blame, rm and checkout-entry and "checkout " are on my >> to-think-or-fix list. But this patch can get in early to fix a real >> regression instead of waiting for one big series. A lot more >> discussions will be had before that series gets in good shape. > > ... think that the damage could be quite extensive, so I am inclined > to say that we first revert d95d728 before going forward. I'm not opposed to reverting if you think it's the safest option and I will report back soon after grepping diff-index. But those I mentioned above have more to do with the fact that an i-t-a entry does exist in the index in a normal way, so reverting does not help. Take checkout for example, when you do "git checkout -- foo" where foo is i-t-a, the file foo on disk will be emptied because the SHA-1 in the i-t-a entry is an empty blob, mostly to help "git diff". I think it should behave as if foo is not i-t-a: checkout should error out about not matching pathspec, or at least not destroy "foo" on disk. To me, when "ce" is an i-t-a entry, only i-t-a flag and ce_name are valid, the rest of "ce" should never be accessed. blame.c's situation is close to check_preimage() where it may read zero from ce_mode. It may be ok for check_preimage() to take zero as mode, but I think this is like fixed size buffer vs strbuf again. It works now, but if the code is reorganized or refactored, then it may or may not work. Better be safe than sorry and avoid reading something we should not read in the first place. >> builtin/apply.c | 8 >> cache.h | 2 ++ >> read-cache.c | 12 >> t/t2203-add-intent.sh | 10 ++ >> 4 files changed, 28 insertions(+), 4 deletions(-) >> >> diff --git a/builtin/apply.c b/builtin/apply.c >> index 146be97..4f813ac 100644 >> --- a/builtin/apply.c >> +++ b/builtin/apply.c >> @@ -3344,7 +3344,7 @@ static int load_current(struct image *image, struct >> patch *patch) >> if (!patch->is_new) >> die("BUG: patch to %s is not a creation", patch->o
Re: [PATCH] apply: fix adding new files on i-t-a entries
On Tue, Jun 23, 2015 at 9:50 AM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > >> - pos = cache_name_pos(name, strlen(name)); >> + pos = exists_in_cache(name, strlen(name)); > > Something that is named as if it would return yes/no that returns a > real value is not a very welcome idea. > >> +/* This is the same as index_name_pos, except that i-t-a entries are >> invisible */ >> +int exists_in_index(const struct index_state *istate, const char *name, int >> namelen) >> +{ >> + int pos = index_name_stage_pos(istate, name, namelen, 0); >> + >> + if (pos < 0) >> + return pos; >> + if (istate->cache[pos]->ce_flags & CE_INTENT_TO_ADD) >> + return -pos-1; > > This is a useless complexity. Your callers cannot use the returned > value like this: > > pos = exists_in_cache(...); > if (pos < 0) { > if (active_cache[-pos-1]->ce_flags & CE_INTENT_TO_ADD) > ; /* ah it actually exists but it is i-t-a */ > else > ; /* no it does not really exist */ > } else { > ; /* yes it is really there at pos */ > } > > because they cannot tell two cases apart: (1) you do have i-t-a with > the given name, (2) you do not have the entry but the location you > would insert an entry with such a name is occupied by an unrelated > entry (i.e. with a name that sorts adjacent) that happens to be > i-t-a. Also, the callers cannot even use that return value in the usual way they would use the return value from index_name_pos(), either. pos = exists_in_cache(...); if (pos < 0) { /* ah, it does not exist, so... */ pos = -1 - pos; /* * ... it is OK to shift active_cache[pos..] by one and add our * entry at active_cache[pos] */ } else { /* it exists, so update in place */ ; } So, returning pos that smells like a return value from index_name_pos() only has an effect of confusing callers into buggy code, I am afraid. The callers that care need to be updated to check for ce_flags after finding the entry with index_name_pos() the usual way if you want to avoid search in the index_state->cache[] twice, and the callers that are only interested in knowing if an entry "exists" are better off with an exists_in_cache() that returns Yes/No and not a confusing and useless "pos", I think. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] apply: fix adding new files on i-t-a entries
Junio C Hamano writes: >> I think I'm opening a can of worms with d95d728 > > I cannot offhand convince myself that "apply" is the only casualty; > assuming it is, I think a reasonable way forward is to keep d95d728 > and adjust "apply" to the new world order. Otherwise, i.e. if there > are wider fallouts from d95d728, we may instead want to temporarily > revert it off from 'master', deal with fallouts to "apply" and other > things, before resurrecting it. > > Anything that internally uses "diff-index" is suspect, I think. > > What do others think? You seem to ... > >> So far blame, rm and checkout-entry and "checkout " are on my >> to-think-or-fix list. But this patch can get in early to fix a real >> regression instead of waiting for one big series. A lot more >> discussions will be had before that series gets in good shape. > > ... think that the damage could be quite extensive, so I am inclined > to say that we first revert d95d728 before going forward. Let's do this on 'nd/diff-i-t-a' topic and merge the result immediately to 'master' for now. -- >8 -- From: Junio C Hamano Date: Tue, 23 Jun 2015 10:27:47 -0700 Subject: [PATCH] Revert "diff-lib.c: adjust position of i-t-a entries in diff" This reverts commit d95d728aba06a34394d15466045cbdabdada58a2. It turns out that many other commands that need to interact with the result of running diff-files and diff-index, e.g. "git apply", "git rm", etc., need to be adjusted to the new world order it brings in. For example, it would break this sequence to correct a whitespace breakage in the parts you changed: git add -N file git diff --cached file | git apply --cached --whitespace=fix git checkout file In the old world order, "diff" showed a patch to modify an existing empty file by adding its full contents, and "apply" updated the index by modifying the existing empty blob (which is what an Intent-to-Add entry records in the index) with that patch. In the new world order, "diff" shows a patch to create a new file with its full contents, but because "apply" thinks that the i-t-a entry already exists in the index, it refused to accept a creation. Adjusting "apply" to this new world order is easy, but we need to assess the extent of the damage to the rest of the system the new world order brought in before going forward and adjust them all, after which we can resurrect the commit being reverted here. --- builtin/add.c | 1 - diff-lib.c | 12 t/t2203-add-intent.sh | 23 --- t/t4011-diff-symlink.sh | 10 -- 4 files changed, 8 insertions(+), 38 deletions(-) diff --git a/builtin/add.c b/builtin/add.c index ee370b0..3390933 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -63,7 +63,6 @@ static void update_callback(struct diff_queue_struct *q, switch (fix_unmerged_status(p, data)) { default: die(_("unexpected diff status %c"), p->status); - case DIFF_STATUS_ADDED: case DIFF_STATUS_MODIFIED: case DIFF_STATUS_TYPE_CHANGED: if (add_file_to_index(&the_index, path, data->flags)) { diff --git a/diff-lib.c b/diff-lib.c index 714501a..a85c497 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -212,11 +212,6 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ce->sha1, !is_null_sha1(ce->sha1), ce->name, 0); continue; - } else if (ce->ce_flags & CE_INTENT_TO_ADD) { - diff_addremove(&revs->diffopt, '+', ce->ce_mode, - EMPTY_BLOB_SHA1_BIN, 0, - ce->name, 0); - continue; } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, @@ -381,13 +376,6 @@ static void do_oneway_diff(struct unpack_trees_options *o, struct rev_info *revs = o->unpack_data; int match_missing, cached; - /* i-t-a entries do not actually exist in the index */ - if (idx && (idx->ce_flags & CE_INTENT_TO_ADD)) { - idx = NULL; - if (!tree) - return; /* nothing to diff.. */ - } - /* if the entry is not checked out, don't examine work tree */ cached = o->index_only || (idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx))); diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh index 7c641bf..2a4a749 100755 --- a/t/t2203-add-intent.sh +++ b/t/t2203-add-intent.sh @@ -5,24 +5,10 @@ test_description='Intent to add' . ./test-lib.sh test_expect_success 'intent to add' ' - test_commit 1 && - git rm 1.t && - echo hello >1.t && echo hello >file && echo hello >elif &&
Re: [PATCH] apply: fix adding new files on i-t-a entries
Nguyễn Thái Ngọc Duy writes: > Since d95d728 (diff-lib.c: adjust position of i-t-a entries in diff - > 2015-03-16), a normal "git diff" on i-t-a entries would produce a diff > that _adds_ those files, not just adding content to existing and empty > files like before. > > This is correct. Unfortunately, applying such a patch on the same > repository that has the same i-t-a entries will fail with message > "already exists in index" because git-apply checks, sees those i-t-a > entries and aborts. git-apply does not realize those are for > bookkeeping only, they do not really exist in the index. > This patch tightens the "exists in index" check, ignoring i-t-a > entries. For fixing the above problem, only the change in > check_to_create() is needed. And the first thing I noticed and found a bit disturbing was that this change (which I think is correct, and happens to match what I sent out earlier) was the only thing necessary to make your new test pass. IOW, the other changes in this patch have no test coverage. > For other changes, > > - load_current(), reporting "not exists in index" is better than "does >not match index" Is that error reporting the only side effect from this change? This is only used when falling back to three-way merge while applying a creation patch. > - check_preimage(), similar to load_current(), but it may also use >ce_mode from i-t-a entry which is always zero This is for the normal (non three-way) application and the idea is the same as load_current() as you said above. > - get_current_sha1(), or actually build_fake_ancestor(), we should not >add i-t-a entries to the temporary index, at least not without also >adding i-t-a flag back This is part of "am" three-way fallback codepath. I do not think the merge-recursive three-way merge code knows and cares about, is capable of handling, or would even want to deal with i-t-a entries in the first place, so adding an entry as i-t-a bit would not help. What the ultimate caller wants from us in this codepath is a tree object, and that is written out from the temporary index---and that codepath ignores i-t-a entries, so it is correct to omit them from the temporary index in the first place. Unlike the previous two changes, I think this change deserves a new test. > I think I'm opening a can of worms with d95d728. There's nothing > wrong with that patch per se, but with this issue popping up, I need > to go over all {cache,index}_name_pos call sites and see what would be > the sensible behavior when i-t-a entries are involved. Yeah, I agree that d95d728 should have been a part of a larger series that changes the world order, instead of a single change that brings inconsistency to the system. I cannot offhand convince myself that "apply" is the only casualty; assuming it is, I think a reasonable way forward is to keep d95d728 and adjust "apply" to the new world order. Otherwise, i.e. if there are wider fallouts from d95d728, we may instead want to temporarily revert it off from 'master', deal with fallouts to "apply" and other things, before resurrecting it. Anything that internally uses "diff-index" is suspect, I think. What do others think? You seem to ... > So far blame, rm and checkout-entry and "checkout " are on my > to-think-or-fix list. But this patch can get in early to fix a real > regression instead of waiting for one big series. A lot more > discussions will be had before that series gets in good shape. ... think that the damage could be quite extensive, so I am inclined to say that we first revert d95d728 before going forward. > builtin/apply.c | 8 > cache.h | 2 ++ > read-cache.c | 12 > t/t2203-add-intent.sh | 10 ++ > 4 files changed, 28 insertions(+), 4 deletions(-) > > diff --git a/builtin/apply.c b/builtin/apply.c > index 146be97..4f813ac 100644 > --- a/builtin/apply.c > +++ b/builtin/apply.c > @@ -3344,7 +3344,7 @@ static int load_current(struct image *image, struct > patch *patch) > if (!patch->is_new) > die("BUG: patch to %s is not a creation", patch->old_name); > > - pos = cache_name_pos(name, strlen(name)); > + pos = exists_in_cache(name, strlen(name)); Something that is named as if it would return yes/no that returns a real value is not a very welcome idea. > +/* This is the same as index_name_pos, except that i-t-a entries are > invisible */ > +int exists_in_index(const struct index_state *istate, const char *name, int > namelen) > +{ > + int pos = index_name_stage_pos(istate, name, namelen, 0); > + > + if (pos < 0) > + return pos; > + if (istate->cache[pos]->ce_flags & CE_INTENT_TO_ADD) > + return -pos-1; This is a useless complexity. Your callers cannot use the returned value like this: pos = exists_in_cache(...); if (pos < 0) { if (active_cache[-pos-1]->ce_flags & CE_INTENT_TO_ADD)