Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-18 Thread Junio C Hamano
Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 I'd like to add some tests too for this, but I don't know how to
 reproduce this state with git commands only, is there any way to add
 entries to the index without checkings?

Perhaps feeding update-index --index-info four input lines would work?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-18 Thread Jaime Soriano Pastor
Yes, --index-info worked for this purpouse, thanks!

https://github.com/jsoriano/git/blob/remove-unmerged-index-entry/t/t9904-unmerged-file-with-merged-entry.sh#L25

I'll try to send the patches to the mailing lists later today or tomorrow.

On Mon, Aug 18, 2014 at 6:34 PM, Junio C Hamano gits...@pobox.com wrote:
 Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 I'd like to add some tests too for this, but I don't know how to
 reproduce this state with git commands only, is there any way to add
 entries to the index without checkings?

 Perhaps feeding update-index --index-info four input lines would work?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-16 Thread Jaime Soriano Pastor
I'd like to add some tests too for this, but I don't know how to
reproduce this state with git commands only, is there any way to add
entries to the index without checkings?
Or maybe it could be done by creating a test- command that adds the
entries to an index?

Thanks.

On Fri, Aug 15, 2014 at 11:45 PM, Junio C Hamano gits...@pobox.com wrote:
 Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Being a conservative, I'd rather avoid doing any magic during
 read_cache() time. ls-files -s for example should show the four
 stages so that the broken state can be inspected.

 Well, only read_cache_unmerged() is modified in the sent patch, so no
 magic is done in read_cache(), I'd also avoid changes there.

 Ahh, I must have overlooked that; changes being only in _unmerged()
 variant makes me feel much better, and it probably would make much
 of ...

 Yes, it would be more work,...

 ... moot, hopefully ;-)

 git reset will clean the index anyway if the loop finishes, would it
 be ok?

 Surely.

 git merge is also affected by the loop in read_cache_unmerged(), but
 any of the solutions would be enough for it as only by finishing the
 loop with unmerged entries it will die without commiting the cache to
 the index file.

 Again, true.  The mergy operations want to start from a clean slate
 and they call _unmerged() variant primarily to learn that there were
 unmerged entries in the index, only to abort the operation in that
 case, so a change to _unmerged() variant should be safe for them.

 I'll take another look at your patch later, but not before the 2.1
 final, and by that time you may already have sent a reroll ;-)

 Thanks.



-- 
Jaime Soriano Pastor - Software Developer
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-15 Thread Jaime Soriano Pastor
On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Being a conservative, I'd rather avoid doing any magic during
 read_cache() time. ls-files -s for example should show the four
 stages so that the broken state can be inspected.

Well, only read_cache_unmerged() is modified in the sent patch, so no
magic is done in read_cache(), I'd also avoid changes there. Indeed
with the patch, ls-files -s can be used to inspect the problem
without further problems.

 Instead, I suspect that the code paths with problematic iterations
 over the index entries that assume that having stage #0 entry for a
 path guarantees that there will not be any higher stage entry first
 need to be identified (you already said add and reset may be
 problematic, there may be others, or they may be the only ones, I
 dunno), and then the most sensible one, which would be different
 from case to case, among various possibilities need to be chosen as
 a fix to each of them:

 (1) the loop may be fixed to ignore/skip unmerged entries;
 (2) the loop may be fixed to ignore/skip the merged entry;
 (3) the loop may be fixed not to spin indefinitely on a path with
 mixed entries; or
 (4) the command should error out.

git reset will clean the index anyway if the loop finishes, would it
be ok? I think that it'd be acceptable for git reset --hard to clean
the index as git checkout -f already does it even in this case.

git merge is also affected by the loop in read_cache_unmerged(), but
any of the solutions would be enough for it as only by finishing the
loop with unmerged entries it will die without commiting the cache to
the index file.

For git add probably the best option is to error out and ask the user
to check git ls-files -s to investigate the problem and decide what
to do.

The error message given by git commit -a is a bit confusing in this
case, I can take a look to this too.

I'll try to prepare a patch with these cases, and rethinking the loop
to avoid future problems there, I think that is a bit dangerous to
look for the position of a path entry (with index_name_pos) for the
next iteration.

 Yes, it would be more work, but I'd feel safer if the following
 worked:

 $ git ls-files -s
 100644 3cc58df83752123644fef39faab2393af643b1d2 0 conflict
 100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1 conflict
 100644 3cc58df83752123644fef39faab2393af643b1d2 2 conflict
 100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3 conflict
 $ empty
 $ git add empty
 100644 3cc58df83752123644fef39faab2393af643b1d2 0 conflict
 100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1 conflict
 100644 3cc58df83752123644fef39faab2393af643b1d2 2 conflict
 100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3 conflict
 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 empty
 $ git cat-file blob :empty output
 $ cmp empty output  echo OK
 OK

 which would be impossible to do if we nuked the problematic stages
 whenever we read the index, I am afraid.

This works with the first patch as read_cache() is not modified, and
git add would only clean the entries for the paths passed as
arguments.

 BTW, I didn't know git cat-file blob 0:$path, but I only manage to
 get Not a valid object name fatals. How is it supposed to be used?

 That was a typo of :$n:$path (where 0 = $n = 3).
Great, thanks!
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-15 Thread Junio C Hamano
Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano gits...@pobox.com wrote:
 Being a conservative, I'd rather avoid doing any magic during
 read_cache() time. ls-files -s for example should show the four
 stages so that the broken state can be inspected.

 Well, only read_cache_unmerged() is modified in the sent patch, so no
 magic is done in read_cache(), I'd also avoid changes there.

Ahh, I must have overlooked that; changes being only in _unmerged()
variant makes me feel much better, and it probably would make much
of ...

 Yes, it would be more work,...

... moot, hopefully ;-)

 git reset will clean the index anyway if the loop finishes, would it
 be ok?

Surely.

 git merge is also affected by the loop in read_cache_unmerged(), but
 any of the solutions would be enough for it as only by finishing the
 loop with unmerged entries it will die without commiting the cache to
 the index file.

Again, true.  The mergy operations want to start from a clean slate
and they call _unmerged() variant primarily to learn that there were
unmerged entries in the index, only to abort the operation in that
case, so a change to _unmerged() variant should be safe for them.

I'll take another look at your patch later, but not before the 2.1
final, and by that time you may already have sent a reroll ;-)

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Jaime Soriano Pastor
On Tue, Aug 12, 2014 at 8:31 PM, Junio C Hamano gits...@pobox.com wrote:

 Jaime Soriano Pastor jsorianopas...@gmail.com writes:

  A file in the index can be left as merged and unmerged at the same time
  by some tools as libgit2, this causes some undesiderable behaviours in git.

 Well, doesn't it mean that libgit2 is broken?  Have you filed a bug
 over there yet?


Yes, exactly, I think libgit2 is broken but I wanted to double-check
that it was still happening in their master branch, and it is. I have
reported the bug after checking it.
https://github.com/libgit2/libgit2/issues/2515


 Having said that, protecting ourselves from insanity left by other
 people is always a good idea, provided that it can be done without
 bending overly backwards.


Yes, I think the most important thing in this case is to protect git
against this kind of inconsistencies.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Jaime Soriano Pastor
On Tue, Aug 12, 2014 at 8:39 PM, Junio C Hamano gits...@pobox.com wrote:

 Jaime Soriano Pastor jsorianopas...@gmail.com writes:

  Wrong implementations of tools that modify the index can left
  some files as merged and unmerged at the same time. Avoid undesiderable
  behaviours by handling this situation.

 It is understandable that the way _you_ decided to handle the
 situation is so obvious to be spelled out to _you_, but that is the
 most important design decision that needs to be described here.  Do
 you silently remove higher-stage entries when an entry at stage 0
 exists?  Do you silently remove stage 0 entry when higher-stage
 entries exist?  Do you error out without doing neither?


Sorry, I didn't explain my decission enough, and my knowledge of git
internals is not so good.
The idea of my proposal is to remove higher stage entries when, after
replacing an existing entry at stage 0, there are still entries in
higher stages.

In the problematic cases I've seen (specially git add and git reset
--hard) the final state of both, merged and unmerged files, is that
only an entry in stage 0 exists.
Also, the current implementation of git checkout -f silently removes
higher stage entries in this case.


 Silently removing these at runtime may not be something we would
 want to do; after all, we do not know if the broken tool actually
 wanted to have the higher stage entries, or the merged entry.


Yes, I have to agree on that, the user should have the final decission
about what stage entry to use, although I'm not sure if in the
previously commented cases there could be such an additional loss as
the operations that can be modified are already intended to silently
remove stage entries.

 Ideally, I think we should error out and let the users figure out
 how to proceed (we may of course need to make sure they have the
 necessary tools to do so, e.g. git cat-file blob 0:$path to
 resurrect the contents and git update-index --cacheinfo to stuff
 the contents into the stages).


I have also tried a couple of implementations of this patch with die()
and warning().
The implementation with die() would have a message like There are
other staged versions for merged file, and maybe some recomendation
about how to see the blobs.
The warning implementation could return -1, what would prevent git add
to remove the higher-stage entries, but would still make git reset
--hard to clean the index as it seems that it does it anyway if it
manages to finish the call to read_index_unmerged.
Another option would be to print the deleted entries as a warning but
deleting them anyway.

Which option would be better? And what could be a good message?

BTW, I didn't know git cat-file blob 0:$path, but I only manage to
get Not a valid object name fatals. How is it supposed to be used?

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Junio C Hamano
Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 In the problematic cases I've seen (specially git add and git reset
 --hard) the final state of both, merged and unmerged files, is that
 only an entry in stage 0 exists.
 Also, the current implementation of git checkout -f silently removes
 higher stage entries in this case.

 Silently removing these at runtime may not be something we would
 want to do; after all, we do not know if the broken tool actually
 wanted to have the higher stage entries, or the merged entry.


 Yes, I have to agree on that, the user should have the final decission
 about what stage entry to use, although I'm not sure if in the
 previously commented cases there could be such an additional loss as
 the operations that can be modified are already intended to silently
 remove stage entries.
 ...
 Which option would be better? And what could be a good message?

Being a conservative, I'd rather avoid doing any magic during
read_cache() time.  ls-files -s for example should show the four
stages so that the broken state can be inspected.

Instead, I suspect that the code paths with problematic iterations
over the index entries that assume that having stage #0 entry for a
path guarantees that there will not be any higher stage entry first
need to be identified (you already said add and reset may be
problematic, there may be others, or they may be the only ones, I
dunno), and then the most sensible one, which would be different
from case to case, among various possibilities need to be chosen as
a fix to each of them:

 (1) the loop may be fixed to ignore/skip unmerged entries;
 (2) the loop may be fixed to ignore/skip the merged entry;
 (3) the loop may be fixed not to spin indefinitely on a path with
 mixed entries; or
 (4) the command should error out.

Yes, it would be more work, but I'd feel safer if the following
worked:

$ git ls-files -s
100644 3cc58df83752123644fef39faab2393af643b1d2 0   conflict
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1   conflict
100644 3cc58df83752123644fef39faab2393af643b1d2 2   conflict
100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3   conflict
$ empty
$ git add empty
100644 3cc58df83752123644fef39faab2393af643b1d2 0   conflict
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1   conflict
100644 3cc58df83752123644fef39faab2393af643b1d2 2   conflict
100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3   conflict
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   empty
$ git cat-file blob :empty output
$ cmp empty output  echo OK
OK

which would be impossible to do if we nuked the problematic stages
whenever we read the index, I am afraid.

 BTW, I didn't know git cat-file blob 0:$path, but I only manage to
 get Not a valid object name fatals. How is it supposed to be used?

That was a typo of :$n:$path (where 0 = $n = 3).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Jaime Soriano Pastor
A file in the index can be left as merged and unmerged at the same time
by some tools as libgit2, this causes some undesiderable behaviours in git.

I have seen, at least, these behaviours:
- git reset --hard consuming 100% CPU and never ending
- git reset --hard consuming all memory in git  2.0
- git add/git mergetool not resolving a conflict, even if they finish
  succesfully

The state is something like this:

$ git ls-files -s
100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 0   conflict
100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 1   conflict
100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 2   conflict
100644 f2e41136eac73c39554dede1fd7e67b12502d577 3   conflict

This can be caused e.g. by libgit2 doing this:
1. Merge with conflicts, without solving them
2. Force checkout

I see that this is not caused by git (I haven't been able to reproduce it
only using git) but I think that git should be able to detect this situation
and even handle it, specially to avoid the never-ending git resets.   

The proposed patch serves as protection and autoremediation for this
kind of cases.
Another option would be to detect the issue and tell the user to clean the
index with git read-tree --empty, but I think this would be more intrussive
than the patch.

Regards,
Jaime Soriano.

Jaime Soriano Pastor (1):
  read-cache.c: Ensure unmerged entries are removed

 read-cache.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.0.4.1.g8a38f21.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Jaime Soriano Pastor
Wrong implementations of tools that modify the index can left
some files as merged and unmerged at the same time. Avoid undesiderable
behaviours by handling this situation.

Signed-off-by: Jaime Soriano Pastor jsorianopas...@gmail.com
---
 read-cache.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 7f5645e..23e46e1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -935,6 +935,7 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
int ok_to_replace = option  ADD_CACHE_OK_TO_REPLACE;
int skip_df_check = option  ADD_CACHE_SKIP_DFCHECK;
int new_only = option  ADD_CACHE_NEW_ONLY;
+   int replaced = 0;
 
cache_tree_invalidate_path(istate-cache_tree, ce-name);
pos = index_name_stage_pos(istate, ce-name, ce_namelen(ce), 
ce_stage(ce));
@@ -943,9 +944,10 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
if (pos = 0) {
if (!new_only)
replace_index_entry(istate, pos, ce);
-   return 0;
-   }
-   pos = -pos-1;
+   pos++;
+   replaced = 1;
+   } else
+   pos = -pos-1;
 
/*
 * Inserting a merged entry (stage 0) into the index
@@ -959,6 +961,8 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
}
}
 
+   if (replaced)
+   return 0;
if (!ok_to_add)
return -1;
if (!verify_path(ce-name))
-- 
2.0.4.1.g8a38f21.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Junio C Hamano
Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 A file in the index can be left as merged and unmerged at the same time
 by some tools as libgit2, this causes some undesiderable behaviours in git.

Well, doesn't it mean that libgit2 is broken?  Have you filed a bug
over there yet?

Having said that, protecting ourselves from insanity left by other
people is always a good idea, provided that it can be done without
bending overly backwards.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Junio C Hamano
Jaime Soriano Pastor jsorianopas...@gmail.com writes:

 Wrong implementations of tools that modify the index can left
 some files as merged and unmerged at the same time. Avoid undesiderable
 behaviours by handling this situation.

It is understandable that the way _you_ decided to handle the
situation is so obvious to be spelled out to _you_, but that is the
most important design decision that needs to be described here.  Do
you silently remove higher-stage entries when an entry at stage 0
exists?  Do you silently remove stage 0 entry when higher-stage
entries exist?  Do you error out without doing neither?

Silently removing these at runtime may not be something we would
want to do; after all, we do not know if the broken tool actually
wanted to have the higher stage entries, or the merged entry.

Ideally, I think we should error out and let the users figure out
how to proceed (we may of course need to make sure they have the
necessary tools to do so, e.g. git cat-file blob 0:$path to
resurrect the contents and git update-index --cacheinfo to stuff
the contents into the stages).

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html