Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-18 Thread Jaime Soriano Pastor
Yes, --index-info worked for this purpouse, thanks!

https://github.com/jsoriano/git/blob/remove-unmerged-index-entry/t/t9904-unmerged-file-with-merged-entry.sh#L25

I'll try to send the patches to the mailing lists later today or tomorrow.

On Mon, Aug 18, 2014 at 6:34 PM, Junio C Hamano  wrote:
> Jaime Soriano Pastor  writes:
>
>> I'd like to add some tests too for this, but I don't know how to
>> reproduce this state with git commands only, is there any way to add
>> entries to the index without checkings?
>
> Perhaps feeding "update-index --index-info" four input lines would work?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-18 Thread Junio C Hamano
Jaime Soriano Pastor  writes:

> I'd like to add some tests too for this, but I don't know how to
> reproduce this state with git commands only, is there any way to add
> entries to the index without checkings?

Perhaps feeding "update-index --index-info" four input lines would work?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-16 Thread Jaime Soriano Pastor
I'd like to add some tests too for this, but I don't know how to
reproduce this state with git commands only, is there any way to add
entries to the index without checkings?
Or maybe it could be done by creating a "test-" command that adds the
entries to an index?

Thanks.

On Fri, Aug 15, 2014 at 11:45 PM, Junio C Hamano  wrote:
> Jaime Soriano Pastor  writes:
>
>> On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano  wrote:
>>> Being a conservative, I'd rather avoid doing any magic during
>>> read_cache() time. "ls-files -s" for example should show the four
>>> stages so that the "broken" state can be inspected.
>>>
>> Well, only read_cache_unmerged() is modified in the sent patch, so no
>> magic is done in read_cache(), I'd also avoid changes there.
>
> Ahh, I must have overlooked that; changes being only in _unmerged()
> variant makes me feel much better, and it probably would make much
> of ...
>
>>> Yes, it would be more work,...
>
> ... moot, hopefully ;-)
>
>> git reset will clean the index anyway if the loop finishes, would it
>> be ok?
>
> Surely.
>
>> git merge is also affected by the loop in read_cache_unmerged(), but
>> any of the solutions would be enough for it as only by finishing the
>> loop with unmerged entries it will die without commiting the cache to
>> the index file.
>
> Again, true.  The mergy operations want to start from a clean slate
> and they call _unmerged() variant primarily to learn that there were
> unmerged entries in the index, only to abort the operation in that
> case, so a change to _unmerged() variant should be safe for them.
>
> I'll take another look at your patch later, but not before the 2.1
> final, and by that time you may already have sent a reroll ;-)
>
> Thanks.



-- 
Jaime Soriano Pastor - Software Developer
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-15 Thread Junio C Hamano
Jaime Soriano Pastor  writes:

> On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano  wrote:
>> Being a conservative, I'd rather avoid doing any magic during
>> read_cache() time. "ls-files -s" for example should show the four
>> stages so that the "broken" state can be inspected.
>>
> Well, only read_cache_unmerged() is modified in the sent patch, so no
> magic is done in read_cache(), I'd also avoid changes there.

Ahh, I must have overlooked that; changes being only in _unmerged()
variant makes me feel much better, and it probably would make much
of ...

>> Yes, it would be more work,...

... moot, hopefully ;-)

> git reset will clean the index anyway if the loop finishes, would it
> be ok?

Surely.

> git merge is also affected by the loop in read_cache_unmerged(), but
> any of the solutions would be enough for it as only by finishing the
> loop with unmerged entries it will die without commiting the cache to
> the index file.

Again, true.  The mergy operations want to start from a clean slate
and they call _unmerged() variant primarily to learn that there were
unmerged entries in the index, only to abort the operation in that
case, so a change to _unmerged() variant should be safe for them.

I'll take another look at your patch later, but not before the 2.1
final, and by that time you may already have sent a reroll ;-)

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-15 Thread Jaime Soriano Pastor
On Thu, Aug 14, 2014 at 1:04 AM, Junio C Hamano  wrote:
> Being a conservative, I'd rather avoid doing any magic during
> read_cache() time. "ls-files -s" for example should show the four
> stages so that the "broken" state can be inspected.
>
Well, only read_cache_unmerged() is modified in the sent patch, so no
magic is done in read_cache(), I'd also avoid changes there. Indeed
with the patch, "ls-files -s" can be used to inspect the problem
without further problems.

> Instead, I suspect that the code paths with problematic iterations
> over the index entries that assume that having stage #0 entry for a
> path guarantees that there will not be any higher stage entry first
> need to be identified (you already said "add" and "reset" may be
> problematic, there may be others, or they may be the only ones, I
> dunno), and then the most sensible one, which would be different
> from case to case, among various possibilities need to be chosen as
> a fix to each of them:
>
> (1) the loop may be fixed to ignore/skip unmerged entries;
> (2) the loop may be fixed to ignore/skip the merged entry;
> (3) the loop may be fixed not to spin indefinitely on a path with
> mixed entries; or
> (4) the command should error out.
>
git reset will clean the index anyway if the loop finishes, would it
be ok? I think that it'd be acceptable for git reset --hard to clean
the index as git checkout -f already does it even in this case.

git merge is also affected by the loop in read_cache_unmerged(), but
any of the solutions would be enough for it as only by finishing the
loop with unmerged entries it will die without commiting the cache to
the index file.

For git add probably the best option is to error out and ask the user
to check "git ls-files -s" to investigate the problem and decide what
to do.

The error message given by "git commit -a" is a bit confusing in this
case, I can take a look to this too.

I'll try to prepare a patch with these cases, and rethinking the loop
to avoid future problems there, I think that is a bit dangerous to
look for the position of a path entry (with index_name_pos) for the
next iteration.

> Yes, it would be more work, but I'd feel safer if the following
> worked:
>
> $ git ls-files -s
> 100644 3cc58df83752123644fef39faab2393af643b1d2 0 conflict
> 100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1 conflict
> 100644 3cc58df83752123644fef39faab2393af643b1d2 2 conflict
> 100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3 conflict
> $ >empty
> $ git add empty
> 100644 3cc58df83752123644fef39faab2393af643b1d2 0 conflict
> 100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1 conflict
> 100644 3cc58df83752123644fef39faab2393af643b1d2 2 conflict
> 100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3 conflict
> 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 empty
> $ git cat-file blob :empty >output
> $ cmp empty output && echo OK
> OK
>
> which would be impossible to do if we nuked the "problematic" stages
> whenever we read the index, I am afraid.
>
This works with the first patch as read_cache() is not modified, and
git add would only clean the entries for the paths passed as
arguments.

>> BTW, I didn't know "git cat-file blob 0:$path", but I only manage to
>> get "Not a valid object name" fatals. How is it supposed to be used?
>
> That was a typo of ":$n:$path" (where 0 <= $n <= 3).
Great, thanks!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Junio C Hamano
Jaime Soriano Pastor  writes:

> In the problematic cases I've seen (specially git add and git reset
> --hard) the final state of both, merged and unmerged files, is that
> only an entry in stage 0 exists.
> Also, the current implementation of git checkout -f silently removes
> higher stage entries in this case.
>>
>> Silently removing these at runtime may not be something we would
>> want to do; after all, we do not know if the broken tool actually
>> wanted to have the higher stage entries, or the merged entry.
>>
>
> Yes, I have to agree on that, the user should have the final decission
> about what stage entry to use, although I'm not sure if in the
> previously commented cases there could be such an additional loss as
> the operations that can be modified are already intended to silently
> remove stage entries.
> ...
> Which option would be better? And what could be a good message?

Being a conservative, I'd rather avoid doing any magic during
read_cache() time.  "ls-files -s" for example should show the four
stages so that the "broken" state can be inspected.

Instead, I suspect that the code paths with problematic iterations
over the index entries that assume that having stage #0 entry for a
path guarantees that there will not be any higher stage entry first
need to be identified (you already said "add" and "reset" may be
problematic, there may be others, or they may be the only ones, I
dunno), and then the most sensible one, which would be different
from case to case, among various possibilities need to be chosen as
a fix to each of them:

 (1) the loop may be fixed to ignore/skip unmerged entries;
 (2) the loop may be fixed to ignore/skip the merged entry;
 (3) the loop may be fixed not to spin indefinitely on a path with
 mixed entries; or
 (4) the command should error out.

Yes, it would be more work, but I'd feel safer if the following
worked:

$ git ls-files -s
100644 3cc58df83752123644fef39faab2393af643b1d2 0   conflict
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1   conflict
100644 3cc58df83752123644fef39faab2393af643b1d2 2   conflict
100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3   conflict
$ >empty
$ git add empty
100644 3cc58df83752123644fef39faab2393af643b1d2 0   conflict
100644 f70f10e4db19068f79bc43844b49f3eece45c4e8 1   conflict
100644 3cc58df83752123644fef39faab2393af643b1d2 2   conflict
100644 223b7836fb19fdf64ba2d3cd6173c6a283141f78 3   conflict
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   empty
$ git cat-file blob :empty >output
$ cmp empty output && echo OK
OK

which would be impossible to do if we nuked the "problematic" stages
whenever we read the index, I am afraid.

> BTW, I didn't know "git cat-file blob 0:$path", but I only manage to
> get "Not a valid object name" fatals. How is it supposed to be used?

That was a typo of ":$n:$path" (where 0 <= $n <= 3).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Jaime Soriano Pastor
On Tue, Aug 12, 2014 at 8:39 PM, Junio C Hamano  wrote:
>
> Jaime Soriano Pastor  writes:
>
> > Wrong implementations of tools that modify the index can left
> > some files as merged and unmerged at the same time. Avoid undesiderable
> > behaviours by handling this situation.
>
> It is understandable that the way _you_ decided to "handle the
> situation" is so obvious to be spelled out to _you_, but that is the
> most important design decision that needs to be described here.  Do
> you silently remove higher-stage entries when an entry at stage 0
> exists?  Do you silently remove stage 0 entry when higher-stage
> entries exist?  Do you error out without doing neither?
>

Sorry, I didn't explain my decission enough, and my knowledge of git
internals is not so good.
The idea of my proposal is to remove higher stage entries when, after
replacing an existing entry at stage 0, there are still entries in
higher stages.

In the problematic cases I've seen (specially git add and git reset
--hard) the final state of both, merged and unmerged files, is that
only an entry in stage 0 exists.
Also, the current implementation of git checkout -f silently removes
higher stage entries in this case.

>
> Silently removing these at runtime may not be something we would
> want to do; after all, we do not know if the broken tool actually
> wanted to have the higher stage entries, or the merged entry.
>

Yes, I have to agree on that, the user should have the final decission
about what stage entry to use, although I'm not sure if in the
previously commented cases there could be such an additional loss as
the operations that can be modified are already intended to silently
remove stage entries.

> Ideally, I think we should error out and let the users figure out
> how to proceed (we may of course need to make sure they have the
> necessary tools to do so, e.g. "git cat-file blob 0:$path" to
> resurrect the contents and "git update-index --cacheinfo" to stuff
> the contents into the stages).
>

I have also tried a couple of implementations of this patch with die()
and warning().
The implementation with die() would have a message like "There are
other staged versions for merged file", and maybe some recomendation
about how to see the blobs.
The warning implementation could return -1, what would prevent git add
to remove the higher-stage entries, but would still make git reset
--hard to clean the index as it seems that it does it anyway if it
manages to finish the call to read_index_unmerged.
Another option would be to print the deleted entries as a warning but
deleting them anyway.

Which option would be better? And what could be a good message?

BTW, I didn't know "git cat-file blob 0:$path", but I only manage to
get "Not a valid object name" fatals. How is it supposed to be used?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-13 Thread Jaime Soriano Pastor
On Tue, Aug 12, 2014 at 8:31 PM, Junio C Hamano  wrote:
>
> Jaime Soriano Pastor  writes:
>
> > A file in the index can be left as merged and unmerged at the same time
> > by some tools as libgit2, this causes some undesiderable behaviours in git.
>
> Well, doesn't it mean that libgit2 is broken?  Have you filed a bug
> over there yet?
>

Yes, exactly, I think libgit2 is broken but I wanted to double-check
that it was still happening in their master branch, and it is. I have
reported the bug after checking it.
https://github.com/libgit2/libgit2/issues/2515

>
> Having said that, protecting ourselves from insanity left by other
> people is always a good idea, provided that it can be done without
> bending overly backwards.


Yes, I think the most important thing in this case is to protect git
against this kind of inconsistencies.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Junio C Hamano
Jaime Soriano Pastor  writes:

> Wrong implementations of tools that modify the index can left
> some files as merged and unmerged at the same time. Avoid undesiderable
> behaviours by handling this situation.

It is understandable that the way _you_ decided to "handle the
situation" is so obvious to be spelled out to _you_, but that is the
most important design decision that needs to be described here.  Do
you silently remove higher-stage entries when an entry at stage 0
exists?  Do you silently remove stage 0 entry when higher-stage
entries exist?  Do you error out without doing neither?

Silently removing these at runtime may not be something we would
want to do; after all, we do not know if the broken tool actually
wanted to have the higher stage entries, or the merged entry.

Ideally, I think we should error out and let the users figure out
how to proceed (we may of course need to make sure they have the
necessary tools to do so, e.g. "git cat-file blob 0:$path" to
resurrect the contents and "git update-index --cacheinfo" to stuff
the contents into the stages).

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Junio C Hamano
Jaime Soriano Pastor  writes:

> A file in the index can be left as merged and unmerged at the same time
> by some tools as libgit2, this causes some undesiderable behaviours in git.

Well, doesn't it mean that libgit2 is broken?  Have you filed a bug
over there yet?

Having said that, protecting ourselves from insanity left by other
people is always a good idea, provided that it can be done without
bending overly backwards.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Jaime Soriano Pastor
A file in the index can be left as merged and unmerged at the same time
by some tools as libgit2, this causes some undesiderable behaviours in git.

I have seen, at least, these behaviours:
- git reset --hard consuming 100% CPU and never ending
- git reset --hard consuming all memory in git < 2.0
- git add/git mergetool not resolving a conflict, even if they finish
  succesfully

The state is something like this:

$ git ls-files -s
100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 0   conflict
100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 1   conflict
100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 2   conflict
100644 f2e41136eac73c39554dede1fd7e67b12502d577 3   conflict

This can be caused e.g. by libgit2 doing this:
1. Merge with conflicts, without solving them
2. Force checkout

I see that this is not caused by git (I haven't been able to reproduce it
only using git) but I think that git should be able to detect this situation
and even handle it, specially to avoid the never-ending git resets.   

The proposed patch serves as protection and autoremediation for this
kind of cases.
Another option would be to detect the issue and tell the user to clean the
index with git read-tree --empty, but I think this would be more intrussive
than the patch.

Regards,
Jaime Soriano.

Jaime Soriano Pastor (1):
  read-cache.c: Ensure unmerged entries are removed

 read-cache.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.0.4.1.g8a38f21.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] read-cache.c: Ensure unmerged entries are removed

2014-08-12 Thread Jaime Soriano Pastor
Wrong implementations of tools that modify the index can left
some files as merged and unmerged at the same time. Avoid undesiderable
behaviours by handling this situation.

Signed-off-by: Jaime Soriano Pastor 
---
 read-cache.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 7f5645e..23e46e1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -935,6 +935,7 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
int new_only = option & ADD_CACHE_NEW_ONLY;
+   int replaced = 0;
 
cache_tree_invalidate_path(istate->cache_tree, ce->name);
pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), 
ce_stage(ce));
@@ -943,9 +944,10 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
if (pos >= 0) {
if (!new_only)
replace_index_entry(istate, pos, ce);
-   return 0;
-   }
-   pos = -pos-1;
+   pos++;
+   replaced = 1;
+   } else
+   pos = -pos-1;
 
/*
 * Inserting a merged entry ("stage 0") into the index
@@ -959,6 +961,8 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
}
}
 
+   if (replaced)
+   return 0;
if (!ok_to_add)
return -1;
if (!verify_path(ce->name))
-- 
2.0.4.1.g8a38f21.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html