Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-22 Thread Jeff King
On Mon, Jan 21, 2013 at 05:45:05PM -0800, Junio C Hamano wrote:

> Duy Nguyen  writes:
> 
> > On Tue, Jan 22, 2013 at 6:15 AM, Jeff King  wrote:
> >> Can you elaborate on when this code is triggered?
> >>
> >> In the general case, shouldn't we already know the sha1 of what's on
> >> disk in the index, and be able to just compare the hashes? And if we
> >> don't, because the entry is start-dirty, should we be updating it
> >> (possibly earlier, so we do not even get into the "need to write" code
> >> path) instead of doing this ad-hoc byte comparison?
> 
> If the index knows what is in the working tree, I think I agree.

Right. I was wondering why it didn't (and if it doesn't, why we are not
saving the information there).

But I think I was letting my inaccurate mental model of the index get in
the way. I tend to think of the stat information as "if the file matches
this stat info, then it has sha1 X". But that is not true. The sha1 we
store is the actual index entry, and if it does not match what is in the
working tree, we do not know or store the sha1 of what is in the working
tree. We cannot just "refresh" that value and compare it, which is what
I was implying.

So I think I was just confused. That is what I get for not actually
doing low-level index stuff enough.

> > git reset HEAD~10
> > # blah that was a mistake, undo it
> > git checkout HEAD@{1}

It seems like

  git reset HEAD@{1}

would be the correct undo, as the original never touched the working
tree.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-21 Thread Duy Nguyen
On Tue, Jan 22, 2013 at 8:45 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Tue, Jan 22, 2013 at 6:15 AM, Jeff King  wrote:
>>> Can you elaborate on when this code is triggered?
>>>
>>> In the general case, shouldn't we already know the sha1 of what's on
>>> disk in the index, and be able to just compare the hashes? And if we
>>> don't, because the entry is start-dirty, should we be updating it
>>> (possibly earlier, so we do not even get into the "need to write" code
>>> path) instead of doing this ad-hoc byte comparison?
>
> If the index knows what is in the working tree, I think I agree.
>
>>> Confused...
>>
>> git reset HEAD~10
>> # blah that was a mistake, undo it
>> git checkout HEAD@{1}
>>
>> I hit it a few times, probably not with the exact recipe above though.
>
> I've seen myself doing "git reset HEAD~10" by mistake (I wanted "--hard"),
> but the recovery is to do "git reset --hard @{1}" and then come back
> with "git reset --hard HEAD~10", so it hasn't been a real problem
> for me.

Except when you already make some changes elsewhere and you don't want --hard.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-21 Thread Junio C Hamano
Duy Nguyen  writes:

> On Tue, Jan 22, 2013 at 6:15 AM, Jeff King  wrote:
>> Can you elaborate on when this code is triggered?
>>
>> In the general case, shouldn't we already know the sha1 of what's on
>> disk in the index, and be able to just compare the hashes? And if we
>> don't, because the entry is start-dirty, should we be updating it
>> (possibly earlier, so we do not even get into the "need to write" code
>> path) instead of doing this ad-hoc byte comparison?

If the index knows what is in the working tree, I think I agree.

>> Confused...
>
> git reset HEAD~10
> # blah that was a mistake, undo it
> git checkout HEAD@{1}
>
> I hit it a few times, probably not with the exact recipe above though.

I've seen myself doing "git reset HEAD~10" by mistake (I wanted "--hard"),
but the recovery is to do "git reset --hard @{1}" and then come back
with "git reset --hard HEAD~10", so it hasn't been a real problem
for me.

A case similar to this is already covered by a special case of
two-tree read-tree where the index already matches the tree we are
checking out even though it is different from HEAD; in other words,
if you did this:

git init
date >file
git add file; git commit -m 'has a file'
git checkout -b side
git rm file; git commit -m 'does not have the file'
git checkout master file
: now index has the file from 'master' and worktree is clean
git checkout master

you shouldn't get any complaint, I think.

If you did "git rm --cached file" to lose it from the index,
immediately after "git checkout master file", then we wouldn't know
what we may be losing.  If the file in the working tree happens to
match the index and the HEAD after checking out the other branch
('master' in this case), it is _not_ losing information, so we might
be able to treat it as an extension of the existing special case.

I haven't thought things through to see if in a more general case
that this codepath triggers we might be losing a local changes that
we should be carrying forward while checking out a new branch,
though.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-21 Thread Duy Nguyen
On Tue, Jan 22, 2013 at 6:15 AM, Jeff King  wrote:
> Can you elaborate on when this code is triggered?
>
> In the general case, shouldn't we already know the sha1 of what's on
> disk in the index, and be able to just compare the hashes? And if we
> don't, because the entry is start-dirty, should we be updating it
> (possibly earlier, so we do not even get into the "need to write" code
> path) instead of doing this ad-hoc byte comparison?
>
> Confused...

git reset HEAD~10
# blah that was a mistake, undo it
git checkout HEAD@{1}

I hit it a few times, probably not with the exact recipe above though.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-21 Thread Jeff King
On Mon, Jan 21, 2013 at 06:40:33PM +0700, Nguyen Thai Ngoc Duy wrote:

> + /*
> +  * If it has the same content that we are going to overwrite,
> +  * there's no point in complaining. We still overwrite it in
> +  * the end though.
> +  */
> + if (ce &&
> + S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
> + (!trust_executable_bit ||
> +  (0100 & (ce->ce_mode ^ st->st_mode)) == 0) &&
> + st->st_size < SAME_CONTENT_SIZE_LIMIT &&
> + sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
> + ce_size == st->st_size) {
> + void *buffer = NULL;
> + unsigned long size;
> + enum object_type type;
> + struct strbuf sb = STRBUF_INIT;
> + int matched =
> + strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
> + (buffer = read_sha1_file(ce->sha1, &type, &size)) != 
> NULL &&
> + type == OBJ_BLOB &&
> + size == ce_size &&
> + !memcmp(buffer, sb.buf, size);
> + free(buffer);
> + strbuf_release(&sb);
> + if (matched)
> + return 0;
> + }

Can you elaborate on when this code is triggered?

In the general case, shouldn't we already know the sha1 of what's on
disk in the index, and be able to just compare the hashes? And if we
don't, because the entry is start-dirty, should we be updating it
(possibly earlier, so we do not even get into the "need to write" code
path) instead of doing this ad-hoc byte comparison?

Confused...

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content

2013-01-21 Thread Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Changes: do not lose worktree's executable permission.

 t/t1011-read-tree-sparse-checkout.sh |  3 ++-
 t/t2021-checkout-overwrite.sh| 18 ++
 unpack-trees.c   | 31 +++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh 
b/t/t1011-read-tree-sparse-checkout.sh
index 5c0053a..38f9899 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -238,7 +238,8 @@ test_expect_success 'print errors when failed to update 
worktree' '
echo sub >.git/info/sparse-checkout &&
git checkout -f init &&
mkdir sub &&
-   touch sub/added sub/addedtoo &&
+   echo modified >sub/added &&
+   echo modified >sub/addedtoo &&
test_must_fail git checkout top 2>actual &&
cat >expected <<\EOF &&
 error: The following untracked working tree files would be overwritten by 
checkout:
diff --git a/t/t2021-checkout-overwrite.sh b/t/t2021-checkout-overwrite.sh
index 5da63e9..bb1696d 100755
--- a/t/t2021-checkout-overwrite.sh
+++ b/t/t2021-checkout-overwrite.sh
@@ -47,4 +47,22 @@ test_expect_success SYMLINKS 'checkout commit with dir must 
not remove untracked
test -h a/b
 '
 
+test_expect_success 'do not abort on overwriting an existing file with the 
same content' '
+   echo abc >bar &&
+   git add bar &&
+   git commit -m "new file" &&
+   git reset HEAD^ &&
+   git checkout HEAD@{1}
+'
+
+test_expect_success POSIXPERM 'do abort on an existing file, same content but 
different permission' '
+   git checkout -f HEAD^ &&
+   echo abc >bar &&
+   git add bar &&
+   git commit -m "new file" &&
+   git reset HEAD^ &&
+   chmod a+x bar &&
+   test_must_fail git checkout HEAD@{1}
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 0e1a196..ea204ae 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -9,6 +9,8 @@
 #include "refs.h"
 #include "attr.h"
 
+#define SAME_CONTENT_SIZE_LIMIT (1024 * 1024)
+
 /*
  * Error messages expected by scripts out of plumbing commands such as
  * read-tree.  Non-scripted Porcelain is not required to use these messages
@@ -1363,6 +1365,7 @@ static int check_ok_to_remove(const char *name, int len, 
int dtype,
  struct unpack_trees_options *o)
 {
struct cache_entry *result;
+   unsigned long ce_size;
 
/*
 * It may be that the 'lstat()' succeeded even though
@@ -1405,6 +1408,34 @@ static int check_ok_to_remove(const char *name, int len, 
int dtype,
return 0;
}
 
+   /*
+* If it has the same content that we are going to overwrite,
+* there's no point in complaining. We still overwrite it in
+* the end though.
+*/
+   if (ce &&
+   S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
+   (!trust_executable_bit ||
+(0100 & (ce->ce_mode ^ st->st_mode)) == 0) &&
+   st->st_size < SAME_CONTENT_SIZE_LIMIT &&
+   sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
+   ce_size == st->st_size) {
+   void *buffer = NULL;
+   unsigned long size;
+   enum object_type type;
+   struct strbuf sb = STRBUF_INIT;
+   int matched =
+   strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
+   (buffer = read_sha1_file(ce->sha1, &type, &size)) != 
NULL &&
+   type == OBJ_BLOB &&
+   size == ce_size &&
+   !memcmp(buffer, sb.buf, size);
+   free(buffer);
+   strbuf_release(&sb);
+   if (matched)
+   return 0;
+   }
+
return o->gently ? -1 :
add_rejected_path(o, error_type, name);
 }
-- 
1.8.0.rc2.23.g1fb49df

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html