Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-17 Thread Duy Nguyen
On Sun, Apr 16, 2017 at 11:25 AM, Duy Nguyen  wrote:
>> Because this is a reset --mixed it will never run through unpack_trees and
>> The entries are never marked with CE_REMOVE.
>
> I know. But in my view, it should. All updates from a tree object to
> the index should happen through unpack_trees().

Just fyi. My view is wrong. We need to handle a diff here, not through
unpack_trees() because "git reset --mixed" support partial reset, see
2ce633b928 (git-reset [--mixed]  [--] ... - 2006-12-14).
We might be able to make unpack_trees() leave certain paths(pec)
untouched, but I don't think it is worth it. In other words, your
original patch is the way to go.

PS. I briefly wondered if "git checkout  -- " had the
same problem. I think not, because while --mixed does not touch
worktree, checkout does, so it should restore on-disk versions if
needed. The read_tree_some() call in checkout_paths() should respect
sparse patterns and add skip-worktree bits back if needed though, but
I don't think it does that.
-- 
Duy


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-15 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 10:37 PM, Kevin Willford <kewi...@microsoft.com> wrote:
>
>> -Original Message-
>> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
>> Behalf Of Duy Nguyen
>> Sent: Wednesday, April 12, 2017 7:21 AM
>> To: Kevin Willford <kewi...@microsoft.com>
>> Cc: Kevin Willford <kcwillf...@gmail.com>; git@vger.kernel.org;
>> gits...@pobox.com; p...@peff.net
>> Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid
>> data loss.
>>
>> On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford <kewi...@microsoft.com>
>> wrote:
>> > The loss of the skip-worktree bits is part of the problem if you are
>> > talking about modified files.  The other issue that I was having is
>> > when running a reset and there were files added in the commit that is
>> > being reset, there will not be an entry in the index and not a file on
>> > disk so the data for that file is completely lost at that point.
>> > "status" also doesn't include anything about this loss of data.  On
>> > modified files status will at least have the file as deleted since
>> > there is still an index entry but again the previous version of the file 
>> > and it's
>> data is lost.
>>
>> Well, we could have "deleted" index entries, those marked with
>> CE_REMOVE. They are never written down to file though, so 'status'
>> won't benefit from that. Hopefully we can restore the file before the index
>> file is written down and we really lose skip-worktree bits?
>
> Because this is a reset --mixed it will never run through unpack_trees and
> The entries are never marked with CE_REMOVE.

I know. But in my view, it should. All updates from a tree object to
the index should happen through unpack_trees().

>> > To me this is totally unexpected behavior, for example if I am on a
>> > commit where there are only files that where added and run a reset
>> > HEAD~1 and then a status, it will show a clean working directory.
>> > Regardless of skip-worktree bits the user needs to have the data in
>> > the working directory after the reset or data is lost which is always bad.
>>
>> I agree we no longer have a place to save the skip-worktree bit, we have to
>> restore the state back as if skip-worktree bit does not exist.
>> It would be good if we could keep the logic in unpack_trees() though.
>> For example, if the file is present on disk even if skip-worktree bit is on,
>> unpack_trees() would abort instead of silently overwriting it.
>> This is a difference between skip-worktree and assume-unchanged bits.
>> If you do explicit checkout_entry() you might have to add more checks to
>> keep behavior consistent.
>> --
>> Duy
>
> Because this is a reset --mixed it will follow the code path calling 
> read_from_tree
> and ends up calling update_index_from_diff in the format_callback of the diff,
> so unpack_trees() is never called in the --mixed case.  This code change also 
> only applies
> when the file does not exist and the skip-worktree bit is on and the updated
> index entry either will be missing (covers the added scenario) or was not 
> missing
> before (covers the modified scenario).  If there is a better way to get the 
> previous
> index entry to disk than what I am doing, I am happy to implement it 
> correctly.

I think it's ok to just look at the diff (from update_index_from_diff)
and restore the on-disk version for now. I'd like to make --mixed use
unpack_trees() too but I haven't studied  this code long enough to see
why it went with "diff" instead of "read-tree" (which translates
directly to unpack_trees). Maybe there is some subtle reason for that.
Though it looks like it was more convenient to do "diff" in the
git-reset.sh version, and that got translated literally to C when the
command was rewritten.
-- 
Duy


RE: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-12 Thread Kevin Willford

> -Original Message-
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
> Behalf Of Duy Nguyen
> Sent: Wednesday, April 12, 2017 7:21 AM
> To: Kevin Willford <kewi...@microsoft.com>
> Cc: Kevin Willford <kcwillf...@gmail.com>; git@vger.kernel.org;
> gits...@pobox.com; p...@peff.net
> Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid
> data loss.
> 
> On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford <kewi...@microsoft.com>
> wrote:
> > The loss of the skip-worktree bits is part of the problem if you are
> > talking about modified files.  The other issue that I was having is
> > when running a reset and there were files added in the commit that is
> > being reset, there will not be an entry in the index and not a file on
> > disk so the data for that file is completely lost at that point.
> > "status" also doesn't include anything about this loss of data.  On
> > modified files status will at least have the file as deleted since
> > there is still an index entry but again the previous version of the file 
> > and it's
> data is lost.
> 
> Well, we could have "deleted" index entries, those marked with
> CE_REMOVE. They are never written down to file though, so 'status'
> won't benefit from that. Hopefully we can restore the file before the index
> file is written down and we really lose skip-worktree bits?

Because this is a reset --mixed it will never run through unpack_trees and 
The entries are never marked with CE_REMOVE.

> 
> > To me this is totally unexpected behavior, for example if I am on a
> > commit where there are only files that where added and run a reset
> > HEAD~1 and then a status, it will show a clean working directory.
> > Regardless of skip-worktree bits the user needs to have the data in
> > the working directory after the reset or data is lost which is always bad.
> 
> I agree we no longer have a place to save the skip-worktree bit, we have to
> restore the state back as if skip-worktree bit does not exist.
> It would be good if we could keep the logic in unpack_trees() though.
> For example, if the file is present on disk even if skip-worktree bit is on,
> unpack_trees() would abort instead of silently overwriting it.
> This is a difference between skip-worktree and assume-unchanged bits.
> If you do explicit checkout_entry() you might have to add more checks to
> keep behavior consistent.
> --
> Duy

Because this is a reset --mixed it will follow the code path calling 
read_from_tree
and ends up calling update_index_from_diff in the format_callback of the diff,
so unpack_trees() is never called in the --mixed case.  This code change also 
only applies
when the file does not exist and the skip-worktree bit is on and the updated 
index entry either will be missing (covers the added scenario) or was not 
missing
before (covers the modified scenario).  If there is a better way to get the 
previous
index entry to disk than what I am doing, I am happy to implement it correctly.

Thanks,
Kevin


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-12 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford  wrote:
> The loss of the skip-worktree bits is part of the problem if you are talking
> about modified files.  The other issue that I was having is when running a 
> reset
> and there were files added in the commit that is being reset, there will not
> be an entry in the index and not a file on disk so the data for that file is
> completely lost at that point.  "status" also doesn't include anything about
> this loss of data.  On modified files status will at least have the file as 
> deleted
> since there is still an index entry but again the previous version of the file
> and it's data is lost.

Well, we could have "deleted" index entries, those marked with
CE_REMOVE. They are never written down to file though, so 'status'
won't benefit from that. Hopefully we can restore the file before the
index file is written down and we really lose skip-worktree bits?

> To me this is totally unexpected behavior, for example if I am on a commit
> where there are only files that where added and run a reset HEAD~1 and
> then a status, it will show a clean working directory. Regardless of
> skip-worktree bits the user needs to have the data in the working directory
> after the reset or data is lost which is always bad.

I agree we no longer have a place to save the skip-worktree bit, we
have to restore the state back as if skip-worktree bit does not exist.
It would be good if we could keep the logic in unpack_trees() though.
For example, if the file is present on disk even if skip-worktree bit
is on, unpack_trees() would abort instead of silently overwriting it.
This is a difference between skip-worktree and assume-unchanged bits.
If you do explicit checkout_entry() you might have to add more checks
to keep behavior consistent.
-- 
Duy


RE: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-11 Thread Kevin Willford
> -Original Message-
> From: Duy Nguyen [mailto:pclo...@gmail.com]
> Sent: Monday, April 10, 2017 4:24 AM
> To: Kevin Willford <kcwillf...@gmail.com>
> Cc: git@vger.kernel.org; gits...@pobox.com; p...@peff.net; Kevin Willford
> <kewi...@microsoft.com>
> Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid
> data loss.
> 
> On Fri, Apr 07, 2017 at 12:23:57PM -0700, Kevin Willford wrote:
> > When using the sparse checkout feature the git reset command will add
> 
> "git reset" has three different modes. It would be good if you mention what
> mode is affected here. The tests are for --mixed only. I wonder if we need to
> do anything for --hard and --soft?
> 
> --soft touches branch SHA-1 index only, not worktree, so probably not.
> 
> --hard should be handled by unpack_trees(), I think.
> 
> But it would be good to cover these in the commit message as well to stop
> readers from wondering.

Sounds good.

> 
> > entries to the index that will have the skip-worktree bit off but will
> > leave the working directory empty.  File data is lost because the
> > index version of the files has been changed but there is nothing that
> > is in the working directory.  This will cause the next status call to
> > show either deleted for files modified or deleting or nothing for files
> added.
> > The added files should be shown as untracked and modified files should
> > be shown as modified.
> 
> Hmm.. reading --mixed documentation again ("Resets the index but not
> working tree"), I think the current behavior is expected regardless of skip-
> worktree bits.
> 
> Perhaps the problem is the loss of skip-worktree bits on entries added by
> update_index_from_diff()? If the bits are at the right place, then it should
> not matter if the same version exists on worktree or not and "status" or
> "commit" should work as expected, I think.
> 
> --
> Duy

The loss of the skip-worktree bits is part of the problem if you are talking
about modified files.  The other issue that I was having is when running a reset
and there were files added in the commit that is being reset, there will not
be an entry in the index and not a file on disk so the data for that file is
completely lost at that point.  "status" also doesn't include anything about
this loss of data.  On modified files status will at least have the file as 
deleted
since there is still an index entry but again the previous version of the file
and it's data is lost.

To me this is totally unexpected behavior, for example if I am on a commit
where there are only files that where added and run a reset HEAD~1 and
then a status, it will show a clean working directory. Regardless of 
skip-worktree bits the user needs to have the data in the working directory
after the reset or data is lost which is always bad.

Kevin



Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-10 Thread Duy Nguyen
On Fri, Apr 07, 2017 at 12:23:57PM -0700, Kevin Willford wrote:
> When using the sparse checkout feature the git reset command will add

"git reset" has three different modes. It would be good if you mention
what mode is affected here. The tests are for --mixed only. I wonder
if we need to do anything for --hard and --soft?

--soft touches branch SHA-1 index only, not worktree, so probably not.

--hard should be handled by unpack_trees(), I think.

But it would be good to cover these in the commit message as well to
stop readers from wondering.

> entries to the index that will have the skip-worktree bit off but will
> leave the working directory empty.  File data is lost because the index
> version of the files has been changed but there is nothing that is in the
> working directory.  This will cause the next status call to show either
> deleted for files modified or deleting or nothing for files added.
> The added files should be shown as untracked and modified files should
> be shown as modified.

Hmm.. reading --mixed documentation again ("Resets the index but not
working tree"), I think the current behavior is expected regardless of
skip-worktree bits.

Perhaps the problem is the loss of skip-worktree bits on entries added
by update_index_from_diff()? If the bits are at the right place, then
it should not matter if the same version exists on worktree or not and
"status" or "commit" should work as expected, I think.

--
Duy


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-07 Thread Stefan Beller
On Fri, Apr 7, 2017 at 12:23 PM, Kevin Willford  wrote:
> When using the sparse checkout feature the git reset command will add
> entries to the index that will have the skip-worktree bit off but will
> leave the working directory empty.  File data is lost because the index
> version of the files has been changed but there is nothing that is in the
> working directory.  This will cause the next status call to show either
> deleted for files modified or deleting or nothing for files added.
> The added files should be shown as untracked and modified files should
> be shown as modified.
>
> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry
> or was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.
>
> Signed-off-by: Kevin Willford 
> ---
>  builtin/reset.c  | 34 +++
>  t/t7114-reset-sparse-checkout.sh | 58 
> 
>  2 files changed, 92 insertions(+)
>  create mode 100755 t/t7114-reset-sparse-checkout.sh
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 8ab915bfcb..8ba97999f4 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -21,6 +21,7 @@
>  #include "parse-options.h"
>  #include "unpack-trees.h"
>  #include "cache-tree.h"
> +#include "dir.h"
>
>  static const char * const git_reset_usage[] = {
> N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] 
> []"),
> @@ -117,12 +118,45 @@ static void update_index_from_diff(struct 
> diff_queue_struct *q,
> struct diff_options *opt, void *data)
>  {
> int i;
> +   int pos;

You could declare this inside the for loop.

> int intent_to_add = *(int *)data;
>
> for (i = 0; i < q->nr; i++) {
> struct diff_filespec *one = q->queue[i]->one;
> +   struct diff_filespec *two = q->queue[i]->two;
> int is_missing = !(one->mode && !is_null_oid(>oid));
> +   int was_missing = !two->mode && is_null_oid(>oid);
> struct cache_entry *ce;
> +   struct cache_entry *ceBefore;
> +   struct checkout state = CHECKOUT_INIT;
> +
> +   /*
> +* When using the sparse-checkout feature the cache entries 
> that are
> +* added here will not have the skip-worktree bit set.
> +* Without this code there is data that is lost because the 
> files that
> +* would normally be in the working directory are not there 
> and show as
> +* deleted for the next status or in the case of added files 
> just disappear.
> +* We need to create the previous version of the files in the 
> working
> +* directory so that they will have the right content and the 
> next
> +* status call will show modified or untracked files 
> correctly.
> +*/

This comment also belongs to the commit message IMHO, as
it describes the bug on a high level, and when it is in the code it wastes
screen real estate; commit messages however have low prices for
screen real estate. ;-)

> +   if (core_apply_sparse_checkout && !file_exists(two->path))
> +   {

Coding style: Git prefers { at the end of the line:

  if (..) {
..

> +test_expect_success 'reset after deleting file without skip-worktree bit' '
> +   git checkout -f endCommit &&
> +   git clean -xdf &&
> +   echo "/c
> +/m" >.git/info/sparse-checkout &&

huh? Did your mailer wrap lines here or do you intend to have
a LF in the output?

Assuming the latter, I think we prefer cat with here-doc for
multi line content, i.e.

cat >.git/info/sparse-checkout <<-\EOF
/c
/m
EOF
test_config ...
...


Thanks,
Stefan


[PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-07 Thread Kevin Willford
When using the sparse checkout feature the git reset command will add
entries to the index that will have the skip-worktree bit off but will
leave the working directory empty.  File data is lost because the index
version of the files has been changed but there is nothing that is in the
working directory.  This will cause the next status call to show either
deleted for files modified or deleting or nothing for files added.
The added files should be shown as untracked and modified files should
be shown as modified.

To fix this when the reset is running if there is not a file in the
working directory and if it will be missing with the new index entry
or was not missing in the previous version, we create the previous index
version of the file in the working directory so that status will report
correctly and the files will be availble for the user to deal with.

Signed-off-by: Kevin Willford 
---
 builtin/reset.c  | 34 +++
 t/t7114-reset-sparse-checkout.sh | 58 
 2 files changed, 92 insertions(+)
 create mode 100755 t/t7114-reset-sparse-checkout.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index 8ab915bfcb..8ba97999f4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -21,6 +21,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "cache-tree.h"
+#include "dir.h"
 
 static const char * const git_reset_usage[] = {
N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] 
[]"),
@@ -117,12 +118,45 @@ static void update_index_from_diff(struct 
diff_queue_struct *q,
struct diff_options *opt, void *data)
 {
int i;
+   int pos;
int intent_to_add = *(int *)data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filespec *one = q->queue[i]->one;
+   struct diff_filespec *two = q->queue[i]->two;
int is_missing = !(one->mode && !is_null_oid(>oid));
+   int was_missing = !two->mode && is_null_oid(>oid);
struct cache_entry *ce;
+   struct cache_entry *ceBefore;
+   struct checkout state = CHECKOUT_INIT;
+
+   /*
+* When using the sparse-checkout feature the cache entries 
that are
+* added here will not have the skip-worktree bit set.
+* Without this code there is data that is lost because the 
files that
+* would normally be in the working directory are not there and 
show as
+* deleted for the next status or in the case of added files 
just disappear.
+* We need to create the previous version of the files in the 
working
+* directory so that they will have the right content and the 
next
+* status call will show modified or untracked files correctly.
+*/
+   if (core_apply_sparse_checkout && !file_exists(two->path))
+   {
+   pos = cache_name_pos(two->path, strlen(two->path));
+   if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) 
&& (is_missing || !was_missing))
+   {
+   state.force = 1;
+   state.refresh_cache = 1;
+   state.istate = _index;
+   ceBefore = make_cache_entry(two->mode, 
two->oid.hash, two->path,
+   0, 0);
+   if (!ceBefore)
+   die(_("make_cache_entry failed for path 
'%s'"),
+   two->path);
+
+   checkout_entry(ceBefore, , NULL);
+   }
+   }
 
if (is_missing && !intent_to_add) {
remove_file_from_cache(one->path);
diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh
new file mode 100755
index 00..8dd88fd46d
--- /dev/null
+++ b/t/t7114-reset-sparse-checkout.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='reset when using a sparse-checkout'
+
+. ./test-lib.sh
+
+# reset using a sparse-checkout file
+
+test_expect_success 'setup' '
+   test_tick &&
+   echo "checkout file" >c &&
+   echo "modify file" >m &&
+   echo "delete file" >d &&
+   git add . &&
+   git commit -m "initial commit" &&
+   echo "added file" >a &&
+   echo "modification of a file" >m &&
+   git rm d &&
+   git add . &&
+   git commit -m "second commit" &&
+   git checkout -b endCommit
+'
+
+test_expect_success 'reset when there is a sparse-checkout' '
+   echo "/c" >.git/info/sparse-checkout &&
+   test_config core.sparsecheckout true &&
+   git checkout -b resetBranch &&
+   test_path_is_missing m &&
+   test_path_is_missing a &&
+   test_path_is_missing d &&
+