Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-14 Thread Junio C Hamano
David Aguilar dav...@gmail.com writes:

 On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:
 
 It might be less risky if the updated semantics were to make the
 paths that are originally in the index but not in $tree untracked
 (as opposed to reset --hard emulation where they will be lost)
 unless they need to be removed to make room for D/F conflict issues,
 but I haven't thought it through.

 Git has always been really careful to not lose data.

 One way to avoid the problem of changing existing semantics is
 to make the new semantics accessible behind a flag, e.g.
 git checkout --hard HEAD -- some-new-path.

Yup, but you seem to be behind by a few exchanges, as we tentatively
decided that we won't talk about changing the semantics and concentrate
on fixing the implementation glitches only at least for now ;-)

I find that --hard is not a very good name for the new mode.
There will be different kinds of more than what we usually do
modes of operations discovered over time in the coming years, and it
is better to be more specific to denote in what way we are doing it
harder (I think the difference the proposed new mode has is to also
checkout absense of the paths).

But in this particular case, making the paths that are absent in $tree
we are checking out of into untracked paths (instead of removing) is
a right balance of safety---it is similar to git reset HEAD (no
--hard) after adding a new path which leaves the file in the
working tree.



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Jeff King
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
 
  I think that has direct linkage; what you have in mind I think is
  http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
 
  Thanks for that link.
 
 It was one of the items in the git blame leftover bits list
 (websearch for that exact phrase), so I didn't have to do any
 digging just for this thread ;-)
 
 But I made a huge typo above.  s/I think/I do not think/;

Oh. That might explain some of my confusion. :)

 The original observation you made in this thread is that when git
 checkout $tree - $pathspec, whose defintion is to grab paths in
 the $tree that match $pathspec and put them in the index, and then
 overwrite the working tree paths with the contents of these paths
 that are updated in the index with the contents taken from the
 $tree, unnecessarily rewrites the working tree paths even when they
 already had contents that were up to date.  That is what I called an
 implementation glitch.
 
 The old thread is a different topic.
 [...]

Right, I do agree that these things don't need to be linked. The reason
I ended up dealing with the deletion thing is that one obvious way to
implement do not touch entries which have not changed is by running a
diff between $tree and the index. But doing a diff means we have to
reconsider all of the code surrounding deletions (and handling of
unmerged entries in the pathspec but not in $tree). I tried a few
variants and had trouble making it work (getting caught up either in the
make sure each pathspec matched code, or the treat unmerged entries
specially behavior).

In the patch below, I ended up retaining the existing
read_tree_recursive code, and just specially handling the replacement of
index entries (which is itself sort of a diff, but it fits nicely into
the existing scheme).

 I'd prefer that these two to be treated separately.

Yeah, that makes sense after reading your emails. What I was really
unclear on was whether the handling of deletion was a bug or a design
choice, and it is the latter (if it were the former, we would not need a
transition plan :) ).

Anyway, here is the patch.

-- 8 --
Subject: checkout $tree: do not throw away unchanged index entries

When we git checkout $tree, we pull paths from $tree into
the index, and then check the resulting entries out to the
worktree. Our method for the first step is rather
heavy-handed, though; it clobbers the entire existing index
entry, even if the content is the same. This means we lose
our stat information, leading checkout_entry to later
rewrite the entire file with identical content.

Instead, let's see if we have the identical entry already in
the index, in which case we leave it in place. That lets
checkout_entry do the right thing. Our tests cover two
interesting cases:

  1. We make sure that a file which has no changes is not
 rewritten.

  2. We make sure that we do update a file that is unchanged
 in the index (versus $tree), but has working tree
 changes. We keep the old index entry, and
 checkout_entry is able to realize that our stat
 information is out of date.

Signed-off-by: Jeff King p...@peff.net
---
Note that the test refreshes the index manually (because we are tweaking
the timestamp of file2). In normal use this should not be necessary
(i.e., your entries should generally be uptodate). I did wonder if
checkout should be refreshing the index itself, but it would a bunch of
extra lstats in the common case.

 builtin/checkout.c| 31 +--
 t/t2022-checkout-paths.sh | 17 +
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..67cab4e 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct 
commit *new,
 static int update_some(const unsigned char *sha1, const char *base, int 
baselen,
const char *pathname, unsigned mode, int stage, void *context)
 {
-   int len;
+   struct strbuf buf = STRBUF_INIT;
struct cache_entry *ce;
+   int pos;
 
if (S_ISDIR(mode))
return READ_TREE_RECURSIVE;
 
-   len = baselen + strlen(pathname);
-   ce = xcalloc(1, cache_entry_size(len));
+   strbuf_add(buf, base, baselen);
+   strbuf_addstr(buf, pathname);
+
+   /*
+* If the entry is the same as the current index, we can leave the old
+* entry in place. Whether it is UPTODATE or not, checkout_entry will
+* do the right thing.
+*/
+   pos = cache_name_pos(buf.buf, buf.len);
+   if (pos = 0) {
+   struct cache_entry *old = active_cache[pos];
+   if (create_ce_mode(mode) == old-ce_mode 
+   !hashcmp(old-sha1, sha1)) {
+   old-ce_flags |= 

Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
 
  I think that has direct linkage; what you have in mind I think is
  http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
 
  Thanks for that link.
 
 It was one of the items in the git blame leftover bits list
 (websearch for that exact phrase), so I didn't have to do any
 digging just for this thread ;-)
 
 But I made a huge typo above.  s/I think/I do not think/;

 Oh. That might explain some of my confusion. :)

Yeah, tells me to never type on a tablet X-.

 I'd prefer that these two to be treated separately.

 Yeah, that makes sense after reading your emails. What I was really
 unclear on was whether the handling of deletion was a bug or a design
 choice, and it is the latter (if it were the former, we would not need a
 transition plan :) ).

Yeah, I think we agree to refrain from saying if that design choice
was a good one or bad one at least for now.

 Subject: checkout $tree: do not throw away unchanged index entries

 When we git checkout $tree, we pull paths from $tree into
 the index, and then check the resulting entries out to the
 worktree. Our method for the first step is rather
 heavy-handed, though; it clobbers the entire existing index
 entry, even if the content is the same. This means we lose
 our stat information, leading checkout_entry to later
 rewrite the entire file with identical content.

 Instead, let's see if we have the identical entry already in
 the index, in which case we leave it in place. That lets
 checkout_entry do the right thing. Our tests cover two
 interesting cases:

   1. We make sure that a file which has no changes is not
  rewritten.

   2. We make sure that we do update a file that is unchanged
  in the index (versus $tree), but has working tree
  changes. We keep the old index entry, and
  checkout_entry is able to realize that our stat
  information is out of date.

 Signed-off-by: Jeff King p...@peff.net
 ---
 Note that the test refreshes the index manually (because we are tweaking
 the timestamp of file2). In normal use this should not be necessary
 (i.e., your entries should generally be uptodate). I did wonder if
 checkout should be refreshing the index itself, but it would a bunch of
 extra lstats in the common case.

  builtin/checkout.c| 31 +--
  t/t2022-checkout-paths.sh | 17 +
  2 files changed, 42 insertions(+), 6 deletions(-)

 diff --git a/builtin/checkout.c b/builtin/checkout.c
 index 5410dac..67cab4e 100644
 --- a/builtin/checkout.c
 +++ b/builtin/checkout.c
 @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct 
 commit *new,
  static int update_some(const unsigned char *sha1, const char *base, int 
 baselen,
   const char *pathname, unsigned mode, int stage, void *context)
  {
 ...
  }

Makes sense, including the use of strbuf (otherwise you would
allocate ce and then discard when it turns out that it is not
needed, which is probably with the same allocation pressure, but
looks uglier).

 diff --git a/t/t2022-checkout-paths.sh b/t/t2022-checkout-paths.sh
 index 8e3545d..f46d049 100755
 --- a/t/t2022-checkout-paths.sh
 +++ b/t/t2022-checkout-paths.sh
 @@ -61,4 +61,21 @@ test_expect_success 'do not touch unmerged entries 
 matching $path but not in $tr
   test_cmp expect.next0 actual.next0
  '
  
 +test_expect_success 'do not touch files that are already up-to-date' '
 + git reset --hard 
 + echo one file1 
 + echo two file2 
 + git add file1 file2 
 + git commit -m base 
 + echo modified file1 
 + test-chmtime =10 file2 

Is the idea behind the hardcoded timestamp that this is sufficiently
old (Sep 2001) that we will not get in trouble comparing with the
real timestamp we get from the filesystem (which will definitely newer
than that anyway) no matter when we run this test (unless you have a
time-machine, that is)?

 + git update-index -q --refresh 
 + git checkout HEAD -- file1 file2 
 + echo one expect 
 + test_cmp expect file1 
 + echo 10file2 expect 
 + test-chmtime -v +0 file2 actual 
 + test_cmp expect actual
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 11:15:27AM -0800, Junio C Hamano wrote:

  diff --git a/builtin/checkout.c b/builtin/checkout.c
  index 5410dac..67cab4e 100644
  --- a/builtin/checkout.c
  +++ b/builtin/checkout.c
  @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, 
  struct commit *new,
   static int update_some(const unsigned char *sha1, const char *base, int 
  baselen,
  const char *pathname, unsigned mode, int stage, void *context)
   {
  ...
   }
 
 Makes sense, including the use of strbuf (otherwise you would
 allocate ce and then discard when it turns out that it is not
 needed, which is probably with the same allocation pressure, but
 looks uglier).

Exactly. Constructing it in ce-name does save you an allocation/memcpy
in the case that we actually use the new entry, but I thought it would
look weirder. It probably doesn't matter much either way, so I tried to
write the most obvious thing.

(I also experimented with using make_cache_entry at one point, which
requires the strbuf; some of my thinking on what looks reasonable may be
left over from that approach).

  +test_expect_success 'do not touch files that are already up-to-date' '
  +   git reset --hard 
  +   echo one file1 
  +   echo two file2 
  +   git add file1 file2 
  +   git commit -m base 
  +   echo modified file1 
  +   test-chmtime =10 file2 
 
 Is the idea behind the hardcoded timestamp that this is sufficiently
 old (Sep 2001) that we will not get in trouble comparing with the
 real timestamp we get from the filesystem (which will definitely newer
 than that anyway) no matter when we run this test (unless you have a
 time-machine, that is)?

I didn't actually calculate what the timestamp was. The important thing
is that it is not the timestamp that your system would give to the file
if git-checkout opened and rewrote it. :)

I initially used 123, but was worried that would cause weird
portability problems on systems. So I opted for something closer to
normal, but in the past. I think it is fine (modulo time machines),
but I'd be happy to put in some more obvious sentinel, too.

And the worst case if you did have a time machine is that we might
accidentally declare a buggy git to be correct (racily!). I can live
with that, but I guess you could use a relative value (like -1)
instead of a fixed sentinel (but then you'd have to record it for the
expect check).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:

  Makes sense, including the use of strbuf (otherwise you would
  allocate ce and then discard when it turns out that it is not
  needed, which is probably with the same allocation pressure, but
  looks uglier).
 
 Exactly. Constructing it in ce-name does save you an allocation/memcpy
 in the case that we actually use the new entry, but I thought it would
 look weirder. It probably doesn't matter much either way, so I tried to
 write the most obvious thing.

Actually, it is not that bad:

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..5a78758 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const char 
*base, int baselen,
 {
int len;
struct cache_entry *ce;
+   int pos;
 
if (S_ISDIR(mode))
return READ_TREE_RECURSIVE;
@@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const char 
*base, int baselen,
ce-ce_flags = create_ce_flags(0) | CE_UPDATE;
ce-ce_namelen = len;
ce-ce_mode = create_ce_mode(mode);
+
+   /*
+* If the entry is the same as the current index, we can leave the old
+* entry in place. Whether it is UPTODATE or not, checkout_entry will
+* do the right thing.
+*/
+   pos = cache_name_pos(ce-name, ce-ce_namelen);
+   if (pos = 0) {
+   struct cache_entry *old = active_cache[pos];
+   if (ce-ce_mode == old-ce_mode 
+   !hashcmp(ce-sha1, old-sha1)) {
+   old-ce_flags |= CE_UPDATE;
+   free(ce);
+   return 0;
+   }
+   }
+
add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
return 0;
 }

and in some ways more readable, as you form the whole thing, and then as
the final step either add it, or realize that what is there is fine (I'd
almost wonder if it could be a flag to add_cache_entry).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:

  Makes sense, including the use of strbuf (otherwise you would
  allocate ce and then discard when it turns out that it is not
  needed, which is probably with the same allocation pressure, but
  looks uglier).
 
 Exactly. Constructing it in ce-name does save you an allocation/memcpy
 in the case that we actually use the new entry, but I thought it would
 look weirder. It probably doesn't matter much either way, so I tried to
 write the most obvious thing.

 Actually, it is not that bad:

Yeah, actually it does look better; want me to squash it into the
patch before queuing?


 diff --git a/builtin/checkout.c b/builtin/checkout.c
 index 5410dac..5a78758 100644
 --- a/builtin/checkout.c
 +++ b/builtin/checkout.c
 @@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const 
 char *base, int baselen,
  {
   int len;
   struct cache_entry *ce;
 + int pos;
  
   if (S_ISDIR(mode))
   return READ_TREE_RECURSIVE;
 @@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const 
 char *base, int baselen,
   ce-ce_flags = create_ce_flags(0) | CE_UPDATE;
   ce-ce_namelen = len;
   ce-ce_mode = create_ce_mode(mode);
 +
 + /*
 +  * If the entry is the same as the current index, we can leave the old
 +  * entry in place. Whether it is UPTODATE or not, checkout_entry will
 +  * do the right thing.
 +  */
 + pos = cache_name_pos(ce-name, ce-ce_namelen);
 + if (pos = 0) {
 + struct cache_entry *old = active_cache[pos];
 + if (ce-ce_mode == old-ce_mode 
 + !hashcmp(ce-sha1, old-sha1)) {
 + old-ce_flags |= CE_UPDATE;
 + free(ce);
 + return 0;
 + }
 + }
 +
   add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
   return 0;
  }

 and in some ways more readable, as you form the whole thing, and then as
 the final step either add it, or realize that what is there is fine (I'd
 almost wonder if it could be a flag to add_cache_entry).

 -Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 01:18:43PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:
 
   Makes sense, including the use of strbuf (otherwise you would
   allocate ce and then discard when it turns out that it is not
   needed, which is probably with the same allocation pressure, but
   looks uglier).
  
  Exactly. Constructing it in ce-name does save you an allocation/memcpy
  in the case that we actually use the new entry, but I thought it would
  look weirder. It probably doesn't matter much either way, so I tried to
  write the most obvious thing.
 
  Actually, it is not that bad:
 
 Yeah, actually it does look better; want me to squash it into the
 patch before queuing?

Yeah, if you like it, too, then let's go with it. Thanks.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread David Aguilar
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:
 Jeff King p...@peff.net writes:
 
  So just to be clear, the behavior we want is that:
 
echo foo some-new-path
git add some-new-path
git checkout HEAD -- .
 
  will delete some-new-path (whereas the current code turns it into an
  untracked file).
 
 With the updated semantics proposed in the old thread, yes, that is
 what should happen.
 
git checkout HEAD -- some-new-path
 
  do in that case?
 
 Likewise.  And if some-new-path were a directory, with existing path
 O and new path N both in the index but only the former in HEAD, the
 operation would revert some-new-path/O to that of HEAD and remove
 some-new-path/N.  That is the only logical thing we could do if we
 were to take the updated sematics.
 
 That is one of the reasons why I am not 100% convinced that the
 proposed updated semantics is better, even though I was fairly
 positive in the old discussion and also I kept the topic in the
 leftover bits list.  The above command is a fairly common way to
 say I started refactoring the existing path some-path/O and
 sprinkled its original contents spread into new files A, B and C in
 the same directory.  Now I no longer have O in the working tree, but
 let me double check by grabbing it out of the state recoded in the
 commit.  You expect that git checkout HEAD -- some-path would not
 lose A, B or C, knowing some-path only had O.  That expectation
 would even be stronger if you are used to the current semantics, but
 that is something we could fix, if we decide that the proposed
 updated semantics is better, with a careful transition plan.
 
 It might be less risky if the updated semantics were to make the
 paths that are originally in the index but not in $tree untracked
 (as opposed to reset --hard emulation where they will be lost)
 unless they need to be removed to make room for D/F conflict issues,
 but I haven't thought it through.


Git has always been really careful to not lose data.

One way to avoid the problem of changing existing semantics is
to make the new semantics accessible behind a flag, e.g.
git checkout --hard HEAD -- some-new-path.
-- 
David
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-09 Thread Jeff King
On Sat, Nov 08, 2014 at 08:19:21AM -0800, Martin von Zweigbergk wrote:

  What should:
 
git checkout HEAD -- some-new-path
 
  do in that case? With the current code, it actually barfs, complaining
  that nothing matched some-new-path (because it is not part of HEAD, and
  therefore we don't consider it at all), and aborts the whole operation.
  I think we would want to delete some-new-path in that case, too.
 
 I don't think we'd want it to be deleted. I would view 'git reset
 --hard' as the role model here, and that command (without paths) would
 not remove the file. And applying it to a path should not change the
 behavior, just restrict it to the paths, right?

Are you sure about git reset here? If I do:

  git init
  echo content file  git add file  git commit -m base
  echo modified file
  echo new some-new-path
  git add file some-new-path
  git reset --hard

then we delete some-new-path (it is not untracked, because the index
knows about it). That makes sense to me. I.e., we treat it with the same
preciousness whether it is named explicitly or not.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-09 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

 I think that has direct linkage; what you have in mind I think is
 http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

 Thanks for that link.

It was one of the items in the git blame leftover bits list
(websearch for that exact phrase), so I didn't have to do any
digging just for this thread ;-)

But I made a huge typo above.  s/I think/I do not think/;

The original observation you made in this thread is that when git
checkout $tree - $pathspec, whose defintion is to grab paths in
the $tree that match $pathspec and put them in the index, and then
overwrite the working tree paths with the contents of these paths
that are updated in the index with the contents taken from the
$tree, unnecessarily rewrites the working tree paths even when they
already had contents that were up to date.  That is what I called an
implementation glitch.

The old thread is a different topic.  It is about changing the
semantics of the operation to make paths in the index and in the
$tree identical, and then update the working tree paths to also
match, all with respect to the $pathspec.  This, as Martin noted,
needs careful debate on the merit and transition plan if we decide
that it is worth doing.  The do ignore paths that are not in $tree
is a deliberate design choice.

I'd prefer that these two to be treated separately.  That is, even
if our plan did not involve changing the semantics of the operation,
we would want to fix the implementation glitch.  Compare the $tree
with the index using $pathspec, adjust the index by adding paths
that are missing from the index that are in $tree, but not removing
the entries in the index only because they are not in $tree (note:
when the index has a path A, the $tree has a path A/B, and the
$pathspec says A, we would end up removing A to make room for A/B,
and that should be allowed---it does not fall into the only because
the path is not in $tree.  In such a scenario, we remove A not
because $tree does not have A but because A/B that the $tree has is
what we were asked to materialize).  And after updating the index
that way, do an equivalent of git checkout -- $pathspec.  The
entries that were the same between the $tree and the index will have
the up-to-dateness kept and will not unnecessarily rewrite an
unmodified path that way, while things that are modified with the
operation will be overwritten, I would think.

And with that machinery in place, we could start thinking about
updating the semantics.  It will be a small change to the loop that
goes over the result from diff_index() and modifying the code that
used to do a not remove only because not in $tree to do a remove
if not in $tree.

 So just to be clear, the behavior we want is that:

   echo foo some-new-path
   git add some-new-path
   git checkout HEAD -- .

 will delete some-new-path (whereas the current code turns it into an
 untracked file).

With the updated semantics proposed in the old thread, yes, that is
what should happen.

   git checkout HEAD -- some-new-path

 do in that case?

Likewise.  And if some-new-path were a directory, with existing path
O and new path N both in the index but only the former in HEAD, the
operation would revert some-new-path/O to that of HEAD and remove
some-new-path/N.  That is the only logical thing we could do if we
were to take the updated sematics.

That is one of the reasons why I am not 100% convinced that the
proposed updated semantics is better, even though I was fairly
positive in the old discussion and also I kept the topic in the
leftover bits list.  The above command is a fairly common way to
say I started refactoring the existing path some-path/O and
sprinkled its original contents spread into new files A, B and C in
the same directory.  Now I no longer have O in the working tree, but
let me double check by grabbing it out of the state recoded in the
commit.  You expect that git checkout HEAD -- some-path would not
lose A, B or C, knowing some-path only had O.  That expectation
would even be stronger if you are used to the current semantics, but
that is something we could fix, if we decide that the proposed
updated semantics is better, with a careful transition plan.

It might be less risky if the updated semantics were to make the
paths that are originally in the index but not in $tree untracked
(as opposed to reset --hard emulation where they will be lost)
unless they need to be removed to make room for D/F conflict issues,
but I haven't thought it through.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-09 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:

 So just to be clear, the behavior we want is that:
 
   echo foo some-new-path
   git add some-new-path
   git checkout HEAD -- .
 
 will delete some-new-path (whereas the current code turns it into an
 untracked file). What should:
 
   git checkout HEAD -- some-new-path
 
 do in that case? With the current code, it actually barfs, complaining
 that nothing matched some-new-path (because it is not part of HEAD, and
 therefore we don't consider it at all), and aborts the whole operation.
 I think we would want to delete some-new-path in that case, too.

 Also, t2022.3 has me very confused.

 It is explicitly checking that if we have subdir/foo unmerged in the
 index, and we git checkout $tree -- subdir, and $tree does not mention
 foo, that we _leave_ foo in place.

The definition of how checkout $tree -- $pathspec should behave
logically leads to that, I think.  Grabbing everything that matches
the subdir pathspec from $tree, adding them to the index and
checking them out will not touch subdir/foo that does not appear in
that $tree.

With the proposed updated semantics, it would behave differently, so
it is natural that we have tests that protect the traditional
definition of how it should behave and we will have to visit them
and update their expectation if we decide that the proposed updated
semantics is what we want.

 That seems very counter-intuitive to me. If you asked to make subdir
 look like $tree, then we should clobber it.

Yes.  But the existing expectation is different ;-)

If the $tree has 'subdir/foo', we would want checkout $tree --
subdir/foo to keep working as a way for the user to declare the
resolution of the ongoing merge.  With the proposed change in
sematics, git checkout $tree -- subdir will become a way to say
Everything inside subdir/ I'd resolve to the state recorded in
$tree during such a conflicted merge, and it will lose paths not in
the $tree.

Which may be a good thing, but existing users who are used to the
traditional behaviour will find it confusing and even risky (i.e. I
am checking OUT; never expected it to lose any path).  A counter
argument that I find sensible, of course, is You told us to check
out subdir/, and the state recorded for subdir/ in $tree does not
have 'foo' in it.  That is the state you asked us to check out.

The argument for checkout $tree -- subdir/foo is essentially the
same.  The state recorded in $tree for subdir/foo is non-existence,
and that is checked out with the proposed new semantics.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Martin von Zweigbergk
I'm not sure I had seen that particular thread, but it seems like
we're all in agreement on that topic. I'm keeping my fingers crossed
that Jeff will have time to tackle that this time :-)

On Fri, Nov 7, 2014 at 11:35 PM, Junio C Hamano gits...@pobox.com wrote:
 I think that has direct linkage; what you have in mind I think is
 http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

 What is on thread here is an implementation glitch, not semantic one.
 Checking out a directory as opposed to set of paths that match the
 pathspec was a deliberate design choice, not an implementation glitch.

 Pardon HTML, misspellings and grammos, typed on a tablet.

 On Nov 7, 2014 11:10 PM, Martin von Zweigbergk martinv...@gmail.com
 wrote:

 Trying again from plain old gmail which I think does not send a
 multipart content.

 On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk
 martinv...@gmail.com wrote:
  Is this also related to git checkout $rev . not removing removed
  files?
  What you say about the difference in implementation between checkout and
  reset sounds vaguely familiar from when I looked at it some years ago.
  Curiously, I just talked to Jonathan about this over lunch yesterday. I
  think we would both be happy to get that oddity of checkout fixed. If
  what
  you mention here is indeed related to fixing that, it does complicate
  things
  with regards to backwards compatibility.
 
 
  On Fri Nov 07 2014 at 11:24:09 AM Jeff King p...@peff.net wrote:
 
  On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:
 
   Jeff King p...@peff.net writes:
  
Is there a reason that we don't use this diff technique for
checkout?
  
   I suspect that the reasons are probably mixture of these:
  
(1) the code path may descend from checkout-index and predates the
in-core diff machinery;
  
(2) in the context of checkout-index, it was more desirable to be
able to say I want the contents on the filesystem, even if you
think I already have it there, as you as a new software are
likely to be wrong and I know better; or
  
(3) it was easier to code that way ;-)
  
   I do not see there is any reason not to do what you suggest.
 
  OK. It's not very much code (and can mostly be lifted from git-reset),
  so I may take a stab at it.
 
  -Peff
  --
  To unsubscribe from this list: send the line unsubscribe git in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Jeff King
On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

 I think that has direct linkage; what you have in mind I think is
 http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

Thanks for that link.

I did spend a few hours on this topic earlier today, and got very
confused trying to figure out what the deletion behavior _should_ be,
and whether I was breaking it.  For some reason I had zero recollection
of a conversation from last year that I was obviously a major part of. I
think I am getting old. :)

The end of that thread concludes that a diff-based approach is not going
to work, because we need to update the working tree even for files not
mentioned by the diff. I do not think that is a show-stopper, though.
It just means that we need to load the new index as one step (done now
with read_tree_recursive, but ideally using diff), and then walk over
the whole resulting index applying our pathspec again (instead of
relying on CE_UPDATE flags).

This turns out not to be a big deal, because the existing code is
already doing most of that second pathspec application anyway. It does
it because read_tree_recursive is not smart enough to update the seen
bits for the pathspec. But now we would have another reason to do it
this way. :)

So just to be clear, the behavior we want is that:

  echo foo some-new-path
  git add some-new-path
  git checkout HEAD -- .

will delete some-new-path (whereas the current code turns it into an
untracked file). What should:

  git checkout HEAD -- some-new-path

do in that case? With the current code, it actually barfs, complaining
that nothing matched some-new-path (because it is not part of HEAD, and
therefore we don't consider it at all), and aborts the whole operation.
I think we would want to delete some-new-path in that case, too.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Jeff King
On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:

 So just to be clear, the behavior we want is that:
 
   echo foo some-new-path
   git add some-new-path
   git checkout HEAD -- .
 
 will delete some-new-path (whereas the current code turns it into an
 untracked file). What should:
 
   git checkout HEAD -- some-new-path
 
 do in that case? With the current code, it actually barfs, complaining
 that nothing matched some-new-path (because it is not part of HEAD, and
 therefore we don't consider it at all), and aborts the whole operation.
 I think we would want to delete some-new-path in that case, too.

Also, t2022.3 has me very confused.

It is explicitly checking that if we have subdir/foo unmerged in the
index, and we git checkout $tree -- subdir, and $tree does not mention
foo, that we _leave_ foo in place.

That seems very counter-intuitive to me. If you asked to make subdir
look like $tree, then we should clobber it. That change comes from
e721c15 (checkout: avoid unnecessary match_pathspec calls, 2013-03-27),
where it is mentioned as a _bugfix_. That in turn references 0a1283b
(checkout $tree $path: do not clobber local changes in $path not in
$tree, 2011-09-30), which explicitly goes against the goal we are
talking about here. It is not make my index and working tree look like
$tree at all.

So now I'm doubly confused about what we want to do.

If we want to retain that behavior, I think we can still cover these
cases by marking items missing from $tree as to remove during the
diff/update the index phase, and then being more gentle with to
remove files (e.g., not clobbering changed worktree files unless -f
is given).

I am not sure that provides a sane user experience, though. Why is it OK
to clobber local changes to a file if we are replacing it with other
content, but _not_ if we are replacing it with nothing?  Either the
content we are losing is valuable or not, but it has nothing to do with
what we are replacing. And Junio argued in the thread linked elsewhere
that the point of git checkout $tree -- $path is to clobber what is in
$path, which I would agree with.

I think the argument made in 0a1283b is that git checkout $tree $path
is not make $path like $tree, but rather pick bits of $path out of
$tree. Which would mean this whole deletion thing we are talking about
is completely contrary to that.

So which is it?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Martin von Zweigbergk
First of all, thanks again for spending time on this.

On Sat, Nov 8, 2014 at 12:30 AM, Jeff King p...@peff.net wrote:
 On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

 So just to be clear, the behavior we want is that:

   echo foo some-new-path
   git add some-new-path
   git checkout HEAD -- .

 will delete some-new-path (whereas the current code turns it into an
 untracked file).

Yes, I think that's what I would expect.

 What should:

   git checkout HEAD -- some-new-path

 do in that case? With the current code, it actually barfs, complaining
 that nothing matched some-new-path (because it is not part of HEAD, and
 therefore we don't consider it at all), and aborts the whole operation.
 I think we would want to delete some-new-path in that case, too.

I don't think we'd want it to be deleted. I would view 'git reset
--hard' as the role model here, and that command (without paths) would
not remove the file. And applying it to a path should not change the
behavior, just restrict it to the paths, right?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
I noticed that git checkout $tree -- $path will _always_ unlink and
write a new copy of each matching path, even if they are up-to-date with
the index and the content in $tree is the same.

Here's a simple reproduction:

# some repo with a file in it
git init
echo content foo  git add foo  git commit -m foo

# give the file a known timestamp (it doesn't matter what)
perl -e 'utime(123, 123, @ARGV)' foo
git update-index --refresh

# now check out an identical copy
git checkout HEAD -- foo

# and check whether it was updated
perl -le 'print ((stat)[9]) for @ARGV' foo

which yields a modern timestamp, showing that the file was rewritten.

If you use

git checkout -- foo

instead to checkout from the index, that works fine (the final line
prints 123). Similarly, if you load the new index and checkout
separately, like:

git read-tree -m $tree
git checkout -- foo

that also works. Of course it is not quite the same; read-tree loads the
whole index from $tree, whereas checkout would only touch a subset of
the entries. And that's the culprit; checkout does not use unpack-trees
to only touch the entries that need updating. It uses read_tree_recursive
with a pathspec, and just replaces any index entries that match.

Below is a patch which makes it do what I expect by silently copying
flags and stat-data from the old entry, but I feel like it is going in
the wrong direction. Besides causing a few tests to fail (which I
suspect is because I implemented it at the wrong layer), it really feels
like this should be using unpack-trees or something similar.

After all, that is what git reset $tree -- foo does, and it gets this
case right (in fact, you can replace the read-tree above with it). It
looks like it actually does a pathspec-limited index-diff rather than
unpack-trees, but the end effect is the same (and while checkout could
just pass update to unpack-trees, we need to handle checking out
directly from the index anyway, so I do not think it is a big deal to
keep the index update as a separate step).

Is there a reason that we don't use this diff technique for checkout?

Anyway, here is the (wrong) patch I came up with initially.

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29b0f6d..802e556 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts 
*opts,
ce-ce_flags = ~CE_MATCHED;
if (!opts-ignore_skipworktree  ce_skip_worktree(ce))
continue;
-   if (opts-source_tree  !(ce-ce_flags  CE_UPDATE))
-   /*
-* git checkout tree-ish -- path, but this entry
-* is in the original index; it will not be checked
-* out to the working tree and it does not matter
-* if pathspec matched this entry.  We will not do
-* anything to this entry at all.
-*/
-   continue;
+
/*
-* Either this entry came from the tree-ish we are
-* checking the paths out of, or we are checking out
-* of the index.
+* If we are checking out from a tree-ish, we already
+* did our pathspec matching.  However, since we then
+* drop the CE_UPDATE flag from any paths that do not
+* need updating, we don't know which ones were not
+* mentioned and which ones were simply up-to-date.
 *
 * If it comes from the tree-ish, we already know it
 * matches the pathspec and could just stamp
diff --git a/read-cache.c b/read-cache.c
index 8f3e9eb..fb2240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
 
/* existing match? Just replace it. */
if (pos = 0) {
-   if (!new_only)
+   if (!new_only) {
+   struct cache_entry *old = istate-cache[pos];
+   if (!is_null_sha1(ce-sha1) 
+   !hashcmp(old-sha1, ce-sha1))
+   copy_cache_entry(ce, old);
replace_index_entry(istate, pos, ce);
+   }
return 0;
}
pos = -pos-1;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:

 I noticed that git checkout $tree -- $path will _always_ unlink and
 write a new copy of each matching path, even if they are up-to-date with
 the index and the content in $tree is the same.

By the way, one other thing I wondered while looking at this code: when
we checkout a working tree file, we unlink the old one and write the new
one in-place. Is there a particular reason we do this versus writing to
a temporary file and renaming it into place?  That would give
simultaneous readers a more atomic view.

I suspect the answer is something like: you cannot always do a rename,
because you might have a typechange, directory becoming a file, or vice
versa; so anyone relying on an atomic view during a checkout operation
is already Doing It Wrong.  Handling a content-change of an existing
path would complicate the code, so we do not bother.

But I would be curious to hear confirmation of that.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Duy Nguyen
On Fri, Nov 7, 2014 at 3:38 PM, Jeff King p...@peff.net wrote:
 On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:

 I noticed that git checkout $tree -- $path will _always_ unlink and
 write a new copy of each matching path, even if they are up-to-date with
 the index and the content in $tree is the same.

 By the way, one other thing I wondered while looking at this code: when
 we checkout a working tree file, we unlink the old one and write the new
 one in-place. Is there a particular reason we do this versus writing to
 a temporary file and renaming it into place?  That would give
 simultaneous readers a more atomic view.

 I suspect the answer is something like: you cannot always do a rename,
 because you might have a typechange, directory becoming a file, or vice
 versa; so anyone relying on an atomic view during a checkout operation
 is already Doing It Wrong.  Handling a content-change of an existing
 path would complicate the code, so we do not bother.

Not a confirmation, but it looks like Linus did it just to make sure
he had new permissions right, in e447947 (Be much more liberal about
the file mode bits. - 2005-04-16).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Fri, Nov 7, 2014 at 3:38 PM, Jeff King p...@peff.net wrote:
 On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:

 I noticed that git checkout $tree -- $path will _always_ unlink and
 write a new copy of each matching path, even if they are up-to-date with
 the index and the content in $tree is the same.

 By the way, one other thing I wondered while looking at this code: when
 we checkout a working tree file, we unlink the old one and write the new
 one in-place. Is there a particular reason we do this versus writing to
 a temporary file and renaming it into place?  That would give
 simultaneous readers a more atomic view.

 I suspect the answer is something like: you cannot always do a rename,
 because you might have a typechange, directory becoming a file, or vice
 versa; so anyone relying on an atomic view during a checkout operation
 is already Doing It Wrong.  Handling a content-change of an existing
 path would complicate the code, so we do not bother.

 Not a confirmation, but it looks like Linus did it just to make sure
 he had new permissions right, in e447947 (Be much more liberal about
 the file mode bits. - 2005-04-16).

I think you are referring to the ... to get the new one with the
right permissions comment in that patch, but I do not think that
affects the choice between (1) unlink and write anew to the final
and (2) create a new temporary and rename.  Either way, you do not
update the existing file in-place and try to checkout the permission
bits with chmod(2).

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Is there a reason that we don't use this diff technique for checkout?

I suspect that the reasons are probably mixture of these:

 (1) the code path may descend from checkout-index and predates the
 in-core diff machinery;

 (2) in the context of checkout-index, it was more desirable to be
 able to say I want the contents on the filesystem, even if you
 think I already have it there, as you as a new software are
 likely to be wrong and I know better; or

 (3) it was easier to code that way ;-)

I do not see there is any reason not to do what you suggest.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 05:13:47PM +0700, Duy Nguyen wrote:

  By the way, one other thing I wondered while looking at this code: when
  we checkout a working tree file, we unlink the old one and write the new
  one in-place. Is there a particular reason we do this versus writing to
  a temporary file and renaming it into place?  That would give
  simultaneous readers a more atomic view.
 
  I suspect the answer is something like: you cannot always do a rename,
  because you might have a typechange, directory becoming a file, or vice
  versa; so anyone relying on an atomic view during a checkout operation
  is already Doing It Wrong.  Handling a content-change of an existing
  path would complicate the code, so we do not bother.
 
 Not a confirmation, but it looks like Linus did it just to make sure
 he had new permissions right, in e447947 (Be much more liberal about
 the file mode bits. - 2005-04-16).

Thanks for digging that up. I think that only gives us half the story,
though. That explains why we would unlink/open instead of relying on
just open(O_TRUNC). But I think opening a new tempfile would work the
same as the current code in that respect.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Is there a reason that we don't use this diff technique for checkout?
 
 I suspect that the reasons are probably mixture of these:
 
  (1) the code path may descend from checkout-index and predates the
  in-core diff machinery;
 
  (2) in the context of checkout-index, it was more desirable to be
  able to say I want the contents on the filesystem, even if you
  think I already have it there, as you as a new software are
  likely to be wrong and I know better; or
 
  (3) it was easier to code that way ;-)
 
 I do not see there is any reason not to do what you suggest.

OK. It's not very much code (and can mostly be lifted from git-reset),
so I may take a stab at it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Martin von Zweigbergk
Trying again from plain old gmail which I think does not send a
multipart content.

On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk
martinv...@gmail.com wrote:
 Is this also related to git checkout $rev . not removing removed files?
 What you say about the difference in implementation between checkout and
 reset sounds vaguely familiar from when I looked at it some years ago.
 Curiously, I just talked to Jonathan about this over lunch yesterday. I
 think we would both be happy to get that oddity of checkout fixed. If what
 you mention here is indeed related to fixing that, it does complicate things
 with regards to backwards compatibility.


 On Fri Nov 07 2014 at 11:24:09 AM Jeff King p...@peff.net wrote:

 On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:

  Jeff King p...@peff.net writes:
 
   Is there a reason that we don't use this diff technique for checkout?
 
  I suspect that the reasons are probably mixture of these:
 
   (1) the code path may descend from checkout-index and predates the
   in-core diff machinery;
 
   (2) in the context of checkout-index, it was more desirable to be
   able to say I want the contents on the filesystem, even if you
   think I already have it there, as you as a new software are
   likely to be wrong and I know better; or
 
   (3) it was easier to code that way ;-)
 
  I do not see there is any reason not to do what you suggest.

 OK. It's not very much code (and can mostly be lifted from git-reset),
 so I may take a stab at it.

 -Peff
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html