Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-20 Thread Junio C Hamano
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> And as you can see, the output matches "diff-tree -r" output (we always do
LT> "-r", since the index is always fully populated). All the same rules: "+"  
LT> means added file, "-" means removed file, and "*" means changed file. You 
LT> can trivially see that the above is a rename.

I do not know whether Pasky's tools already have something like
this, but just FYI, here is what I use to extract a "patch" out
of a working tree.

Usage:

$ diff-tree -z [-r] ... | jit-diff-tree-helper [ | less ]
$ diff-cache -z ... | jit-diff-tree-helper [ | less ]

This would be useful for the merge I described in my initial
message in this thread to take a snapshot of what the user has
done since the last commit, to be applied on the result of the
merge.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---

--- jit-diff-tree-helper	2005-03-19 15:28:25.0 -0800
+++ jit-diff-tree-helper	2005-04-20 19:15:32.0 -0700
@@ -0,0 +1,63 @@
+#!/usr/bin/perl -w
+
+use strict;
+use File::Temp qw(mkstemp);
+
+sub cat_file {
+    my ($sha1, $file) = @_;
+    unless (defined $sha1) { return "/dev/null"; }
+    if ($sha1 =~ /^0{40}$/) {
+        open I, '<', $file; # all-zero sha1: read the working tree file
+    } else {
+        local $/; # slurp mode
+        open I, "-|", "cat-file", "blob", $sha1
+            or die "$0: cannot read $sha1";
+    }
+    my ($o, $filename) = mkstemp(",,jit-diff-tree-helperXXXXXX");
+    print $o join("", <I>);
+    close I
+        or die "$0: closing cat-file pipe from $sha1";
+    close $o
+        or die "$0: closing write fd to $filename";
+    return $filename;
+}
+$/ = "\0"; # records from "diff-tree -z" are NUL terminated
+my $rM = "[0-7]+";
+my $rI = "[0-9a-f]{40}";
+while (<STDIN>) {
+    my ($old, $new, $file);
+    chomp;
+    if (/^\+$rM\tblob\t($rI)\t(.*)$/os) {
+        ($old, $new, $file) = (undef, $1, $2);
+    }
+    elsif (/^-$rM\tblob\t($rI)\t(.*)$/os) {
+        ($old, $new, $file) = ($1, undef, $2);
+    }
+    elsif (/^\*$rM->$rM\tblob\t($rI)->($rI)\t(.*)$/os) {
+        ($old, $new, $file) = ($1, $2, $3);
+    }
+    else {
+        chomp;
+        print STDERR "warning: $0: ignoring $_\n";
+        next;
+    }
+    if (@ARGV) { # optional path filters given on the command line
+        my $matches = 0;
+        for (@ARGV) {
+            my $l = length($_);
+            if ($file eq $_ ||
+                (substr($file, 0, $l) eq $_ &&
+                 substr($file, $l, 1) eq "/")) {
+                $matches = 1;
+                last;
+            }
+        }
+        next unless $matches;
+    }
+    $old = cat_file $old, $file;
+    $new = cat_file $new, $file;
+    system "diff", "-L", "l/$file", "-L", "k/$file", "-pu", $old, $new;
+    for ($old, $new) {
+        unlink $_ if $_ ne '/dev/null';
+    }
+}


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Junio C Hamano
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

>> I'll immediately write a tool to diff the current working directory 
>> against a tree object, and hopefully that will just make pasky happy with 
>> this model too. 

The model you have always had is that there are three things the
user needs to be aware of:

 * files in working tree -- this is what you touch with your
   editor and feed compilers with.

 * files in dircache -- update-cache copies from working
   tree to here, checkout-cache copies from here to working
   tree.

 * committed tree state -- write-tree + commit-tree copies from
   dircache to this state, read-tree copies from here to
   dircache.

The original message I started this thread with suggested that
the Cogito sugarcoating layer should keep the dircache invisible
to the user by keeping it virtually and lazily in sync with the
working tree, as opposed to the way the current git-pasky does,
which is to keep it in sync with the committed state.

But after thinking about it more, I changed my mind.  With
something like diff-cache available to the user, making the user
aware of the three-level hierarchy might be cleaner.

The workflow becomes:
 
 * Initial read-tree + checkout-cache -f -a; makes the three in
   sync.

 * Hack away.  Makes the working tree drift from dircache.

 * show-diff to see what's changed since your last "checkpoint".
   update-cache when happy.  Working tree is in sync with
   dircache which is the "staging area" for my half-baked but
   still good stuff.  Makes the dircache different from the
   committed.

 * Hack away more.  show-diff does not show your earlier changes
   anymore.  This is sometimes inconvenient when you want to see
   what you earlier changed but not committed.  Here comes the
   new shiny diff-cache to rescue.

 * When satisfied with all the changes diff-cache --cached
   shows, finally, say write-tree + commit-tree.  This makes all
   three in sync again.
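
The workflow above can be walked through with a toy model.  This is
only a sketch under loud assumptions: three plain directories stand in
for the committed tree, the dircache and the working tree, and plain
cp stands in for read-tree, checkout-cache, update-cache and
write-tree + commit-tree.  The helper names copy_level and view are
made up for this illustration and no real repository is touched.

```shell
# Toy model only: directories play the role of the three levels.
demo=$(mktemp -d)
mkdir "$demo/committed" "$demo/dircache" "$demo/work"
echo v1 >"$demo/committed/file.c"

# copy_level FROM TO: replace level TO with the contents of level FROM.
copy_level() {
    rm -rf "$demo/$2"
    cp -R "$demo/$1" "$demo/$2"
}

# view FROM TO: show how file.c differs between two levels.
view() {
    echo "$(cat "$demo/$1/file.c") -> $(cat "$demo/$2/file.c")"
}

copy_level committed dircache     # role of read-tree
copy_level dircache work          # role of checkout-cache -f -a

echo v2 >"$demo/work/file.c"      # hack away
copy_level work dircache          # role of update-cache (checkpoint)

echo v3 >"$demo/work/file.c"      # hack away more

view dircache work                # role of show-diff:           v2 -> v3
view committed dircache           # role of diff-cache --cached: v1 -> v2
view committed work               # role of plain diff-cache:    v1 -> v3

copy_level dircache committed     # role of write-tree + commit-tree
view committed dircache           # staged state is committed:   v2 -> v2
```

The point of the model is the second "hack away" step: the checkpoint
view no longer shows the v1 to v2 change, while the two views taken
against the committed level still do.  The scratch directory in $demo
is left behind for inspection and can be removed by hand.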

I vaguely recall having heard about some SCM that distinguishes
check-in and commit.  Maybe this two-staged update-cache and
write-tree + commit-tree workflow is similar to it?



Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Linus Torvalds wrote:
> 
> That is indeed the whole point of the index file. In my world-view, the
> index file does _everything_. It's the staging area ("work file"), it's
> the merging area ("merge directory") and it's the cache file ("stat
> cache").
> 
> I'll immediately write a tool to diff the current working directory 
> against a tree object, and hopefully that will just make pasky happy with 
> this model too. 

Ok, "immediately" took a bit longer than I wanted to, and quite frankly,
the end result is not very well tested. It was a bit more complex than I
was hoping for to match up the index file against a tree object, since
unlike the tree<->tree comparison in diff-tree, you have to compare two
cases where the layout isn't the same.

No matter. It seems to work to a first approximation, and the result is
such a cool tool that it's worth committing and pushing out immediately. 

The code ain't exactly pretty, but hey, maybe that's just me having higher 
standards of beauty than most. Or maybe you just shudder at what I 
consider pretty in the first place, in which case you probably shouldn't 
look too closely at this one.

What the new "diff-cache" does is basically emulate "diff-tree", except 
one of the trees is always the index file.

You can also choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files that
don't match the stat state as being "tentatively changed".  Both of these
operations are very useful indeed.

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit,
without having to write a new tree object and compare it that way, and to
do that, you just do

diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new 
diff-cache can take a "commit" object, and it automatically just extracts 
the tree information from there).

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had 
done an "update-cache" to make that effective in the index file. 
"show-diff" wouldn't show anything at all, since the index file matches 
my working directory. But doing a diff-cache does:

[EMAIL PROTECTED]:~/git> diff-cache --cached $(cat .git/HEAD)
-100644	blob	4161aecc6700a2eb579e842af0b7f22b98443f74	commit.c
+100644	blob	4161aecc6700a2eb579e842af0b7f22b98443f74	git-commit.c

So what the above "diff-cache" command line does is to say

   "show me the differences between HEAD and the current index contents 
(the ones I'd write with a "write-tree")"

And as you can see, the output matches "diff-tree -r" output (we always do
"-r", since the index is always fully populated). All the same rules: "+"  
means added file, "-" means removed file, and "*" means changed file. You 
can trivially see that the above is a rename.
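
As a rough sketch of "trivially seeing" the rename, the records can be
paired up by blob sha1: a "-" record and a "+" record that name the
same blob describe the same contents under two paths.  The function
name find_renames is made up for this illustration, and the sketch
assumes newline-terminated, tab-separated records (not "-z" output)
whose paths contain no tabs.

```shell
# find_renames: read "diff-tree -r"-style records on stdin and report
# every blob that was removed under one path and added under another.
find_renames() {
    awk -F '\t' '
        /^-/  { removed[$3] = $4 }   # $3 is the sha1, $4 is the path
        /^\+/ { added[$3] = $4 }
        END {
            for (sha in removed)
                if (sha in added)
                    printf "rename %s -> %s\n", removed[sha], added[sha]
        }'
}

# Feed it the two records from the example above.
sha=4161aecc6700a2eb579e842af0b7f22b98443f74
printf '%s\t%s\t%s\t%s\n' \
    -100644 blob "$sha" commit.c \
    +100644 blob "$sha" git-commit.c |
find_renames
# prints: rename commit.c -> git-commit.c
```

This only catches renames where the contents are byte-for-byte
unchanged; a file that was renamed and edited shows up as an unrelated
"-" and "+" pair.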

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check. Maybe you don't want to
do the tree.

So doing a "diff-cache --cached" is basically very useful when you are 
asking yourself "what have I already marked for being committed, and 
what's the difference to a previous tree".

However, the "non-cached" version takes a different approach, and is
potentially the even more useful of the two in that what it does can't be
emulated with a "write-tree + diff-tree". Thus that's the default mode.  
The non-cached version asks the question

   "show me the differences between HEAD and the currently checked out 
tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:

[EMAIL PROTECTED]:~/v2.6/linux> diff-cache $(cat .git/HEAD )
*100644->100664	blob	7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000	kernel/sched.c

ie it shows that the tree has changed, and that "kernel/sched.c" is
not up-to-date and may contain new stuff. The all-zero sha1 means that to
get the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.
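
A consumer of this output has to make exactly that decision for every
"*" record.  A minimal sketch, assuming one tab-separated record per
call and a path without tabs; the names zero_sha1 and new_side are made
up for this illustration:

```shell
zero_sha1=0000000000000000000000000000000000000000
tab=$(printf '\t')

# new_side RECORD: for a "*" (changed) record, say where the new
# contents have to be read from.
new_side() {
    record=$1
    path=${record##*"$tab"}     # text after the last tab
    rest=${record%"$tab"*}      # record with the path removed
    shas=${rest##*"$tab"}       # "old->new" sha1 pair
    new=${shas#*->}             # sha1 after the arrow
    case "$new" in
        "$zero_sha1") echo "worktree $path" ;;  # no object yet
        *)            echo "object $new" ;;     # object-to-object diff
    esac
}

new_side "*100644->100664${tab}blob${tab}7476bbcfe5ef5a1dd87d745f298b831143e4d77e->${zero_sha1}${tab}kernel/sched.c"
# prints: worktree kernel/sched.c
```

The comparison is done as a string match on purpose, so that a sha1
which merely looks like a number is never mistaken for the all-zero
placeholder.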

NOTE! As with other commands of this type, "diff-cache" does not actually 
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't 
actually changed, and it's just that you touched it. In either case, it's 
a note that you ne

Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Junio C Hamano
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> Is there any other reason why git-pasky wants to have a work file?

Do you mean "why does a user want to check things out in the
working directory and make changes, possibly run compile tests
before pushing the result to Linus?" ;-)  I'm confused about what
you mean by "a work file", I guess...




Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Linus Torvalds


On Tue, 19 Apr 2005, Junio C Hamano wrote:
> 
> Let's for a moment forget what git-pasky currently does, which
> is not to touch .git/index until the user says "Ok, let's
> commit". 

I think git-pasky is wrong.

It's true that we want to often (almost always) diff against the last 
"released" thing, and I actually think git-pasky does what it does because 
I never wrote a tool to diff the current working directory against a 
"tree".

At the same time, I very much worked with a model where you do _not_ have 
a traditional "work file", but the index really _is_ the "work file".

> I'd like to start from a different premise and see what happens:
> 
>  - What .git/index records is *not* the state as of the last
>    commit.  It is just a cache Cogito uses to speed up access
>    to the user's working tree.  From the user's point of view,
>    it does not even exist.

Yes. Yes. YES.

That is indeed the whole point of the index file. In my world-view, the
index file does _everything_. It's the staging area ("work file"), it's
the merging area ("merge directory") and it's the cache file ("stat
cache").

I'll immediately write a tool to diff the current working directory 
against a tree object, and hopefully that will just make pasky happy with 
this model too. 

Is there any other reason why git-pasky wants to have a work file?

Linus


[RFC] Possible strategy cleanup for git add/remove/diff etc.

2005-04-19 Thread Junio C Hamano
I was reading this comment in gitcommit.sh and started
thinking...

# We bother with added/removed files here instead of updating
# the cache at the time of git(add|rm).sh, since we want to
# have the cache in a consistent state representing the tree
# as it was the last time we committed. Otherwise, e.g. partial
# conflicts would be a PITA since added/removed files would
# be committed along automagically as well.

Let's for a moment forget what git-pasky currently does, which
is not to touch .git/index until the user says "Ok, let's
commit".  I am wondering if that is the root cause of all the
trouble git-pasky needs to go through.  Specifically, I think
having to deal with the add/remove queue affects not just the
commit code that carries the comment above but also diffs.

I'd like to start from a different premise and see what happens:

 - What .git/index records is *not* the state as of the last
   commit.  It is just a cache Cogito uses to speed up access
   to the user's working tree.  From the user's point of view,
   it does not even exist.

 - The way this hypothetical Cogito uses .git/index is to always
   reflect add and remove but modification may be out of sync.
   It is updated lazily when .git/index must match the working
   tree.  Again, this is invisible to the user.  From the user's
   point of view, there are only two things: the last commit
   represented as .git/HEAD and his own working tree.

I call this hypothetical implementation of Cogito "jit-*" in the
following description.  Also this is just to convey the idea, so
all the error checking (e.g. "what the user gave jit-merge is
not a valid commit id") and sugarcoating (e.g. tags, symbolic
foreign repository names instead of rsync URL etc) are omitted.


* jit-checkout $commit_id

  This is like "cvs co".  Same as what you are doing I suppose.

committed_tree=$(cat-file commit $commit_id | sed -e 's/^tree //;q')
read-tree $committed_tree
checkout-cache -f -a
echo $commit_id >.git/HEAD

* jit-add files... | jit-remove files...

  Like "cvs add".  Here, .git/index is treated as just a cache
  of the working tree, not the mirror of previous commit.  So
  unlike git-pasky, jit-* touches .git/index here.

update-cache --add "$@"

---

rm -f "$@" ;# this is debatable...
update-cache --remove "$@"

* jit-diff [files...]

  Like "cvs diff".  The user wants to see what's different
  between his working tree and the last commit.

case "$#" in 0) set x $(show-files --cached); shift ;; esac
update-cache --add --remove "$@" --refresh
current_tree=$(write-tree)

committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
diff-tree -r -z $committed_tree $current_tree |
  filter-output-to-limit-to-given-filelist "$@" |
  parse-diff-tree-output-and-show-real-file-diffs

  Unlike git-pasky, jit-* does not keep the state from the last
  commit in .git/index.  Instead, .git/index is meant to cache
  the state of the working tree.  So the first three lines in
  the above update .git/index lazily from what is in the
  working tree for the part that needs to be diffed.  Then it
  uses helper scripts to filter and parse diff-tree output and
  generates per-file diffs.  Since add and remove are already
  recorded in .git/index, it does not have to special case
  "uncommitted add" and such.
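
The filter step in that pipeline is only named, not written out.  One
possible shape for it, as a sketch: keep a record when its path equals
one of the given names or lies underneath one of them, and keep
everything when no names are given.  The function names path_matches
and filter_records are made up for this illustration, and the sketch
assumes NUL-terminated, tab-separated records whose paths contain
neither tabs nor newlines.

```shell
tab=$(printf '\t')

# path_matches FILE [PREFIX...]: succeed when no PREFIX is given, or
# when FILE is one of them or lives in a directory named by one.
path_matches() {
    file=$1; shift
    [ "$#" -eq 0 ] && return 0
    for prefix in "$@"; do
        case "$file" in
            "$prefix" | "$prefix"/*) return 0 ;;
        esac
    done
    return 1
}

# filter_records [PREFIX...]: copy NUL-terminated records from stdin
# to stdout, dropping those whose path does not match.
filter_records() {
    tr '\0' '\n' |
    while IFS= read -r record; do
        if path_matches "${record##*"$tab"}" "$@"; then
            printf '%s\0' "$record"
        fi
    done
}
```

With this in place, something like "filter_records kernel" would pass
kernel/sched.c through but drop kernel-doc/x, because a prefix only
matches at a "/" boundary.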

* jit-commit

  Like "cvs commit".

set x $(show-files --cached); shift
update-cache --add --remove "$@"

current_tree=$(write-tree)
next_commit=$(commit-tree $current_tree -p $(cat .git/HEAD))
echo $next_commit >.git/HEAD

  Unlike git-pasky, .git/index already has adds and removes but
  it does not know about local modifications.  So it runs
  update-cache to make it match the working tree first, and then
  does the usual commit thing.  

  The above only allows the whole tree commit.  But allowing
  single file commit is not that hard:

(
set x $(show-files --cached); shift
update-cache --add --remove "$@"
) ;# we use subshell to preserve "$@" here...
current_tree=$(write-tree)

committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
read-tree $committed_tree
update-cache --add --remove "$@"
partial_tree=$(write-tree) ;# last commit plus only the named files
next_commit=$(commit-tree $partial_tree -p $(cat .git/HEAD))
echo $next_commit >.git/HEAD

read-tree $current_tree

  The first four lines are to preserve the current tree state.
  Then we rewind the dircache to the last committed state,
  update only the named files to bring it to the state the user
  wanted to commit, and commit.  Once done, we re-read the state
  to match the user's original intention (e.g. adds recorded in
  .git/index previously but not committed in this run is
  preserved).


* jit-merge $commit_id

  Like "cvs up -j".  I have a working tree which is based on some
  commit, and I want to merge somebody else's head $commit_id.
  Stated more exactly: I want to have the result of my chang