Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> And as you can see, the output matches "diff-tree -r" output (we always do
LT> "-r", since the index is always fully populated). All the same rules: "+"
LT> means added file, "-" means removed file, and "*" means changed file. You
LT> can trivially see that the above is a rename.

I do not know whether the Pasky tools already have something like
this, but just FYI, here is what I use to extract a "patch" out of
a working tree.

Usage:

    $ diff-tree -z [-r] ... | jit-diff-tree-helper [ | less ]
    $ diff-cache -z ... | jit-diff-tree-helper [ | less ]

This would be useful for the merge I described in my initial
message in this thread, to take a snapshot of what the user has
done since the last commit, to be applied on the result of the
merge.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---

--- jit-diff-tree-helper	2005-03-19 15:28:25.0 -0800
+++ jit-diff-tree-helper	2005-04-20 19:15:32.0 -0700
@@ -0,0 +1,63 @@
+#!/usr/bin/perl -w
+
+use strict;
+use File::Temp qw(mkstemp);
+
+sub cat_file {
+    my ($sha1, $file) = @_;
+    unless (defined $sha1) { return "/dev/null"; }
+    if ($sha1 =~ /^0{40}$/) {
+        open I, '<', $file;
+    } else {
+        local $/; # slurp mode
+        open I, "-|", "cat-file", "blob", $sha1
+            or die "$0: cannot read $sha1";
+    }
+    my ($o, $filename) = mkstemp(",,jit-diff-tree-helperXX");
+    print $o join("",<I>);
+    close I
+        or die "$0: closing cat-file pipe from $sha1";
+    close $o
+        or die "$0: closing write fd to $filename";
+    return $filename;
+}
+$/ = "\0";
+my $rM = "[0-7]+";
+my $rI = "[0-9a-f]{40}";
+while (<STDIN>) {
+    my ($old, $new, $file);
+    chomp;
+    if (/^\+$rM\tblob\t($rI)\t(.*)$/os) {
+        ($old, $new, $file) = (undef, $1, $2);
+    }
+    elsif (/^-$rM\tblob\t($rI)\t(.*)$/os) {
+        ($old, $new, $file) = ($1, undef, $2);
+    }
+    elsif (/^\*$rM->$rM\tblob\t($rI)->($rI)\t(.*)$/os) {
+        ($old, $new, $file) = ($1, $2, $3);
+    }
+    else {
+        chomp;
+        print STDERR "warning: $0: ignoring $_\n";
+        next;
+    }
+    if (@ARGV) {
+        my $matches = 0;
+        for (@ARGV) {
+            my $l = length($_);
+            if ($file eq $_ ||
+                (substr($file, 0, $l) eq $_ &&
+                 substr($file, $l, 1) eq "/")) {
+                $matches = 1;
+                last;
+            }
+        }
+        next unless $matches;
+    }
+    $old = cat_file $old, $file;
+    $new = cat_file $new, $file;
+    system "diff", "-L", "l/$file", "-L", "k/$file", "-pu", $old, $new;
+    for ($old, $new) {
+        unlink $_ if $_ ne '/dev/null';
+    }
+}

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
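For readers who want to see the record format in isolation, here is a
minimal shell sketch of the classification step that the helper's three
regular expressions perform. It is not part of the posted patch. The
function name classify_records is hypothetical, the field layout
(tab-separated, NUL-terminated records) is taken from the regexes and the
"-z" usage above, and the sketch assumes that no path contains a newline
so that NUL can be turned into newline for the shell's "read".

```shell
#!/bin/sh
# classify_records (hypothetical name): read NUL-terminated records in the
# "diff-tree -z" style described in this thread and print one summary line
# per record. Assumption: paths contain no newline characters.
classify_records() {
    tab=$(printf '\t')
    zero=0000000000000000000000000000000000000000
    tr '\000' '\n' |
    while IFS="$tab" read -r head type sha path; do
        # Only blob records are handled, as in the helper's regexes.
        if [ "$type" != blob ]; then
            echo "warning: ignoring $head" >&2
            continue
        fi
        case "$head" in
        +*)  echo "added $path" ;;
        -*)  echo "removed $path" ;;
        \**) case "$sha" in
             # An all-zero new sha1 means there is no object yet.
             *"->$zero") echo "changed $path (new side read from working tree)" ;;
             *)          echo "changed $path" ;;
             esac ;;
        *)   echo "warning: ignoring $head" >&2 ;;
        esac
    done
}

# Usage: the rename example quoted above, fed in as two records.
sha=4161aecc6700a2eb579e842af0b7f22b98443f74
printf '%s\tblob\t%s\t%s\000' \
    -100644 "$sha" commit.c \
    +100644 "$sha" git-commit.c | classify_records
```

With the two records above this prints "removed commit.c" followed by
"added git-commit.c". Unlike the Perl helper, the sketch stops at
classification and does not fetch blobs or run diff.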
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

>> I'll immediately write a tool to diff the current working directory
>> against a tree object, and hopefully that will just make pasky happy with
>> this model too.

The model you have always had is that there are three things the
user needs to be aware of:

 * files in working tree -- this is what you touch with your
   editor and feed compilers with.

 * files in dircache -- update-cache copies from working tree to
   here, checkout-cache copies from here to working tree.

 * committed tree state -- write-tree + commit-tree copies from
   dircache to this state, read-tree copies from here to
   dircache.

In the original message I started this thread with, I wished that
the Cogito sugarcoating layer would make the dircache invisible to
the user by keeping it virtually and lazily in sync with the
working tree, as opposed to what the current git-pasky does, which
is to keep it in sync with the committed state.

But after thinking about it more, I changed my mind. With
something like diff-cache available to the user, making the user
aware of the three-level hierarchy might be cleaner. The workflow
becomes:

 * Initial read-tree + checkout-cache -f -a; makes the three in
   sync.

 * Hack away. Makes the working tree drift from dircache.

 * show-diff to see what's changed since your last "checkpoint".
   update-cache when happy. Working tree is in sync with
   dircache, which is the "staging area" for my half-baked but
   still good stuff. Makes the dircache different from the
   committed state.

 * Hack away more. show-diff does not show your earlier changes
   anymore. This is sometimes inconvenient when you want to see
   what you changed earlier but have not committed. Here comes
   the new shiny diff-cache to the rescue.

 * When satisfied with all the changes diff-cache --cached shows,
   finally, say write-tree + commit-tree. This makes all three
   in sync again.

I vaguely recall having heard about some SCM that distinguishes
check-in and commit. Maybe this two-staged update-cache and
write-tree + commit-tree workflow is similar to it?
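To make the checkpoint workflow above concrete, here is a toy shell
sketch. It does not use git at all: three plain directories stand in for
the working tree, the dircache and the committed state, and every toy_*
function is a hypothetical stand-in named after the command it imitates.
Unlike the real write-tree + commit-tree, toy_commit copies a single file
rather than the whole dircache. It only illustrates which comparison goes
quiet at which step.

```shell
#!/bin/sh
# Toy model only, NOT git: three directories stand in for the three states.
# The scratch directory is left behind in "$root"; remove it when done.
root=$(mktemp -d)
mkdir "$root/work" "$root/cache" "$root/committed"

# update-cache analogue: working tree -> dircache
toy_update_cache() { cp "$root/work/$1" "$root/cache/$1"; }

# write-tree + commit-tree analogue: dircache -> committed state
toy_commit() { cp "$root/cache/$1" "$root/committed/$1"; }

# show-diff analogue: compare dircache against working tree
toy_show_diff() {
    cmp -s "$root/cache/$1" "$root/work/$1" ||
        echo "$1: working tree differs from dircache"
}

# diff-cache --cached analogue: compare committed state against dircache
toy_diff_cache_cached() {
    cmp -s "$root/committed/$1" "$root/cache/$1" ||
        echo "$1: dircache differs from committed state"
}

echo v1 >"$root/work/hello.c"
toy_update_cache hello.c
toy_commit hello.c              # all three in sync

echo v2 >"$root/work/hello.c"   # hack away
toy_show_diff hello.c           # reports the edit
toy_update_cache hello.c        # checkpoint
toy_show_diff hello.c           # silent: the checkpoint hides the edit
toy_diff_cache_cached hello.c   # still reports the uncommitted change
toy_commit hello.c
toy_diff_cache_cached hello.c   # silent: all three in sync again
```

The run prints two lines: one from toy_show_diff before the checkpoint
and one from toy_diff_cache_cached after it, which is the gap that
diff-cache --cached fills in the workflow described above.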
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
On Tue, 19 Apr 2005, Linus Torvalds wrote:
>
> That is indeed the whole point of the index file. In my world-view, the
> index file does _everything_. It's the staging area ("work file"), it's
> the merging area ("merge directory") and it's the cache file ("stat
> cache").
>
> I'll immediately write a tool to diff the current working directory
> against a tree object, and hopefully that will just make pasky happy with
> this model too.

Ok, "immediately" took a bit longer than I wanted to, and quite frankly,
the end result is not very well tested. It was a bit more complex than I
was hoping for to match up the index file against a tree object, since
unlike the tree<->tree comparison in diff-tree, you have to compare two
cases where the layout isn't the same.

No matter. It seems to work to a first approximation, and the result is
such a cool tool that it's worth committing and pushing out immediately.

The code ain't exactly pretty, but hey, maybe that's just me having higher
standards of beauty than most. Or maybe you just shudder at what I
consider pretty in the first place, in which case you probably shouldn't
look too closely at this one.

What the new "diff-cache" does is basically emulate "diff-tree", except
one of the trees is always the index file.

You can also choose whether you want to trust the index file entirely
(using the "--cached" flag) or ask the diff logic to show any files that
don't match the stat state as being "tentatively changed". Both of these
operations are very useful indeed.

For example, let's say that you have worked on your index file, and are
ready to commit. You want to see exactly _what_ you are going to commit
without having to write a new tree object and compare it that way, and to
do that, you just do

	diff-cache --cached $(cat .git/HEAD)

(another difference between diff-tree and diff-cache is that the new
diff-cache can take a "commit" object, and it automatically just extracts
the tree information from there).

Example: let's say I had renamed "commit.c" to "git-commit.c", and I had
done an "update-cache" to make that effective in the index file.
"show-diff" wouldn't show anything at all, since the index file matches
my working directory. But doing a diff-cache does:

	[EMAIL PROTECTED]:~/git> diff-cache --cached $(cat .git/HEAD)
	-100644 blob 4161aecc6700a2eb579e842af0b7f22b98443f74 commit.c
	+100644 blob 4161aecc6700a2eb579e842af0b7f22b98443f74 git-commit.c

So what the above "diff-cache" command line does is to say

   "show me the differences between HEAD and the current index contents
    (the ones I'd write with a "write-tree")"

And as you can see, the output matches "diff-tree -r" output (we always do
"-r", since the index is always fully populated). All the same rules: "+"
means added file, "-" means removed file, and "*" means changed file. You
can trivially see that the above is a rename.

In fact, "diff-cache --cached" _should_ always be entirely equivalent to
actually doing a "write-tree" and comparing that. Except this one is much
nicer for the case where you just want to check. Maybe you don't want to
do the tree.

So doing a "diff-cache --cached" is basically very useful when you are
asking yourself "what have I already marked for being committed, and
what's the difference to a previous tree".

However, the "non-cached" version takes a different approach, and is
potentially the even more useful of the two in that what it does can't be
emulated with a "write-tree + diff-tree". Thus that's the default mode.

The non-cached version asks the question

   "show me the differences between HEAD and the currently checked out
    tree - index contents _and_ files that aren't up-to-date"

which is obviously a very useful question too, since that tells you what
you _could_ commit. Again, the output matches the "diff-tree -r" output to
a tee, but with a twist.

The twist is that if some file doesn't match the cache, we don't have a
backing store thing for it, and we use the magic "all-zero" sha1 to show
that. So let's say that you have edited "kernel/sched.c", but have not
actually done an update-cache on it yet - there is no "object" associated
with the new state, and you get:

	[EMAIL PROTECTED]:~/v2.6/linux> diff-cache $(cat .git/HEAD )
	*100644->100664 blob 7476bbcfe5ef5a1dd87d745f298b831143e4d77e->0000000000000000000000000000000000000000 kernel/sched.c

ie it shows that the tree has changed, and that "kernel/sched.c" is not
up-to-date and may contain new stuff. The all-zero sha1 means that to get
the real diff, you need to look at the object in the working directory
directly rather than do an object-to-object diff.

NOTE! As with other commands of this type, "diff-cache" does not actually
look at the contents of the file at all. So maybe "kernel/sched.c" hasn't
actually changed, and it's just that you touched it. In either case, it's
a note that you ne
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> Is there any other reason why git-pasky wants to have a work file?

Do you mean "why does a user want to check things out in the
working directory and make changes, possibly run compile tests
before pushing the result to Linus?" ;-)

I'm confused about what you mean by "a work file", I guess...
Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
On Tue, 19 Apr 2005, Junio C Hamano wrote:
>
> Let's for a moment forget what git-pasky currently does, which
> is not to touch .git/index until the user says "Ok, let's
> commit".

I think git-pasky is wrong.

It's true that we often (almost always) want to diff against the last
"released" thing, and I actually think git-pasky does what it does
because I never wrote a tool to diff the current working directory
against a "tree".

At the same time, I very much worked with a model where you do _not_ have
a traditional "work file", but the index really _is_ the "work file".

> I'd like to start from a different premise and see what happens:
>
>  - What .git/index records is *not* the state as the last
>    commit. It is just a cache Cogito uses to speed up access
>    to the user's working tree. From the user's point of view,
>    it does not even exist.

Yes. Yes. YES. That is indeed the whole point of the index file. In my
world-view, the index file does _everything_. It's the staging area ("work
file"), it's the merging area ("merge directory") and it's the cache file
("stat cache").

I'll immediately write a tool to diff the current working directory
against a tree object, and hopefully that will just make pasky happy with
this model too.

Is there any other reason why git-pasky wants to have a work file?

		Linus
[RFC] Possible strategy cleanup for git add/remove/diff etc.
I was reading this comment in gitcommit.sh and started thinking...

    # We bother with added/removed files here instead of updating
    # the cache at the time of git(add|rm).sh, since we want to
    # have the cache in a consistent state representing the tree
    # as it was the last time we committed. Otherwise, e.g. partial
    # conflicts would be a PITA since added/removed files would
    # be committed along automagically as well.

Let's for a moment forget what git-pasky currently does, which
is not to touch .git/index until the user says "Ok, let's
commit". I am wondering if that is the root cause of all the
trouble git-pasky needs to go through. Specifically, I think
having to deal with the add/remove queue affects not just commit,
where you have the comment above, but also diffs.

I'd like to start from a different premise and see what happens:

 - What .git/index records is *not* the state as the last
   commit. It is just a cache Cogito uses to speed up access
   to the user's working tree. From the user's point of view,
   it does not even exist.

 - The way this hypothetical Cogito uses .git/index is to always
   reflect add and remove, but modification may be out of sync.
   It is updated lazily when .git/index must match the working
   tree. Again, this is invisible to the user.

From the user's point of view, there are only two things: the
last commit represented as .git/HEAD and his own working tree.

I call this hypothetical implementation of Cogito "jit-*" in the
following description. Also this is just to convey the idea, so
all the error checking (e.g. "what the user gave jit-merge is
not a valid commit id") and sugarcoating (e.g. tags, symbolic
foreign repository names instead of rsync URL etc) are omitted.

* jit-checkout $commit_id

  This is like "cvs co". Same as what you are doing I suppose.

    committed_tree=$(cat-file commit $commit_id | sed -e 's/^tree //;q')
    read-tree $committed_tree
    checkout-cache -f -a
    echo $commit_id >.git/HEAD

* jit-add files... | jit-remove files...

  Like "cvs add". Here, .git/index is treated as just a cache of
  the working tree, not the mirror of previous commit. So
  unlike git-pasky, jit-* touches .git/index here.

    update-cache --add "$@"
    ---
    rm -f "$@" ;# this is debatable...
    update-cache --remove "$@"

* jit-diff [files...]

  Like "cvs diff". The user wants to see what's different
  between his working tree and the last commit.

    case "$#" in
    0) set x $(show-files --cached); shift ;;
    esac
    update-cache --add --remove "$@" --refresh
    current_tree=$(write-tree)
    committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
    diff-tree -r -z $committed_tree $current_tree |
    filter-output-to-limit-to-given-filelist "$@" |
    parse-diff-tree-output-and-show-real-file-diffs

  Unlike git-pasky, jit-* does not keep the state from the last
  commit in .git/index. Instead, .git/index is meant to cache
  the state of the working tree. So the case statement and the
  update-cache call at the top update .git/index lazily from
  what is in the working tree for the part that needs to be
  diffed. Then it uses helper scripts to filter and parse
  diff-tree output and generates per-file diffs. Since add and
  remove are already recorded in .git/index, it does not have to
  special case "uncommitted add" and such.

* jit-commit

  Like "cvs commit".

    set x $(show-files --cached); shift
    update-cache --add --remove "$@"
    current_tree=$(write-tree)
    next_commit=$(commit-tree $current_tree -p $(cat .git/HEAD))
    echo $next_commit >.git/HEAD

  Unlike git-pasky, .git/index already has adds and removes but
  it does not know about local modifications. So it runs
  update-cache to make it match the working tree first, and then
  does the usual commit thing.

  The above only allows the whole tree commit. But allowing
  single file commit is not that hard:

    ( set x $(show-files --cached); shift
      update-cache --add --remove "$@"
    ) ;# we use subshell to preserve "$@" here...
    current_tree=$(write-tree)
    committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
    read-tree $committed_tree
    update-cache --add --remove "$@"
    next_commit=$(commit-tree $(write-tree) -p $(cat .git/HEAD))
    echo $next_commit >.git/HEAD
    read-tree $current_tree

  The first four lines are to preserve the current tree state.
  Then we rewind the dircache to the last committed state,
  update only the named files to bring it to the state the user
  wanted to commit, and commit. Once done, we re-read the state
  to match the user's original intention (e.g. adds recorded in
  .git/index previously but not committed in this run are
  preserved).

* jit-merge $commit_id

  Like "cvs up -j". I have a working tree which is based on some
  commit, and I want to merge somebody else's head $commit_id.
  Stated more exactly: I want to have the result of my chang