Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin
On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Here's another potential use case.  The stable kernels (e.g., 3.18.y,
> 4.4.y, 4.9.y, etc.) have cherry picks from the the upstream kernel,
> and this is handled by putting in the commit body something like this:
> 
> [ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]

I think replaces could apply to cherry picks like this too. The more I
think about it, I actually think that replaces isn't a bad name for it
in the cherry pick context. When you cherry pick a commit, you create a
new commit that is derived from it and stands in for or replaces it in
the new context. It is a stretch but I don't think it is that bad.

You can tell that it is a cherry pick because the referenced commit's
history is not reachable in the current context.

Though we could consider some different names like "derivedfrom",
"obsoletes", "succeeds", "supersedes", "supplants"

> 
> 
> And here's yet another use case.  For internal Google kernel
> development, we maintain a kernel that has a large number of patches
> on top of a kernel version.  When we backport an upstream fix (say,
> one that first appeared in the 4.12 version of the upstream kernel),
> we include a line in the commit body that looks like this:
> 
> Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5
> 
> This is useful, because when we switch to use a newer upstream kernel,
> we need make sure we can account for all patches that were built on
> top of the 3xx kernel (which might have been using 4.10, for the sake
> of argument), to the 4xx kernel series (which might be using 4.15 ---
> the version numbers have been changed to protect the innocent).  This
> means going through each and every patch that was on top of the 3xx
> kernel, and if it has a line such as "Upstream 4.12-SHA1", we know
> that it will already be included in a 4.15 based kernel, so we don't
> need to worry about carrying that patch forward.

Are 3xx and 4xx internal version numbers? If I understand correctly, in
your example, 3xx is the heavily patched internal kernel based on 4.10
and 4xx is the internal patched version of 4.15. I think I'm following
so far.

Let's say that you used a "replaces" reference instead of your
"Upstream-4.12-SHA1" reference. The only piece of metadata that is
missing is the "4.12" of your string. However, you could replicate this
with some set arithmetic. If the sha1 referred to by "replaces" exists
in the set of commits reachable from 4.15 then you've answered the same
question.

> In other cases, we might decide that the patch is no longer needed.
> It could be because the patch has already be included upstream, in
> which case we might check in a commit with an empty patch body, but
> whose header contains something like this in the 4xx kernel:
> 
> Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
> Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e

So, the first reference is the old commit that patched the 3xx series?
What is the second reference? What is "4.11" indicating? Is that the
patch that was included in the upstream kernel that obsoleted your 3xx
patch?

If I understood that correctly. You could use a "replaces" reference for
the first line and the second line would still have to be included as a
separate header in your commit message? Does this mean "replaces" is not
useful in your case? I don't think so.

> Or we could decide that the commit is no longer no longer needed ---

no longer no longer needed? Is this a double negative indicating that it
is needed again? Or, is it a mistake?

> perhaps because the relevant subsystem was completely rewritten and
> the functionality was added in a different way.  Then we might have
> just have an empty commit with an explanation of why the commit is no
> longer needed and the commit body would have the metadata:
> 
> Origin-Dropped-3xx-SHA1: 26f49fcbb45e4bc18ad5b52dc93c3afe

The metadata in this reference indicates that it was dropped since 3xx.
Doesn't the empty body (and maybe a commit message saying dropping a
patch) indicate this if a "references" pointer were used instead? The
3xx part of the metadata could be derived again by set arithmetic.

> Or perhaps the commit is still needed, and for various reasons the
> commit was never upstreamed; perhaps because it's only useful for
> Google-specific hardware, or the patch was rejected upstream.  The we
> will have a cherry-pick that would include in the body:
> 
> Origin-3xx-SHA1: 8f3b6df74b9b4ec3ab615effb984c1b5

Replaces reference and set arithmetic.

> (Note: all commits that are added in the rebase workflow, even the
> empty commits that just have the Origin-Dropped-3xx-SHA1 or
> Upstream-Droped-4.11-SHA1 headers, are patch reviewed through Gerrit,
> so we have an audited, second-engineer review to make sure each commit
> in the 3xx kernel that Google had been carrying had the correct
> disposition when rebasing to the 4xx kernel.)

This is 

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin
On Tue, Dec 26, 2017 at 01:08:45PM +0900, Mike Hommey wrote:
> FWIW, your proposal has a lot in common (but is not quite equivalent)
> to mercurial's obsolescence markers and changeset evolution features.

I've had experience with mercurial but not since about 2009. After
reading up a little bit on this changeset evolution feature, it looks
very much like what I'm proposing. Obsolescence markers look a lot like
replaces references except, as illustrated by this blog [1], they point
the other way! Hence, the illustrations confused me for a moment. It
seems more natural to embed the reference in the new commit pointing at
the old. That said, the illustrated direction of the arrows doesn't
really affect the usefulness of the idea.

His third example (#3-working-with-other-people), appears to be the kind
of collaboration that I'm trying to describe here. To quote the blog:

  In git or vanilla (no extension) mercurial, you would have to figure
  out that b’ and b” are two new versions of b and merge them. Changeset
  evolution detects that situation, marks b’ and b” as being divergent.
  It then suggests automatic resolution with a merge and preserves
  history.

This is the kind of thing that I had to deal with manually in gerrit. I
hadn't seen this feature in mercurial but I'm glad to know now there is
a precedent for it.

Carl

[1] https://blog.laurentcharignon.com/post/2016-02-02-changeset-evolution/


Re: [PATCH v2 6/7] wt-status.c: handle worktree renames

2017-12-26 Thread Duy Nguyen
On Wed, Dec 27, 2017 at 1:14 AM, Igor Djordjevic
 wrote:
> I`m afraid "--porcelain=v2" test might be incorrect here, as `git
> status --porcelain=v2` output seems to be too, with this v2 series
> applied. Test I sent previously[1] fails, and it looks valid.
>
> This is output I now get, with old/deleted file unstaged and
> new/created file staged with `git add -N`:
>
> $ git status --porcelain=v2
> 2 .R N... 100644 100644 100644 12f00e90b6ef79117ce6e650416b8cf517099b78 
> 12f00e90b6ef79117ce6e650416b8cf517099b78 R100 original-file new-file
>
> Note "original-file" listed first, where "new-file" listed second
> (last). According the "v2" documentation[2] (excerpt):
>
>   ... 
>
>The pathname. In a renamed/copied entry, this
>  is the path in the index and in the working tree.

Gaah.. as you may see in the other mail when I quoted this
(incorrectly). I must have modified this file at some point and
thought it was true (my version did not have "and in the worktree").

The "and" is still problematic if you take this very seriously
(because in this case index name and worktree name are different) but
I think it's ok to ignore that "and" and switch it to "or".

>   ...
>The pathname in the commit at HEAD. This is only
>  present in a renamed/copied entry, and tells
>  where the renamed/copied contents came from.
>
>
> If I`m reading this correctly, it should be vice-versa - value from
> HEAD, being "original-file", should come last, where value from
> working tree ("new-file") should be first.

Yeah I think the "where the renamed/copied contents came from" clears
up my confusion in this format. Back to v1 it is!
-- 
Duy


Re: [PATCH v2 7/7] wt-status.c: avoid double renames in short/porcelain format

2017-12-26 Thread Duy Nguyen
On Wed, Dec 27, 2017 at 5:14 AM, Igor Djordjevic
 wrote:
> Hi Duy,
>
> On 26/12/2017 10:10, Nguyễn Thái Ngọc Duy wrote:
>>
>> The presence of worktree rename leads to an interesting situation,
>> what if the same index entry is renamed twice, compared to HEAD and to
>> worktree? We can have that with this setup
>>
>> echo first > first && git add first && git commit -m first
>> git mv first second  # rename reported in "diff --cached"
>> mv second third  # rename reported in "diff-files"
>>
>> For the long format this is fine because we print two "->" rename
>> lines, one in the "updated" section, one in "changed" one.
>>
>> For other output formats, it gets tricky because they combine both
>> diffs in one line but can only display one rename per line. The result
>> "XY" column of short format, for example, would be "RR" in that case.
>>
>> This case either needs some extension in short/porcelain format
>> to show something crazy like
>>
>> RR first -> second -> third
>>
>> or we could show renames as two lines instead of one, for example
>> something like this for short form:
>>
>> R  first -> second
>>  R second -> third
>>
>> But for now it's safer and simpler to just break the "second -> third"
>> rename pair and show
>>
>> RD first -> second
>>  A third
>>
>> like we have been showing until now.
>>
>
> I lost you a bit here, partially because of what seems to be an
> incomplete setup script, partially because of this last sentence, as
> Git v2.15.1 doesn`t seem to be showing this, so not sure about "like
> we have been showing until now" part...?

Yeah I missed a "git add -N third" in the setup. And "until now" was a
poor choice of words. It should have been "before 425a28e0a4", where
"new files" could not show up, which prevented rename detection in the
"Changed bot not staged for commit" section in the first place.

Though it's not _exactly_ like before. If you replace
"ita_invisible_in_index = 1" with "ita_invisible_in_index = 0" in
wt-status.c, you effectively roll back 425a28e0a4 and "git status
--short" would show

RD first -> second
AM third

The second line is different and is what 425a28e0a4 tries to fix.

> Now, still using v2.15.1, let`s see porcelain statuses:
>
> (2) $ git status --porcelain
> RR first -> second
>
> (3) $ git status --porcelain=v2
> 2 RR N... 100644 100644 00 9c59e24b8393179a5d712de4f990178df5734d99 
> 9c59e24b8393179a5d712de4f990178df5734d99 R100 secondfirst
>
> Here, they both report renames in _both_ index and working tree (RR),
> but they show "index" renamed path only ("second", in comparison to
> original value in HEAD, "first").
>
> I`m inclined to say this doesn`t align with what `git status` shows,
> disrespecting `add -N` (or respecting it only partially, through that
> second R, but not showing the actual working tree rename, "third").
>
> Without influencing porcelain format, and to fully respect `add -N`,
> I believe showing two renames (index and working tree) as two lines
> would be the correct approach - and that`s what default `git status`
> does, too.

I agree. What worries me though, is the path in index seems to be be
the unique "key" of each line. Porcelain v2 shows this clearer when
"second" is always in the same column. By showing two lines with  the
same key (i.e. "second"), I'm not sure if we are breaking the
porcelain format. Perhaps this is undefined area that we can just
tweak?

> Now, let`s examine this patch series v2 outputs:
>
> (1) $ git status
> On branch master
> Changes to be committed:
>   (use "git reset HEAD ..." to unstage)
>
> renamed:first -> second
>
> Changes not staged for commit:
>   (use "git add ..." to update what will be committed)
>   (use "git checkout -- ..." to discard changes in working 
> directory)
>
> renamed:second -> third
>
> (2) $ git status --porcelain
> RD first -> second
>  A third
>
> (3) $ git status --porcelain=v2
> 2 RD N... 100644 100644 00 9c59e24b8393179a5d712de4f990178df5734d99 
> 9c59e24b8393179a5d712de4f990178df5734d99 R100 secondfirst
> 1 .A N... 00 00 100644  
>  third
>
> Here, porcelain statuses make situation a bit better, as now at least
> `add -N` is respected, showing new "tracked" path appearing in the
> working tree.
>
> But, we now lost any idea about the rename that happened there as
> well - which Git v2.15.1 porcelain was partially showing (through
> RR), and which `git status` still reports correctly - and which we
> still differ from.

Sorry again about "now". Before 425a28e0a4 rename detection would not
kick in to find "second -> third" so people wouldn't know about rename
anyway.

>
> I don`t think this looks like what we have been showing until now
> (unless I misunderstood which exact "now" are we talking about), so I
> don`t 

Re: Bring together merge and rebase

2017-12-26 Thread Igor Djordjevic
Very interesting topic, just this one part I wanted to comment on:

On 26/12/2017 02:28, Jacob Keller wrote:
> 
> What about some way to take the reflog and turn it into a commit-based
> linkage and export that? Rather than tying it into the individual
> commit history, keep track of it outside the commit, possibly via
> something like notes, or some other mechanism.

This seems like the most useful approach, might be not touching reflog 
per se, but having some kind of "cherry-picked commits source" log 
(where rebasing is a subset of cherry-picking). What Johannes 
mentioned, a mapping between "old" and "new" commits. Might be notes 
could fit in nicely, but I`m not competent to comment on that at the 
moment.

For me, the most interesting use case is not even tied to code review 
(thus no review comments to think about), but a situation where one 
might be rebasing a set of downstream patches on top of updating 
upstream - it might be possible for a bug to slip through due to some 
upstream changes, even where there are no conflicts and test suite is 
executed regularly (might be test reveling the bug is yet to be added).

In that situation, instead of just going back in "regular" history 
(single dimension) and eventually finding the offending (rebased) 
commit (its N-th rebased version, that is), it might be great to 
actually keep drilling down the "rebase history" now (second 
dimension), finding the exact rebase iteration / rebased commit 
version where the error first appeared.

Carl, you described this well in your document[1], and Johannes 
provided a valuable first-hand experience[2] from working around the 
very same native Git limitation for years, mentioning using (fragile, 
costly and not very automatible) rebased commits message search to 
drill down the second dimension (rebase iterations), which seems to 
be the only possible approach at the moment, with "vanilla" Git, at 
least.

So this might be much more interesting case, if code review one is 
less appropriate because of review comments being also relevant to 
commit rebase iterations (which should be then stored somewhere, too, 
relating to corresponding commits, not to lose context).

Regards, Buga

p.s. "Merging rebase" and "shears.sh" script[3] seem to be orthogonal 
to this - really great on their own in improving rebase itself and 
making it smarter and much more powerful and useful, where I guess 
they would benefit from native Git "cherry-picked (rebased) commits 
iterations tracking" (old/source <> new/destination commit mapping), 
too, as would other Git tools.

[1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
[2] 
https://public-inbox.org/git/20171226040843.h7o6txkrp6zlv...@glandium.org/T/#m2e5079488bed2968d4ea52a10051a06c06ff61e0
[3] 
https://github.com/git-for-windows/build-extra/blob/af9cff5005/shears.sh#L12-L18


Re: [PATCH v2 7/7] wt-status.c: avoid double renames in short/porcelain format

2017-12-26 Thread Igor Djordjevic
Hi Duy,

On 26/12/2017 10:10, Nguyễn Thái Ngọc Duy wrote:
> 
> The presence of worktree rename leads to an interesting situation,
> what if the same index entry is renamed twice, compared to HEAD and to
> worktree? We can have that with this setup
> 
> echo first > first && git add first && git commit -m first
> git mv first second  # rename reported in "diff --cached"
> mv second third  # rename reported in "diff-files"
> 
> For the long format this is fine because we print two "->" rename
> lines, one in the "updated" section, one in "changed" one.
> 
> For other output formats, it gets tricky because they combine both
> diffs in one line but can only display one rename per line. The result
> "XY" column of short format, for example, would be "RR" in that case.
> 
> This case either needs some extension in short/porcelain format
> to show something crazy like
> 
> RR first -> second -> third
> 
> or we could show renames as two lines instead of one, for example
> something like this for short form:
> 
> R  first -> second
>  R second -> third
> 
> But for now it's safer and simpler to just break the "second -> third"
> rename pair and show
> 
> RD first -> second
>  A third
> 
> like we have been showing until now.
> 

I lost you a bit here, partially because of what seems to be an 
incomplete setup script, partially because of this last sentence, as 
Git v2.15.1 doesn`t seem to be showing this, so not sure about "like 
we have been showing until now" part...?

Here, with your setup script, with plain Git v2.15.1, we have:

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

renamed:first -> second

Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

deleted:second

Untracked files:
  (use "git add ..." to include in what will be committed)

third

Might be an additional `git add -N -- third` is needed here, to show 
what (I assume) you wanted...? If so:

$ git add -N third
(1) $ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

renamed:first -> second

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

renamed:second -> second
  ^^
Now we can see two renames I believe you were talking about...? (Note 
original bug showing above, which started this thread.) Now, still 
using v2.15.1, let`s see porcelain statuses:

(2) $ git status --porcelain
RR first -> second

(3) $ git status --porcelain=v2
2 RR N... 100644 100644 00 9c59e24b8393179a5d712de4f990178df5734d99 
9c59e24b8393179a5d712de4f990178df5734d99 R100 secondfirst

Here, they both report renames in _both_ index and working tree (RR), 
but they show "index" renamed path only ("second", in comparison to 
original value in HEAD, "first").

I`m inclined to say this doesn`t align with what `git status` shows, 
disrespecting `add -N` (or respecting it only partially, through that 
second R, but not showing the actual working tree rename, "third").

Without influencing porcelain format, and to fully respect `add -N`, 
I believe showing two renames (index and working tree) as two lines 
would be the correct approach - and that`s what default `git status` 
does, too.

Now, let`s examine this patch series v2 outputs:

(1) $ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

renamed:first -> second

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

renamed:second -> third

(2) $ git status --porcelain
RD first -> second
 A third

(3) $ git status --porcelain=v2
2 RD N... 100644 100644 00 9c59e24b8393179a5d712de4f990178df5734d99 
9c59e24b8393179a5d712de4f990178df5734d99 R100 secondfirst
1 .A N... 00 00 100644  
 third

Here, porcelain statuses make situation a bit better, as now at least 
`add -N` is respected, showing new "tracked" path appearing in the 
working tree.

But, we now lost any idea about the rename that happened there as 
well - which Git v2.15.1 porcelain was partially showing (through 
RR), and which `git status` still reports correctly - and which we 
still differ from.
 
I don`t think this looks like what we have been showing until now 
(unless I misunderstood which exact "now" are we talking about), so I 
don`t see that as a valid argument to support this 

Re: [PATCH v2 0/7] Codespeed perf results

2017-12-26 Thread Christian Couder
On Tue, Dec 26, 2017 at 10:59 PM, Christian Couder
 wrote:

> Changes since previous version
> ~~
>
>   - Fixed the way the 'executable' field sent to Codespeed is set in
> `perf/aggregate.perl` in patch 3/7. We now use `uname -s -m` to
> which we concatenate the perf.repoName config value if it is set,
> as suggested by Junio and Eric.
>
>   - Fixed the name of the GIT_PERF_REPO_NAME variable in patches 3/7
> and 7/7. It was GIT_TEST_REPO_NAME in some places.
>
>   - Fixed how the conf_opt argument is added to the
> get_var_from_env_or_config() function in patch 4/7. It was added
> as the last, so fifth, argument, but it is simpler and makes more
> sense to keep the default value argument as the last argument, so
> now the conf_opt argument is added as the fourth argument.
>
>   - What patch 4/7 did was previously done in 2 patches in the
> previous version (patches 4/8 and 5/8), but as we are not doing
> exactly the same thing (see the above item) it is simpler to do it
> in only one patch instead of two.
>
>   - We now use the --int type specifier when getting the
> perf.repeatCount config variable in patch 4/7.

Here is the diff with v1:

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 6e15f62a9e..34d74fc015 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -186,17 +186,16 @@ sub print_codespeed_results {

 my $project = "Git";

-my $executable;
-if ($results_section eq "") {
-$executable = `uname -o -p`;
-} else {
-$executable = $results_section;
-chomp $executable;
+my $executable = `uname -s -m`;
+chomp $executable;
+
+if ($results_section ne "") {
+$executable .= ", " . $results_section;
 }

 my $environment;
-if (exists $ENV{GIT_TEST_REPO_NAME} and $ENV{GIT_TEST_REPO_NAME} ne "") {
-$environment = $ENV{GIT_TEST_REPO_NAME};
+if (exists $ENV{GIT_PERF_REPO_NAME} and $ENV{GIT_PERF_REPO_NAME} ne "") {
+$environment = $ENV{GIT_PERF_REPO_NAME};
 } elsif (exists $ENV{GIT_TEST_INSTALLED} and
$ENV{GIT_TEST_INSTALLED} ne "") {
 $environment = $ENV{GIT_TEST_INSTALLED};
 $environment =~ s|/bin-wrappers$||;
diff --git a/t/perf/run b/t/perf/run
index 279c2d41f6..1a100d6134 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -105,8 +105,8 @@ get_var_from_env_or_config () {
 env_var="$1"
 conf_sec="$2"
 conf_var="$3"
-default_value="$4" # optional
-conf_opts="$5" # optional
+conf_opts="$4" # optional
+# $5 can be set to a default value

 # Do nothing if the env variable is already set
 eval "test -z \"\${$env_var+x}\"" || return
@@ -124,11 +124,11 @@ get_var_from_env_or_config () {
 conf_value=$(git config $conf_opts -f "$GIT_PERF_CONFIG_FILE" "$var") &&
 eval "$env_var=\"$conf_value\"" && return

-test -n "${default_value+x}" && eval "$env_var=\"$default_value\""
+test -n "${5+x}" && eval "$env_var=\"$5\""
 }

 run_subsection () {
-get_var_from_env_or_config "GIT_PERF_REPEAT_COUNT" "perf" "repeatCount" 3
+get_var_from_env_or_config "GIT_PERF_REPEAT_COUNT" "perf"
"repeatCount" "--int" 3
 export GIT_PERF_REPEAT_COUNT

 get_var_from_env_or_config "GIT_PERF_DIRS_OR_REVS" "perf" "dirsOrRevs"
@@ -137,7 +137,7 @@ run_subsection () {
 get_var_from_env_or_config "GIT_PERF_MAKE_COMMAND" "perf" "makeCommand"
 get_var_from_env_or_config "GIT_PERF_MAKE_OPTS" "perf" "makeOpts"

-get_var_from_env_or_config "GIT_TEST_REPO_NAME" "perf" "repoName"
+get_var_from_env_or_config "GIT_PERF_REPO_NAME" "perf" "repoName"
 export GIT_PERF_REPO_NAME

 GIT_PERF_AGGREGATING_LATER=t
@@ -163,7 +163,7 @@ run_subsection () {
 fi
 }

-get_var_from_env_or_config "GIT_PERF_CODESPEED_OUTPUT" "perf"
"codespeedOutput" "" --bool
+get_var_from_env_or_config "GIT_PERF_CODESPEED_OUTPUT" "perf"
"codespeedOutput" "--bool"
 get_var_from_env_or_config "GIT_PERF_SEND_TO_CODESPEED" "perf"
"sendToCodespeed"

 cd "$(dirname $0)"


[PATCH v2 0/7] Codespeed perf results

2017-12-26 Thread Christian Couder
This patch series is built on top of cc/perf-run-config which recently
graduated to master.

It makes it possible to send perf results to a Codespeed server. See
https://github.com/tobami/codespeed/ and web sites like
http://speed.pypy.org/ which are using Codespeed.

The end goal would be to have such a server always available to track
how the different git commands perform over time on different kind of
repos (small, medium, large, ...) with different optimizations on and
off (split-index, libpcre2, BLK_SHA1, ...)

With this series and a config file like:

$ cat perf.conf
[perf]
dirsOrRevs = v2.12.0 v2.13.0
repeatCount = 10
sendToCodespeed = http://localhost:8000
repoName = Git repo
[perf "with libpcre"]
makeOpts = "DEVELOPER=1 USE_LIBPCRE=YesPlease"
[perf "without libpcre"]
makeOpts = "DEVELOPER=1"

One should be able to just launch:

$ ./run --config perf.conf p7810-grep.sh

and then get nice graphs in a Codespeed instance running on
http://localhost:8000.

Caveat
~~

For now one has to create the "Git repo" environment in the Codespeed
admin interface. (We send the perf.repoName config variable in the
"environment" Codespeed field.) This is because Codespeed requires the
environment fields to be created and does not provide a simple way to
create these fields programmatically.

There are discussions on a Codespeed issue
(https://github.com/tobami/codespeed/issues/232) about creating a
proper API for Codespeed that could address this problem in the
future.

Changes since previous version
~~

  - Fixed the way the 'executable' field sent to Codespeed is set in
`perf/aggregate.perl` in patch 3/7. We now use `uname -s -m` to
which we concatenate the perf.repoName config value if it is set,
as suggested by Junio and Eric.

  - Fixed the name of the GIT_PERF_REPO_NAME variable in patches 3/7
and 7/7. It was GIT_TEST_REPO_NAME in some places.

  - Fixed how the conf_opt argument is added to the
get_var_from_env_or_config() function in patch 4/7. It was added
as the last, so fifth, argument, but it is simpler and makes more
sense to keep the default value argument as the last argument, so
now the conf_opt argument is added as the fourth argument.

  - What patch 4/7 did was previously done in 2 patches in the
previous version (patches 4/8 and 5/8), but as we are not doing
exactly the same thing (see the above item) it is simpler to do it
in only one patch instead of two.

  - We now use the --int type specifier when getting the
perf.repeatCount config variable in patch 4/7.

Links
~

This patch series:

https://github.com/chriscool/git/commits/codespeed

Previous version (v1):

https://github.com/chriscool/git/commits/codespeed1

Discussions about v1:

https://public-inbox.org/git/cap8ufd3q4h-aybdabspow948lqyvydwz1hlpad+kr9zpxvz...@mail.gmail.com/

Discussions about the cc/perf-run-config patch series:

v1: https://public-inbox.org/git/20170713065050.19215-1-chrisc...@tuxfamily.org/
v2: 
https://public-inbox.org/git/cap8ufd2j-ufh+9awz91gtz-jusq7euoexmguro59vpf29jx...@mail.gmail.com/

Christian Couder (7):
  perf/aggregate: fix checking ENV{GIT_PERF_SUBSECTION}
  perf/aggregate: refactor printing results
  perf/aggregate: implement codespeed JSON output
  perf/run: add conf_opts argument to get_var_from_env_or_config()
  perf/run: learn about perf.codespeedOutput
  perf/run: learn to send output to codespeed server
  perf/run: read GIT_TEST_REPO_NAME from perf.repoName

 t/perf/aggregate.perl | 163 +++---
 t/perf/run|  31 --
 2 files changed, 140 insertions(+), 54 deletions(-)

-- 
2.15.1.361.g8b07d831d0



[PATCH v2 6/7] perf/run: learn to send output to codespeed server

2017-12-26 Thread Christian Couder
Let's make it possible to set in a config file the URL of
a codespeed server. And then let's make the `run` script
send the perf test results to this URL at the end of the
tests.

This should make is possible to easily automate the process
of running perf tests and having their results available in
Codespeed.

Signed-off-by: Christian Couder 
---
 t/perf/run | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index 4e62d6bb3f..ef56396546 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -148,10 +148,20 @@ run_subsection () {
test "$GIT_PERF_CODESPEED_OUTPUT" = "true" && 
codespeed_opt="--codespeed"
 
run_dirs "$@"
-   ./aggregate.perl $codespeed_opt "$@"
+
+   if test -z "$GIT_PERF_SEND_TO_CODESPEED"
+   then
+   ./aggregate.perl $codespeed_opt "$@"
+   else
+   json_res_file="test-results/$GIT_PERF_SUBSECTION/aggregate.json"
+   ./aggregate.perl --codespeed "$@" | tee "$json_res_file"
+   send_data_url="$GIT_PERF_SEND_TO_CODESPEED/result/add/json/"
+   curl -v --request POST --data-urlencode "json=$(cat 
"$json_res_file")" "$send_data_url"
+   fi
 }
 
 get_var_from_env_or_config "GIT_PERF_CODESPEED_OUTPUT" "perf" 
"codespeedOutput" "--bool"
+get_var_from_env_or_config "GIT_PERF_SEND_TO_CODESPEED" "perf" 
"sendToCodespeed"
 
 cd "$(dirname $0)"
 . ../../GIT-BUILD-OPTIONS
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 7/7] perf/run: read GIT_TEST_REPO_NAME from perf.repoName

2017-12-26 Thread Christian Couder
The GIT_PERF_REPO_NAME env variable is used in
the `aggregate.perl` script to set the 'environment'
field in the JSON Codespeed output.

Let's make it easy to set this variable by setting it
in a config file.

Signed-off-by: Christian Couder 
---
 t/perf/run | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/perf/run b/t/perf/run
index ef56396546..1a100d6134 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -137,6 +137,9 @@ run_subsection () {
get_var_from_env_or_config "GIT_PERF_MAKE_COMMAND" "perf" "makeCommand"
get_var_from_env_or_config "GIT_PERF_MAKE_OPTS" "perf" "makeOpts"
 
+   get_var_from_env_or_config "GIT_PERF_REPO_NAME" "perf" "repoName"
+   export GIT_PERF_REPO_NAME
+
GIT_PERF_AGGREGATING_LATER=t
export GIT_PERF_AGGREGATING_LATER
 
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 1/7] perf/aggregate: fix checking ENV{GIT_PERF_SUBSECTION}

2017-12-26 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/perf/aggregate.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..769d418708 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -70,7 +70,7 @@ if (not @tests) {
 }
 
 my $resultsdir = "test-results";
-if ($ENV{GIT_PERF_SUBSECTION} ne "") {
+if (exists $ENV{GIT_PERF_SUBSECTION} and $ENV{GIT_PERF_SUBSECTION} ne "") {
$resultsdir .= "/" . $ENV{GIT_PERF_SUBSECTION};
 }
 
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 3/7] perf/aggregate: implement codespeed JSON output

2017-12-26 Thread Christian Couder
Codespeed (https://github.com/tobami/codespeed/) is an open source
project that can be used to track how some software performs over
time. It stores performance test results in a database and can show
nice graphs and charts on a web interface.

As it can be interesting to Codespeed to see how Git performance
evolves over time and releases, let's implement a Codespeed output
in "perf/aggregate.perl".

Helped-by: Eric Sunshine 
Signed-off-by: Christian Couder 
---
 t/perf/aggregate.perl | 67 +--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 3609cb5dc3..34d74fc015 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -35,10 +35,15 @@ sub format_times {
return $out;
 }
 
-my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests, $codespeed);
 while (scalar @ARGV) {
my $arg = $ARGV[0];
my $dir;
+   if ($arg eq "--codespeed") {
+   $codespeed = 1;
+   shift @ARGV;
+   next;
+   }
last if -f $arg or $arg eq "--";
if (! -d $arg) {
my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
@@ -70,8 +75,10 @@ if (not @tests) {
 }
 
 my $resultsdir = "test-results";
+my $results_section = "";
 if (exists $ENV{GIT_PERF_SUBSECTION} and $ENV{GIT_PERF_SUBSECTION} ne "") {
$resultsdir .= "/" . $ENV{GIT_PERF_SUBSECTION};
+   $results_section = $ENV{GIT_PERF_SUBSECTION};
 }
 
 my @subtests;
@@ -174,6 +181,62 @@ sub print_default_results {
}
 }
 
+sub print_codespeed_results {
+   my ($results_section) = @_;
+
+   my $project = "Git";
+
+   my $executable = `uname -s -m`;
+   chomp $executable;
+
+   if ($results_section ne "") {
+   $executable .= ", " . $results_section;
+   }
+
+   my $environment;
+   if (exists $ENV{GIT_PERF_REPO_NAME} and $ENV{GIT_PERF_REPO_NAME} ne "") 
{
+   $environment = $ENV{GIT_PERF_REPO_NAME};
+   } elsif (exists $ENV{GIT_TEST_INSTALLED} and $ENV{GIT_TEST_INSTALLED} 
ne "") {
+   $environment = $ENV{GIT_TEST_INSTALLED};
+   $environment =~ s|/bin-wrappers$||;
+   } else {
+   $environment = `uname -r`;
+   chomp $environment;
+   }
+
+   my @data;
+
+   for my $t (@subtests) {
+   for my $d (@dirs) {
+   my $commitid = $prefixes{$d};
+   $commitid =~ s/^build_//;
+   $commitid =~ s/\.$//;
+   my ($result_value, $u, $s) = 
get_times("$resultsdir/$prefixes{$d}$t.times");
+
+   my %vals = (
+   "commitid" => $commitid,
+   "project" => $project,
+   "branch" => $dirnames{$d},
+   "executable" => $executable,
+   "benchmark" => $shorttests{$t} . " " . 
read_descr("$resultsdir/$t.descr"),
+   "environment" => $environment,
+   "result_value" => $result_value,
+   );
+   push @data, \%vals;
+   }
+   }
+
+   #use Data::Dumper qw/ Dumper /;
+   #print Dumper(\@data);
+
+   use JSON;
+   print to_json(\@data, {utf8 => 1, pretty => 1}), "\n";
+}
+
 binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
 
-print_default_results();
+if ($codespeed) {
+   print_codespeed_results($results_section);
+} else {
+   print_default_results();
+}
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 4/7] perf/run: add conf_opts argument to get_var_from_env_or_config()

2017-12-26 Thread Christian Couder
Let's make it possible to use `git config` type specifiers like
`--int` or `--bool`, so that config values are converted to the
canonical form and easier to use.

This additional argument is now the fourth argument of
get_var_from_env_or_config() instead of the fifth because we
want the default value argument to be unset if it is not
passed, and this is simpler if it is the last argument.

Signed-off-by: Christian Couder 
---
 t/perf/run | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/t/perf/run b/t/perf/run
index 43e4de49ef..214d658172 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -105,7 +105,8 @@ get_var_from_env_or_config () {
env_var="$1"
conf_sec="$2"
conf_var="$3"
-   # $4 can be set to a default value
+   conf_opts="$4" # optional
+   # $5 can be set to a default value
 
# Do nothing if the env variable is already set
eval "test -z \"\${$env_var+x}\"" || return
@@ -116,18 +117,18 @@ get_var_from_env_or_config () {
if test -n "$GIT_PERF_SUBSECTION"
then
var="$conf_sec.$GIT_PERF_SUBSECTION.$conf_var"
-   conf_value=$(git config -f "$GIT_PERF_CONFIG_FILE" "$var") &&
+   conf_value=$(git config $conf_opts -f "$GIT_PERF_CONFIG_FILE" 
"$var") &&
eval "$env_var=\"$conf_value\"" && return
fi
var="$conf_sec.$conf_var"
-   conf_value=$(git config -f "$GIT_PERF_CONFIG_FILE" "$var") &&
+   conf_value=$(git config $conf_opts -f "$GIT_PERF_CONFIG_FILE" "$var") &&
eval "$env_var=\"$conf_value\"" && return
 
-   test -n "${4+x}" && eval "$env_var=\"$4\""
+   test -n "${5+x}" && eval "$env_var=\"$5\""
 }
 
 run_subsection () {
-   get_var_from_env_or_config "GIT_PERF_REPEAT_COUNT" "perf" "repeatCount" 
3
+   get_var_from_env_or_config "GIT_PERF_REPEAT_COUNT" "perf" "repeatCount" 
"--int" 3
export GIT_PERF_REPEAT_COUNT
 
get_var_from_env_or_config "GIT_PERF_DIRS_OR_REVS" "perf" "dirsOrRevs"
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 2/7] perf/aggregate: refactor printing results

2017-12-26 Thread Christian Couder
As we want to implement another kind of output than
the current output for the perf test results, let's
refactor the existing code that outputs the results
in its own print_default_results() function.

Signed-off-by: Christian Couder 
---
 t/perf/aggregate.perl | 96 +++
 1 file changed, 50 insertions(+), 46 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index 769d418708..3609cb5dc3 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -100,13 +100,6 @@ sub read_descr {
return $line;
 }
 
-my %descrs;
-my $descrlen = 4; # "Test"
-for my $t (@subtests) {
-   $descrs{$t} = $shorttests{$t}.": ".read_descr("$resultsdir/$t.descr");
-   $descrlen = length $descrs{$t} if length $descrs{$t}>$descrlen;
-}
-
 sub have_duplicate {
my %seen;
for (@_) {
@@ -122,54 +115,65 @@ sub have_slash {
return 0;
 }
 
-my %newdirabbrevs = %dirabbrevs;
-while (!have_duplicate(values %newdirabbrevs)) {
-   %dirabbrevs = %newdirabbrevs;
-   last if !have_slash(values %dirabbrevs);
-   %newdirabbrevs = %dirabbrevs;
-   for (values %newdirabbrevs) {
-   s{^[^/]*/}{};
+sub print_default_results {
+   my %descrs;
+   my $descrlen = 4; # "Test"
+   for my $t (@subtests) {
+   $descrs{$t} = $shorttests{$t}.": 
".read_descr("$resultsdir/$t.descr");
+   $descrlen = length $descrs{$t} if length $descrs{$t}>$descrlen;
}
-}
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-   my $d = $dirs[$i];
-   my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : 
$dirnames{$d});
-   $colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-   my $firstr;
+   my %newdirabbrevs = %dirabbrevs;
+   while (!have_duplicate(values %newdirabbrevs)) {
+   %dirabbrevs = %newdirabbrevs;
+   last if !have_slash(values %dirabbrevs);
+   %newdirabbrevs = %dirabbrevs;
+   for (values %newdirabbrevs) {
+   s{^[^/]*/}{};
+   }
+   }
+
+   my %times;
+   my @colwidth = ((0)x@dirs);
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   $times{$prefixes{$d}.$t} = 
[get_times("$resultsdir/$prefixes{$d}$t.times")];
-   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-   my $w = length format_times($r,$u,$s,$firstr);
+   my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : 
$dirnames{$d});
$colwidth[$i] = $w if $w > $colwidth[$i];
-   $firstr = $r unless defined $firstr;
}
-}
-my $totalwidth = 3*@dirs+$descrlen;
-$totalwidth += $_ for (@colwidth);
-
-binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+   for my $t (@subtests) {
+   my $firstr;
+   for my $i (0..$#dirs) {
+   my $d = $dirs[$i];
+   $times{$prefixes{$d}.$t} = 
[get_times("$resultsdir/$prefixes{$d}$t.times")];
+   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
+   my $w = length format_times($r,$u,$s,$firstr);
+   $colwidth[$i] = $w if $w > $colwidth[$i];
+   $firstr = $r unless defined $firstr;
+   }
+   }
+   my $totalwidth = 3*@dirs+$descrlen;
+   $totalwidth += $_ for (@colwidth);
 
-printf "%-${descrlen}s", "Test";
-for my $i (0..$#dirs) {
-   my $d = $dirs[$i];
-   printf "   %-$colwidth[$i]s", (exists $dirabbrevs{$d} ? $dirabbrevs{$d} 
: $dirnames{$d});
-}
-print "\n";
-print "-"x$totalwidth, "\n";
-for my $t (@subtests) {
-   printf "%-${descrlen}s", $descrs{$t};
-   my $firstr;
+   printf "%-${descrlen}s", "Test";
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-   printf "   %-$colwidth[$i]s", format_times($r,$u,$s,$firstr);
-   $firstr = $r unless defined $firstr;
+   printf "   %-$colwidth[$i]s", (exists $dirabbrevs{$d} ? 
$dirabbrevs{$d} : $dirnames{$d});
}
print "\n";
+   print "-"x$totalwidth, "\n";
+   for my $t (@subtests) {
+   printf "%-${descrlen}s", $descrs{$t};
+   my $firstr;
+   for my $i (0..$#dirs) {
+   my $d = $dirs[$i];
+   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
+   printf "   %-$colwidth[$i]s", 
format_times($r,$u,$s,$firstr);
+   $firstr = $r unless defined $firstr;
+   }
+   print "\n";
+   }
 }
+
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+print_default_results();
-- 
2.15.1.361.g8b07d831d0



[PATCH v2 5/7] perf/run: learn about perf.codespeedOutput

2017-12-26 Thread Christian Couder
Let's make it possible to set in a config file the output
format (regular or codespeed) of the perf tests.

Signed-off-by: Christian Couder 
---
 t/perf/run | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index 214d658172..4e62d6bb3f 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -144,10 +144,15 @@ run_subsection () {
set -- . "$@"
fi
 
+   codespeed_opt=
+   test "$GIT_PERF_CODESPEED_OUTPUT" = "true" && 
codespeed_opt="--codespeed"
+
run_dirs "$@"
-   ./aggregate.perl "$@"
+   ./aggregate.perl $codespeed_opt "$@"
 }
 
+get_var_from_env_or_config "GIT_PERF_CODESPEED_OUTPUT" "perf" 
"codespeedOutput" "--bool"
+
 cd "$(dirname $0)"
 . ../../GIT-BUILD-OPTIONS
 
-- 
2.15.1.361.g8b07d831d0



Re: Bring together merge and rebase

2017-12-26 Thread Mike Hommey
On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> The big contention among git users is whether to rebase or to merge
> changes [2][3] while iterating. I used to firmly believe that merging
> was the way to go and rebase was harmful. More recently, I have worked
> in some environments where I saw rebase used very effectively while
> iterating on changes and I relaxed my stance a lot. Now, I'm on the
> fence. I appreciate the strengths and weaknesses of both approaches. I
> waffle between the two depending on the situation, the tools being
> used, and I guess, to some extent, my mood.
> 
> I think what git needs is something brand new that brings the two
> together and has all of the advantages of both approaches. Let me
> explain what I've got in mind...
> 
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.
> 
> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.
> 
> Replay handles collaboration between multiple authors on a single
> change. This is difficult and prone to accidental loss when using
> rebase and it results in a complex history when done with merge. With
> replay, collaborators could merge while collaborating on a single
> change and a record of each one's contributions can be preserved.
> Attempting this level of collaboration caused me many headaches when I
> worked with the gerrit workflow (which in many ways, I like a lot).
> 
> I blogged about this proposal earlier this year when I first thought
> of it [1]. I got busy and didn't think about it for a while. Now with
> a little time off of work, I've come back to revisit it. The blog
> entry has a few examples showing how it works and how the history will
> look in a few examples. Take a look.
> 
> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations. Bisect is a interesting one. I
> tend to think that bisect should prefer the regular commit history but
> have the ability to drill into the change history if necessary.
> 
> In my opinion, this proposal would bring together rebase and merge in
> a powerful way and could end the contention. Thanks for your
> consideration.

FWIW, your proposal has a lot in common (but is not quite equivalent) to
mercurial's obsolescence markers and changeset evolution features.

Mike


Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin
On Tue, Dec 26, 2017 at 03:19:02PM -0500, Paul Smith wrote:
> As someone working in an environment where we do a lot of rebasing and
> very little merging, I read these proposals with interest.  I'm not
> convinced that we would switch to using a "replaces"-type feature, but
> I'm pretty sure that the "null-merge and rebase" trick described
> previously would not be something we're interested in using.

In the near term, maybe. I'm still working with it to be sure I
understand it right.

> Although "git log" doesn't follow these merges (unless requested), all
> the graphical tools that are used to display history WOULD show all
> those branches.  In a "replaces"-type environment I think the point is
> that we would not want to see them (certainly not by default) as they
> would be used mainly for deeper spelunking, but since they just seem
> like normal merges I don't see any way to turn them off.

You've touched on some of my concerns with the null-merge approach. I
want the end result to be as clean as possible which I think is what
lures many to the rebase methodology in the first place.

> If "replaces" was a separate capability then it could be treated
> differently by history browsing tools, and shown or not shown as
> desired.  For example, a commit that had a "replaces" element could be
> selected somehow and you could expand that set of commits that were
> replaced, or something like that.

Exactly!

Carl


Re: Bring together merge and rebase

2017-12-26 Thread Paul Smith
On Tue, 2017-12-26 at 12:44 -0700, Carl Baldwin wrote:
> > Sure, it could be opt in, be a new format etc. But you haven't
> > explained why you think a feature like this would need to rely on
> > an entirely new parent structure and side-DAG, as opposed to just
> > the more minor changes I'm pointing out above, and which I think
> > will give you what you need from a UX level.
> 
> I have not wrapped my head around it enough to convince myself that
> it gives what I'm after. Let me spend a little more time with it to
> get a feel for it.

As someone working in an environment where we do a lot of rebasing and
very little merging, I read these proposals with interest.  I'm not
convinced that we would switch to using a "replaces"-type feature, but
I'm pretty sure that the "null-merge and rebase" trick described
previously would not be something we're interested in using.

Although "git log" doesn't follow these merges (unless requested), all
the graphical tools that are used to display history WOULD show all
those branches.  In a "replaces"-type environment I think the point is
that we would not want to see them (certainly not by default) as they
would be used mainly for deeper spelunking, but since they just seem
like normal merges I don't see any way to turn them off.

If "replaces" was a separate capability then it could be treated
differently by history browsing tools, and shown or not shown as
desired.  For example, a commit that had a "replaces" element could be
selected somehow and you could expand that set of commits that were
replaced, or something like that.


Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin
On Tue, Dec 26, 2017 at 01:04:36PM -0500, Theodore Ts'o wrote:
> On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> > At this point, you might wonder why I'm not proposing to simply add a
> > "change-id" to the commit object. The short answer is that the
> > "change-id" Gerrit uses in the commit messages cannot stand on its own.
> > It depends on data stored on the server which maintains a relationship
> > of commits to a review number and a linear ordering of commits within
> > the review (hopefully I'm not over simplifying this). The "replaces"
> > reference is an attempt to make something which can stand on its own. I
> > don't think we need to solve the problem of where to keep comments at
> > this point.
> 
> I strongly disagree, and one way to see that is by doing a real-life
> experiment.  If you take a look at a gerrit change that, which in my
> experience can have up to ten or twelve revisions, and strip out the
> comments, so all you get to look at it is half-dozen or more
> revisions.  How useful is it *really*?  How does it get used in
> practice?  What development problem does it help to solve?

I didn't mean to imply that we need to get along without the comments. I
was only pointing out that gerrit, github, other code review UIs have
already figured out how to store comments archored to specific revisions
of files in the repository. I'm suggesting that we let them continue to
do that part while we take the first step of specifying how the
intermediate revisions are kept.

If the various code review servers adopted this then we'd have a client
side which could push up revisions for review to any of them. In
addition, they'd all get the collaborative functionality that I
described in my reply to your previous message.

What we get with this proposal is if I push up a review and that review
is changed by someone (maybe even me) outside of my original workspace,
my client gives me the tools to detect it and merge with it. If I try to
push over (clobber) that work then I get an error that the remote cannot
be fast-forwarded and I'm forced to fetch it and merge it.

I get this while using the rebase methodology I've grown to enjoy having
since using gerrit and I end up with a mainline history that looks
exactly the way I want it to.

> And when you say that it is a bug that the Gerrit Change-Id does not
> stand alone, consider that it can also be a *feature*.  If you keep
> all of this in the main repo, the number of commits can easily grow by
> an order of magnitude.  And these are commits that you have to keep
> forever, which means it slows down every subsequent git clone, git gc
> operation, git tag --contains search, etc.

I didn't say it was a bug; just that it is at odds with what I'm hoping
to do.

I agree that the number of commits in the repository will go up.
However, I think there will be ways to mitigate the costs.

The commits are not in the mainline history. So, I wouldn't expect a git
tag --contains or most other commands that traverse history to consider
them at all.

It could be possible to make the default git clone skip them all and
only fetch them on demand for specific changes.

> So what are the benefits, and what are the costs?  If the benefits
> were huge, then perhaps it would be worthwhile.  But if you lose a
> huge amount of the value because you are missing the *why* between the
> half-dozen to dozen past revisions of the commit, then is it really
> worth it to adopt that particular workflow?
> 
> It seems to me your argument is contrasting a "replaces" pointer
> versus the github PR.  But compared to the Gerrit solution, I don't
> think the "replaces" pointer proposal is as robust or as featureful.
> Also, please keep in mind that just because it's in core git doesn't
> guarantee that Github will support it.  As far as I know github has
> zero support notes, for example.

What I propose is that gerrit and github could end up more robust,
featureful, and interoperable if they had this feature to build from.

With gerrit specifically, adopting this feature would make the "change"
concept richer than it is now because it could supersede the change-id
in the commit message and allow a change to evolve in a distributed
non-linear way with protection against clobbering work.

I have no intention to disparage either tool. I love them both. They've
both made my career better in different ways. I know there is no
guarantee that github, gerrit, or any other tool will do anything to
adopt this. But, I'm hoping they are reading this thread and that they
recognize how this feature can make them a little bit better and jump in
and help. I know it is a lot to hope for but I think it could be great
if it happened.

Carl


Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin
On Tue, Dec 26, 2017 at 06:49:56PM +0100, Ævar Arnfjörð Bjarmason wrote:
> New headers should be added after existing headers, but other than
> that it won't choke on it. See 4b2bced559 when the encoding header was
> added, this also passes most tests:
> 
> diff --git a/commit.c b/commit.c
> index cab8d4455b..cd2bafbaa0 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t 
> msg_len,
> if (!encoding_is_utf8)
> strbuf_addf(, "encoding %s\n", 
> git_commit_encoding);
> 
> +   strbuf_addf(, "replaces 
> \n");
> +
> while (extra) {
> add_extra_header(, extra);
> extra = extra->next;
> 
> Only "most" since of course this changes the sha1 of every commit git
> creates from what you get now.
> 
> > Even if core git code does not simply choke on it, I would like push and
> > pull to follow these pointers and transfer the history behind them. I
> > assumed that git would not do this today. I would also like gc to
> > preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> > it doesn't purge it from the history.
> 
> It won't pay any attention to them if "replaces" is something entirely
> new, what I was pointing out in my earlier reply is that you can simply
> *also* create the parent pointers to these no-op merge commits that hide
> away the previous history the "replaces" headers will be referencing.
> 
> The reason to do that is 100% backwards compatibility, and and only
> needing to make minor UI changes to have this feature (to e.g. history
> walking), as opposed to needing to hack everything that now follows
> "parent" or constructs a commit graph.

Thank you for clarifying this. I have learned something.

> Sure, it could be opt in, be a new format etc. But you haven't
> explained why you think a feature like this would need to rely on an
> entirely new parent structure and side-DAG, as opposed to just the
> more minor changes I'm pointing out above, and which I think will give
> you what you need from a UX level.

I have not wrapped my head around it enough to convince myself that it
gives what I'm after. Let me spend a little more time with it to get a
feel for it.

Carl


Re: [PATCH v2 6/7] wt-status.c: handle worktree renames

2017-12-26 Thread Igor Djordjevic
Hi Duy,

On 26/12/2017 10:10, Nguyễn Thái Ngọc Duy wrote:
> Before 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
> in index" - 2016-10-24) there are never "new files" in the index, which
> essentially disables rename detection because we only detect renames
> when a new file appears in a diff pair.
> 
> After that commit, an i-t-a entry can appear as a new file in "git
> diff-files". But the diff callback function in wt-status.c does not
> handle this case and produces incorrect status output.
> 
> PS. The reader may notice that this patch adds a new xstrdup() but not
> a free(). Yes we leak memory (the same for head_path). But wt_status
> so far has been short lived, this leak should not matter in
> practice.
> 
> Noticed-by: Alex Vandiver 
> Helped-by: Igor Djordjevic 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  t/t2203-add-intent.sh | 28 
>  wt-status.c   | 72 
> +++
>  wt-status.h   |  4 +--
>  3 files changed, 85 insertions(+), 19 deletions(-)
> 
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index 878e73fe98..e5bfda1853 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -162,5 +162,33 @@ test_expect_success 'commit: ita entries ignored in 
> empty commit check' '
>   )
>  '
>  
> +test_expect_success 'rename detection finds the right names' '
> + git init rename-detection &&
> + (
> + cd rename-detection &&
> + echo contents > first &&
> + git add first &&
> + git commit -m first &&
> + mv first third &&
> + git add -N third &&
> +
> + git status | grep -v "^?" >actual.1 &&
> + test_i18ngrep "renamed: *first -> third" actual.1 &&
> +
> + git status --porcelain | grep -v "^?" >actual.2 &&
> + cat >expected.2 <<-\EOF &&
> +  R first -> third
> + EOF
> + test_cmp expected.2 actual.2 &&
> +
> + oid=12f00e90b6ef79117ce6e650416b8cf517099b78 &&
> + git status --porcelain=v2 | grep -v "^?" >actual.3 &&
> + cat >expected.3 <<-EOF &&
> + 2 .R N... 100644 100644 100644 $oid $oid R100 first third
> + EOF
> + test_cmp expected.3 actual.3
> + )
> +'
> +
>  test_done

I`m afraid "--porcelain=v2" test might be incorrect here, as `git 
status --porcelain=v2` output seems to be too, with this v2 series 
applied. Test I sent previously[1] fails, and it looks valid.

This is output I now get, with old/deleted file unstaged and 
new/created file staged with `git add -N`:

$ git status --porcelain=v2
2 .R N... 100644 100644 100644 12f00e90b6ef79117ce6e650416b8cf517099b78 
12f00e90b6ef79117ce6e650416b8cf517099b78 R100 original-file new-file

Note "original-file" listed first, where "new-file" listed second 
(last). According the "v2" documentation[2] (excerpt):

  ... 

   The pathname. In a renamed/copied entry, this
 is the path in the index and in the working tree.
  ...
   The pathname in the commit at HEAD. This is only
 present in a renamed/copied entry, and tells
 where the renamed/copied contents came from.


If I`m reading this correctly, it should be vice-versa - value from 
HEAD, being "original-file", should come last, where value from 
working tree ("new-file") should be first.

If I stage both files and try again, output is as expected 
("new-file" comes first):

$ git status --porcelain=v2
2 R. N... 100644 100644 100644 12f00e90b6ef79117ce6e650416b8cf517099b78 
12f00e90b6ef79117ce6e650416b8cf517099b78 R100 new-file  original-file

> diff --git a/wt-status.c b/wt-status.c
> index c124d7589c..d5bdf4c2e9 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -376,6 +376,8 @@ static void wt_longstatus_print_change_data(struct 
> wt_status *s,
>   strbuf_addch(, ')');
>   }
>   status = d->worktree_status;
> + if (d->worktree_path)
> + two_name = d->worktree_path;
>   break;
>   default:
>   die("BUG: unhandled change_type %d in 
> wt_longstatus_print_change_data",
> @@ -460,6 +462,12 @@ static void wt_status_collect_changed_cb(struct 
> diff_queue_struct *q,
>   /* mode_worktree is zero for a delete. */
>   break;
>  
> + case DIFF_STATUS_COPIED:
> + case DIFF_STATUS_RENAMED:
> + d->worktree_path = xstrdup(p->two->path);
  ^^^
This is changed from v1 of this patch, where it was:

+   d->worktree_path = xstrdup(p->one->path);
  ^^^
..., and might be it introduced the issue here...? Or, if this is 

Re: Bring together merge and rebase

2017-12-26 Thread Theodore Ts'o
On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> At this point, you might wonder why I'm not proposing to simply add a
> "change-id" to the commit object. The short answer is that the
> "change-id" Gerrit uses in the commit messages cannot stand on its own.
> It depends on data stored on the server which maintains a relationship
> of commits to a review number and a linear ordering of commits within
> the review (hopefully I'm not over simplifying this). The "replaces"
> reference is an attempt to make something which can stand on its own. I
> don't think we need to solve the problem of where to keep comments at
> this point.

I strongly disagree, and one way to see that is by doing a real-life
experiment.  If you take a look at a gerrit change that, which in my
experience can have up to ten or twelve revisions, and strip out the
comments, so all you get to look at it is half-dozen or more
revisions.  How useful is it *really*?  How does it get used in
practice?  What development problem does it help to solve?

And when you say that it is a bug that the Gerrit Change-Id does not
stand alone, consider that it can also be a *feature*.  If you keep
all of this in the main repo, the number of commits can easily grow by
an order of magnitude.  And these are commits that you have to keep
forever, which means it slows down every subsequent git clone, git gc
operation, git tag --contains search, etc.

So what are the benefits, and what are the costs?  If the benefits
were huge, then perhaps it would be worthwhile.  But if you lose a
huge amount of the value because you are missing the *why* between the
half-dozen to dozen past revisions of the commit, then is it really
worth it to adopt that particular workflow?

It seems to me your argument is contrasting a "replaces" pointer
versus the github PR.  But compared to the Gerrit solution, I don't
think the "replaces" pointer proposal is as robust or as featureful.
Also, please keep in mind that just because it's in core git doesn't
guarantee that Github will support it.  As far as I know github has
zero support notes, for example.

- Ted


Re: [PATCH] status: handle worktree renames

2017-12-26 Thread Torsten Bögershausen
On 2017-12-25 11:37, Nguyễn Thái Ngọc Duy wrote:
[]
>  wt-status.c   | 24 +++-
>  wt-status.h   |  1 +
>  3 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index 1bdf38e80d..41a8874e60 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -150,5 +150,20 @@ test_expect_success 'commit: ita entries ignored in 
> empty commit check' '
>   )
>  '
>  
> +test_expect_success 'rename detection finds the right names' '
> + git init rename-detection &&
> + (
> + cd rename-detection &&
> + echo contents > original-file &&
Micro-nit, please no " " after ">":
echo contents >original-file



Re: Bring together merge and rebase

2017-12-26 Thread Ævar Arnfjörð Bjarmason

On Tue, Dec 26 2017, Carl Baldwin jotted:

> On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> But I don't see why you think this needs a new "replaces" parent
>> >> pointer orthagonal to parent pointers, i.e. something that would
>> >> need to be a new field in the commit object (I may have misread the
>> >> proposal, it's not heavy on technical details).
>> >
>> > Just to clarify, I am proposing a new "replaces" pointer in the commit
>> > object. Imagine starting with rebase exactly as it works today. This new
>> > field would be inserted into any new commit created by a rebase command
>> > to reference the original commit on which it was based. Though, I'm not
>> > sure if it would be better to change the behavior of the existing rebase
>> > command, provide a switch or config option to turn it on, or provide a
>> > new command entirely (e.g. git replay or git replace) to avoid
>> > compatibility issues with the existing rebase.
>>
>> Yeah that sounds fine, I thought you meant that this "replaces" field
>> would replace the "parent" field, which would require some rather deep
>> incompatible changes to all git clients.
>>
>> But then I don't get why you think fetch/pull/gc would need to be
>> altered, if it's because you thought that adding arbitrary *new* fields
>> to the commit object would require changes to those that's not the case.
>
> Thank you again for your reply. Following is the kind of commit that I
> would like to create.
>
> tree fcce2f309177c7da9c795448a3e392a137434cf1
> parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
> replaces e8aa79baf6aef573da930a385e4db915187d5187
> author Carl Baldwin  1514057225 -0700
> committer Carl Baldwin  1514058444 -0700
>
> What will happen if I create this today? I assumed git would just choke
> on it but I'm not certain. It has been a long time since I attempted to
> get into the internals of git.

New headers should be added after existing headers, but other than that
it won't choke on it. See 4b2bced559 when the encoding header was added,
this also passes most tests:

diff --git a/commit.c b/commit.c
index cab8d4455b..cd2bafbaa0 100644
--- a/commit.c
+++ b/commit.c
@@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t 
msg_len,
if (!encoding_is_utf8)
strbuf_addf(, "encoding %s\n", git_commit_encoding);

+   strbuf_addf(, "replaces 
\n");
+
while (extra) {
add_extra_header(, extra);
extra = extra->next;

Only "most" since of course this changes the sha1 of every commit git
creates from what you get now.

> Even if core git code does not simply choke on it, I would like push and
> pull to follow these pointers and transfer the history behind them. I
> assumed that git would not do this today. I would also like gc to
> preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> it doesn't purge it from the history.

It won't pay any attention to them if "replaces" is something entirely
new, what I was pointing out in my earlier reply is that you can simply
*also* create the parent pointers to these no-op merge commits that hide
away the previous history the "replaces" headers will be referencing.

The reason to do that is 100% backwards compatibility, and and only
needing to make minor UI changes to have this feature (to e.g. history
walking), as opposed to needing to hack everything that now follows
"parent" or constructs a commit graph.

> I'm currently thinking of an example of the workflow that I'm after in
> response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
> it makes it clearer why I want it this way.
>
> [snip]
>
>> Instead, if I understand what you're actually trying to do, it could
>> also be done as:
>>
>>  1) Just add a new replaces  field to new commit objects
>>
>>  2) Make git-rebase know how to write those, e.g. add two of those
>> pointing to A & B when it squashes them into AB.
>>
>>  3) Write a history traversal mechanism similar to --full-history
>> that'll ignore any commits on branches that yield no changes, or
>> only those whose commits are referenced by this "replaces" field.
>>
>> You'd then end up with:
>>
>>  A) A way to "stash" these commits in the permanent history
>>
>>  B) ... that wouldn't be visble in "git log" by default
>>
>>  C) Would require no underlying changes to the commit model, i.e. it
>> would work with all past & future git clients, if they didn't know
>> about the "replaces" field they'd just show more verbose history.
>
> I get this point. I don't underestimate how difficult making such a
> change to the core model is. I know there are older clients which cannot
> simply be updated. There are also alternate implementations (e.g. jgit)
> that also need to be considered. This is the thing I worry 

Re: [PATCH] status: add a failing test showing a core.untrackedCache bug

2017-12-26 Thread Duy Nguyen
On Tue, Dec 26, 2017 at 1:45 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Mon, Dec 25 2017, Duy Nguyen jotted:
>
>> On Fri, Dec 22, 2017 at 9:00 PM, Ævar Arnfjörð Bjarmason
>>  wrote:
>>> The untracked cache gets confused when a directory is swapped out for
>>> a symlink to another directory. Whatever files are inside the target
>>> of the symlink will be incorrectly shown as untracked. This issue does
>>> not happen if the symlink links to another file, only if it links to
>>> another directory.
>>
>> Sounds about right (I completely forgot about dir symlinks). Since
>> I've been away for some time and have not caught up (probably cannot)
>> with the mailing list yet, is anyone working on this? It may be
>> easiest to just detect symlinksand disable  the cache for now.
>
> I haven't yet, I wanted to see what you had to say about it,
> i.e. whether it was a "do'h here's a fix" or if it was more involved
> than that.

OK I have more time to actually read your test and play with it (last
time I stopped at the word "symlink").

First thing out of the way first, I think the stat() call in
valid_cached_dir() is wrong. We don't want to automatically follow a
symlink, we want stat of the symlink itself.

OK back to the test. If you insert test-dump-untracked-cached around
the "git checkout master" line, then we get the data that the
next/faulty git status uses. For me it shows this


...
/  recurse
/one/  recurse valid
/two/  recurse valid

OK so we have two uptodate cached dirs for "one" and "two". Strangely,
root dir is not cached (no valid flag). I don't know why but that's ok
we'll roll with it. It means we will ignore "/" content and do the
opendir() and readdir() dance instead.

This is where it gets fun, when readdir() returns "one", we hit this
code in treat_one_path() and correctly ignores it

/* Always exclude indexed files */
if (dtype != DT_DIR && has_path_in_index)
return path_none;

and we do _nothing_ towards the cached version of "one". In constrast,
when readdir() returns "two" we are able to get further and invalidate
it, deleting the cached data.

After the readdir() dance is complete, dir.c is confident that it has
all the good data in the world in its cache and notes down "from now
on uses the cached data for /". Another test-dump-... after the
second-to-last git-status can show this "valid" flag. Unfortunately
the "one" dir is NOT invalidated.

So the last git-status sees that cached "/" is good, it skips
opendir() and goes with the cached directories which are "two" and the
bad "one". In short, we fail to invalidate bad data in some case.

An invalidate_directory() call before the "return path_none" above
might be the solution...
-- 
Duy


[PATCH v2 7/7] wt-status.c: avoid double renames in short/porcelain format

2017-12-26 Thread Nguyễn Thái Ngọc Duy
The presence of worktree rename leads to an interesting situation,
what if the same index entry is renamed twice, compared to HEAD and to
worktree? We can have that with this setup

echo first > first && git add first && git commit -m first
git mv first second  # rename reported in "diff --cached"
mv second third  # rename reported in "diff-files"

For the long format this is fine because we print two "->" rename
lines, one in the "updated" section, one in "changed" one.

For other output formats, it gets tricky because they combine both
diffs in one line but can only display one rename per line. The result
"XY" column of short format, for example, would be "RR" in that case.

This case either needs some extension in short/porcelain format
to show something crazy like

RR first -> second -> third

or we could show renames as two lines instead of one, for example
something like this for short form:

R  first -> second
 R second -> third

But for now it's safer and simpler to just break the "second -> third"
rename pair and show

RD first -> second
 A third

like we have been showing until now.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/t2203-add-intent.sh | 33 +
 wt-status.c   | 58 ---
 2 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index e5bfda1853..79aca93810 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -190,5 +190,38 @@ test_expect_success 'rename detection finds the right 
names' '
)
 '
 
+test_expect_success 'double rename detection in status' '
+   git init rename-detection-2 &&
+   (
+   cd rename-detection-2 &&
+   echo contents > first &&
+   git add first &&
+   git commit -m first &&
+   git mv first second &&
+   mv second third &&
+   git add -N third &&
+
+   git status | grep -v "^?" >actual.1 &&
+   test_i18ngrep "renamed: *first -> second" actual.1 &&
+   test_i18ngrep "renamed: *second -> third" actual.1 &&
+
+
+   git status --porcelain | grep -v "^?" >actual.2 &&
+   cat >expected.2 <<-\EOF &&
+   RD first -> second
+A third
+   EOF
+   test_cmp expected.2 actual.2 &&
+
+   oid=12f00e90b6ef79117ce6e650416b8cf517099b78 &&
+   git status --porcelain=v2 | grep -v "^?" >actual.3 &&
+   cat >expected.3 <<-EOF &&
+   2 RD N... 100644 100644 00 $oid $oid R100 secondfirst
+   1 .A N... 00 00 100644 $_z40 $_z40 third
+   EOF
+   test_cmp expected.3 actual.3
+   )
+'
+
 test_done
 
diff --git a/wt-status.c b/wt-status.c
index d5bdf4c2e9..e62853f748 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -419,6 +419,47 @@ static char short_submodule_status(struct 
wt_status_change_data *d)
return d->worktree_status;
 }
 
+static struct string_list_item * break_double_rename(
+   struct wt_status *s, struct string_list_item *it,
+   int *status, struct diff_filepair *p)
+{
+   struct wt_status_change_data *d;
+   struct string_list_item *new_it;
+
+   d = it->util;
+   /*
+* _collect_index_changes() must have been called or
+* d->head_path does not contain a real value.
+*/
+   if (!d || !d->head_path)
+   return it;
+
+   switch (s->status_format) {
+   case STATUS_FORMAT_SHORT:
+   case STATUS_FORMAT_PORCELAIN:
+   case STATUS_FORMAT_PORCELAIN_V2:
+   break;
+   case STATUS_FORMAT_LONG:
+   case STATUS_FORMAT_NONE:
+   /* this output can handle double renames ok */
+   return it;
+   default:
+   die("BUG: finalize_deferred_config() should have been called");
+   }
+
+   switch (*status) {
+   case DIFF_STATUS_RENAMED:
+   d->worktree_status = DIFF_STATUS_DELETED;
+   /* fallthru */
+   case DIFF_STATUS_COPIED:
+   *status = DIFF_STATUS_ADDED;
+   new_it = string_list_insert(>change, p->two->path);
+   return new_it;
+   }
+
+   return it;
+}
+
 static void wt_status_collect_changed_cb(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
@@ -433,16 +474,19 @@ static void wt_status_collect_changed_cb(struct 
diff_queue_struct *q,
struct diff_filepair *p;
struct string_list_item *it;
struct wt_status_change_data *d;
+   int status;
 
p = q->queue[i];
+   status = p->status;
it = string_list_insert(>change, p->one->path);
+ 

[PATCH v2 6/7] wt-status.c: handle worktree renames

2017-12-26 Thread Nguyễn Thái Ngọc Duy
Before 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
in index" - 2016-10-24) there are never "new files" in the index, which
essentially disables rename detection because we only detect renames
when a new file appears in a diff pair.

After that commit, an i-t-a entry can appear as a new file in "git
diff-files". But the diff callback function in wt-status.c does not
handle this case and produces incorrect status output.

PS. The reader may notice that this patch adds a new xstrdup() but not
a free(). Yes we leak memory (the same for head_path). But wt_status
so far has been short lived, this leak should not matter in
practice.

Noticed-by: Alex Vandiver 
Helped-by: Igor Djordjevic 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/t2203-add-intent.sh | 28 
 wt-status.c   | 72 +++
 wt-status.h   |  4 +--
 3 files changed, 85 insertions(+), 19 deletions(-)

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 878e73fe98..e5bfda1853 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -162,5 +162,33 @@ test_expect_success 'commit: ita entries ignored in empty 
commit check' '
)
 '
 
+test_expect_success 'rename detection finds the right names' '
+   git init rename-detection &&
+   (
+   cd rename-detection &&
+   echo contents > first &&
+   git add first &&
+   git commit -m first &&
+   mv first third &&
+   git add -N third &&
+
+   git status | grep -v "^?" >actual.1 &&
+   test_i18ngrep "renamed: *first -> third" actual.1 &&
+
+   git status --porcelain | grep -v "^?" >actual.2 &&
+   cat >expected.2 <<-\EOF &&
+R first -> third
+   EOF
+   test_cmp expected.2 actual.2 &&
+
+   oid=12f00e90b6ef79117ce6e650416b8cf517099b78 &&
+   git status --porcelain=v2 | grep -v "^?" >actual.3 &&
+   cat >expected.3 <<-EOF &&
+   2 .R N... 100644 100644 100644 $oid $oid R100 first third
+   EOF
+   test_cmp expected.3 actual.3
+   )
+'
+
 test_done
 
diff --git a/wt-status.c b/wt-status.c
index c124d7589c..d5bdf4c2e9 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -376,6 +376,8 @@ static void wt_longstatus_print_change_data(struct 
wt_status *s,
strbuf_addch(, ')');
}
status = d->worktree_status;
+   if (d->worktree_path)
+   two_name = d->worktree_path;
break;
default:
die("BUG: unhandled change_type %d in 
wt_longstatus_print_change_data",
@@ -460,6 +462,12 @@ static void wt_status_collect_changed_cb(struct 
diff_queue_struct *q,
/* mode_worktree is zero for a delete. */
break;
 
+   case DIFF_STATUS_COPIED:
+   case DIFF_STATUS_RENAMED:
+   d->worktree_path = xstrdup(p->two->path);
+   d->worktree_score = p->score * 100 / MAX_SCORE;
+   /* fallthru */
+
case DIFF_STATUS_MODIFIED:
case DIFF_STATUS_TYPE_CHANGED:
case DIFF_STATUS_UNMERGED:
@@ -1712,6 +1720,7 @@ static void wt_shortstatus_status(struct string_list_item 
*it,
 struct wt_status *s)
 {
struct wt_status_change_data *d = it->util;
+   const char *from, *to;
 
if (d->index_status)
color_fprintf(s->fp, color(WT_STATUS_UPDATED, s), "%c", 
d->index_status);
@@ -1722,15 +1731,30 @@ static void wt_shortstatus_status(struct 
string_list_item *it,
else
putchar(' ');
putchar(' ');
+
+   if (d->head_path && d->worktree_path)
+   die("BUG: to be addressed in the next patch");
+
+   if (d->head_path) {
+   from = d->head_path;
+   to = it->string;
+   } else if (d->worktree_path) {
+   from = it->string;
+   to = d->worktree_path;
+   } else {
+   from = it->string;
+   to = NULL;
+   }
if (s->null_termination) {
-   fprintf(stdout, "%s%c", it->string, 0);
-   if (d->head_path)
-   fprintf(stdout, "%s%c", d->head_path, 0);
+   fprintf(stdout, "%s%c", from, 0);
+   if (to)
+   fprintf(stdout, "%s%c", to, 0);
} else {
struct strbuf onebuf = STRBUF_INIT;
const char *one;
-   if (d->head_path) {
-   one = quote_path(d->head_path, s->prefix, );
+
+   if (to) {
+   one = quote_path(from, s->prefix, );
if (*one != '"' && 

[PATCH v2 0/7] Renames in git-status "changed not staged" section

2017-12-26 Thread Nguyễn Thái Ngọc Duy
The changes in 425a28e0a4 allow new files to show up in
"git diff-files" (aka "changed but not staged") which is a problem
because status code does not handle renaming in this case.

The main change to fix this is 6/7. The interesting corner case is in
7/7 where I decided to go with a middle ground, disabling only double
renames. We have two other options to go:

 - unconditionally disable rename in wt_status_collect_changes_worktree
   which turns this 7-patch series into one patch
 - support double renames too but this may change porcelain output
   (or we have to add new output format). This could be done in a
   follow series if we want to.

Nguyễn Thái Ngọc Duy (7):
  t2203: test status output with porcelain v2 format
  Use DIFF_DETECT_RENAME for detect_rename assignments
  wt-status.c: coding style fix
  wt-status.c: rename wt_status_change_data::score
  wt-status.c: catch unhandled diff status codes
  wt-status.c: handle worktree renames
  wt-status.c: avoid double renames in short/porcelain format

 builtin/commit.c  |   2 +-
 diff.c|   2 +-
 t/t2203-add-intent.sh |  73 ++
 wt-status.c   | 141 --
 wt-status.h   |   4 +-
 5 files changed, 191 insertions(+), 31 deletions(-)

-- 
2.15.0.320.g0453912d77



[PATCH v2 5/7] wt-status.c: catch unhandled diff status codes

2017-12-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 wt-status.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 0f089c5789..c124d7589c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -468,8 +468,8 @@ static void wt_status_collect_changed_cb(struct 
diff_queue_struct *q,
oidcpy(>oid_index, >one->oid);
break;
 
-   case DIFF_STATUS_UNKNOWN:
-   die("BUG: worktree status unknown???");
+   default:
+   die("BUG: unhandled diff-files status '%c'", p->status);
break;
}
 
@@ -549,6 +549,10 @@ static void wt_status_collect_updated_cb(struct 
diff_queue_struct *q,
 * values in these fields.
 */
break;
+
+   default:
+   die("BUG: unhandled diff-index status '%c'", p->status);
+   break;
}
}
 }
-- 
2.15.0.320.g0453912d77



[PATCH v2 3/7] wt-status.c: coding style fix

2017-12-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 wt-status.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/wt-status.c b/wt-status.c
index 59338adb8b..db06fc7c85 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -406,7 +406,8 @@ static void wt_longstatus_print_change_data(struct 
wt_status *s,
strbuf_release();
 }
 
-static char short_submodule_status(struct wt_status_change_data *d) {
+static char short_submodule_status(struct wt_status_change_data *d)
+{
if (d->new_submodule_commits)
return 'M';
if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-- 
2.15.0.320.g0453912d77



[PATCH v2 1/7] t2203: test status output with porcelain v2 format

2017-12-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/t2203-add-intent.sh | 12 
 1 file changed, 12 insertions(+)

diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 1bdf38e80d..878e73fe98 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -25,6 +25,18 @@ test_expect_success 'git status' '
test_cmp expect actual
 '
 
+test_expect_success 'git status with porcelain v2' '
+   git status --porcelain=v2 | grep -v "^?" >actual &&
+   nam1=d00491fd7e5bb6fa28c517a0bb32b8b506539d4d &&
+   nam2=ce013625030ba8dba906f756967f9e9ca394464a &&
+   cat >expect <<-EOF &&
+   1 DA N... 100644 00 100644 $nam1 $_z40 1.t
+   1 A. N... 00 100644 100644 $_z40 $nam2 elif
+   1 .A N... 00 00 100644 $_z40 $_z40 file
+   EOF
+   test_cmp expect actual
+'
+
 test_expect_success 'check result of "add -N"' '
git ls-files -s file >actual &&
empty=$(git hash-object --stdin 

[PATCH v2 4/7] wt-status.c: rename wt_status_change_data::score

2017-12-26 Thread Nguyễn Thái Ngọc Duy
We are about to adding support for "diff-files" rename, which has its
own rename score in addition to the "diff-index" one. Rename score to
head_score to indicate this score comes from diff-index. The new score
will be named worktree_score.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 wt-status.c | 4 ++--
 wt-status.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index db06fc7c85..0f089c5789 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -532,7 +532,7 @@ static void wt_status_collect_updated_cb(struct 
diff_queue_struct *q,
case DIFF_STATUS_COPIED:
case DIFF_STATUS_RENAMED:
d->head_path = xstrdup(p->one->path);
-   d->score = p->score * 100 / MAX_SCORE;
+   d->head_score = p->score * 100 / MAX_SCORE;
/* fallthru */
case DIFF_STATUS_MODIFIED:
case DIFF_STATUS_TYPE_CHANGED:
@@ -2074,7 +2074,7 @@ static void wt_porcelain_v2_print_changed_entry(
key, submodule_token,
d->mode_head, d->mode_index, d->mode_worktree,
oid_to_hex(>oid_head), 
oid_to_hex(>oid_index),
-   key[0], d->score,
+   key[0], d->head_score,
path_index, sep_char, path_head, eol_char);
else
fprintf(s->fp, "1 %s %s %06o %06o %06o %s %s %s%c",
diff --git a/wt-status.h b/wt-status.h
index fe27b465e2..f9330982ac 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -44,7 +44,7 @@ struct wt_status_change_data {
int worktree_status;
int index_status;
int stagemask;
-   int score;
+   int head_score;
int mode_head, mode_index, mode_worktree;
struct object_id oid_head, oid_index;
char *head_path;
-- 
2.15.0.320.g0453912d77



[PATCH v2 2/7] Use DIFF_DETECT_RENAME for detect_rename assignments

2017-12-26 Thread Nguyễn Thái Ngọc Duy
This field can have two values (2 for copy). Use this name instead for
clarity. Many places have already used this constant.

Note, the detect_rename assignments in merge-recursive.c remain
unchanged because it's actually a boolean there.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/commit.c | 2 +-
 diff.c   | 2 +-
 wt-status.c  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 8a87701414..1f11e3992d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1507,7 +1507,7 @@ static void print_summary(const char *prefix, const 
struct object_id *oid,
rev.show_root_diff = 1;
get_commit_format(format.buf, );
rev.always_show_header = 0;
-   rev.diffopt.detect_rename = 1;
+   rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
rev.diffopt.break_opt = 0;
diff_setup_done();
 
diff --git a/diff.c b/diff.c
index 3fb445a54d..51fe31c7aa 100644
--- a/diff.c
+++ b/diff.c
@@ -246,7 +246,7 @@ static int parse_ws_error_highlight(const char *arg)
  */
 void init_diff_ui_defaults(void)
 {
-   diff_detect_rename_default = 1;
+   diff_detect_rename_default = DIFF_DETECT_RENAME;
 }
 
 int git_diff_heuristic_config(const char *var, const char *value, void *cb)
diff --git a/wt-status.c b/wt-status.c
index ef26f07446..59338adb8b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -602,7 +602,7 @@ static void wt_status_collect_changes_index(struct 
wt_status *s)
rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = wt_status_collect_updated_cb;
rev.diffopt.format_callback_data = s;
-   rev.diffopt.detect_rename = 1;
+   rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
rev.diffopt.rename_limit = 200;
rev.diffopt.break_opt = 0;
copy_pathspec(_data, >pathspec);
@@ -962,7 +962,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
setup_revisions(0, NULL, , );
 
rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
-   rev.diffopt.detect_rename = 1;
+   rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
rev.diffopt.file = s->fp;
rev.diffopt.close_file = 0;
/*
-- 
2.15.0.320.g0453912d77



Re: Bring together merge and rebase

2017-12-26 Thread Jacob Keller
On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin  wrote:
> On Mon, Dec 25, 2017 at 05:47:55PM -0800, Jacob Keller wrote:
>> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin  wrote:
>> > Anyway, now I am compelled to use github which is also a fine tool and I
>> > appreciate all of the work that has gone into it. About 80% of the time,
>> > I rebase and force push to my branch to update a pull request. I've come
>> > to like the end product of the rebase workflow. However, github doesn't
>> > excel at this approach. For one, it doesn't preserve older revisions
>> > which were already reviewed which makes it is difficult for reviewers to
>> > pick up where they left off the last time. If it preserved them, as
>> > gerrit does, the reviewer can compare a new revision with the most
>> > recent older revision they reviewed to see just what has been addressed
>> > since then.
>>
>> A bit of a tangent here, but a thought I didn't wanna lose: In the
>> general case where a patch was rebased and the original parent pointer
>> was changed, it is actually quite hard to show a diff of what changed
>> between versions.
>>
>> The best I've found is to do something like a 4-way --cc merge diff,
>> which mostly works, but has a few awkward cases, and ends up usually
>> showing double ++ and -- notation.
>>
>> Just something I've thought about a fair bit, trying to come up with
>> some good way to show "what changed between A1 and A2, but ignore all
>> changes between parent P1 and P2 which you don't care that much about
>> in this context.
>
> I ran into this all the time with gerrit. I wrote a script that you'd
> run on a working copy (with no local changes). I'd fetch and checkout
> the latest patchset that I want to review(say, for example, its patchset
> 5) from gerrit. Then, say I wanted to compare it with patch set 3 which
> has a different parent. I'd run this from the top level of my working
> copy.
>
> compare-to-previous-patchset 3
>
> It would fetch patch set 3 from gerrit, rebase it to the same parent as
> the current patch set on a detached HEAD and then git diff it with the
> current patch set. If there were conflicts, it would just commit the
> conflict markers to the commit. There is no attempt to resolve the
> conflicts. The script was crude but it helped me out many times and it
> was nice to be able to review how conflicts were resolved when those
> came up.
>
> Carl
>

Interesting. That could work fairly well. I usually do something along
the lines of:

git diff patch-new patch-old patch-base-new patch-base-old --cc, which
produces a combined diff format patch which usually works ok.

My biggest gripes are that the gerrit web interface doesn't itself do
something like this (and jgit does not appear to be able to generate
combined diffs at all!)

> PS In case you're curious, here's my script...
>
> #!/bin/bash
>
> remote=gerrit
> previous_patchset=$1; shift
>
> # Assumes we're sitting on the latest patch set.
> new_patch_set_id=$(git rev-parse HEAD)
>
> branch=$(git branch | awk '/^\*/ {print$2}')
> [ "$branch" = "(no" ] && branch=
>
> # set user, host, port, and project from git config
> eval $(echo "$(git config remote.$remote.url)" |
>sed 's,ssh://\(.*\)@\(.*\):\([[:digit:]]*\)/\(.*\).git,user=\1 host=\2 
> p<
>
> gerrit() {
> ssh $user@$host -p $port gerrit ${1+"$@"}
> }
>
> # Grabs a bunch of information from gerrit about the current patch
> eval $(gerrit query --current-patch-set $new_patch_set_id |
> awk '
> BEGIN {mode="main"}
> / currentPatchSet:/ { mode="currentPatchSet" }
> / ref:/ { printf "new_patch_ref=%s\n", $2 }
> / number:/ {
> if (mode=="main") {
> printf "review_num=%s\n", $2
> }
> if (mode=="currentPatchSet") {
> printf "new_patchset=%s\n", $2
> }
> }
> ')
>
> # Fetch the old patch set
> old_patch_ref=${new_patch_ref%$new_patchset}$previous_patchset
> git fetch $remote $old_patch_ref && git checkout FETCH_HEAD
>
> # Rebase the old patch set to the parent of the new patch set.
> if ! git rebase HEAD^ --onto ${new_patch_set_id}^
> then
> git diff --name-only --diff-filter=U -z | xargs -0 git add
> git rebase --continue
> fi
>
> previous_patchset_rebased=$(git rev-parse HEAD)
>
> # Go back to the new patch set and diff it against the rebased old one.
> if [ "$branch" ]
> then
> git checkout $branch
> else
> git checkout $new_patch_set_id
> fi
> git diff $previous_patchset_rebased

One thing you might do is have it create a temporary worktree in order
to avoid problems with being in the local checkout.

Thanks,
Jake