date:20170619

Re: in case you want a use-case with lots of submodules

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 1:20 PM, Yaroslav Halchenko  wrote:
>
> On Mon, 19 Jun 2017, Stefan Beller wrote:
>
>> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko  
>> wrote:
>> > Hi All,
>
>> > On a recent trip I've listened to the git minutes podcast episode and
>> > got excited to hear  Stefan Beller (CCed just in case) describing
>> > ongoing work on submodules mechanism.  I got excited, since e.g.
>> > performance improvements would be of great benefit to us too.
>
>> If you're mostly interested in performance improvements of the status
>> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
>> Chavan (cc'd) might be more interesting to you than what I do.
>> He is porting git-submodule (which is mostly a shell script nowadays)
>> to C, such that we can save a lot of process invocations and can do
>> processing within one process.
>
> ah -- cool.  I would be eager to test it out, thanks!  would be
> interesting to see if it positively affects our overall performance.
> Pointers to that development would be welcome!

The latest from today:
https://public-inbox.org/git/CAME+mvUQJFneV7b1G7zmAidP-5L=nimvy43v0ug-gtesr83...@mail.gmail.com/


>
>> > http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
>> > repositories, up to 4 levels deep) hierarchy of git/git-annex
>> > repositories all tied together via git submodules mechanism.  And as the
>> > collection grows, interactions with it become slower, so additional
>> > options (such as --ignore-submodules=dirty  to status) become our
>> > friends.
>
>> I am not as much concerned about the 370 number than about the
>> 4 layers of nesting. In my experience the nested submodule case
>> is a little bit error prone and the bug reports are not as frequent as
>> there are not as many users of nesting, yet(?)
>
> well -- part of the story here is that we are forced to use/have full
> blown .git/ directories (for git-annex symlinks to content files to
> work) within submodules instead of .git file with a reference under
> parent's .git/modules.   So we can 'slice' at any level and I
> guess that is why may be avoiding some possibly issues due to nesting
> and the "parent has all .git/modules" approach.

That sounds like you either want to configure to have the submodules
git dirs in-place or you want to convince git-annex to learn about the
gitdir pointer files.

>
>> In a neighboring thread on the mailing list we have a discussion
>> on the usefulness of being on branches than in detached HEAD
>> in the submodules.
>> https://public-inbox.org/git/0092cdd27c5f9d418b0f3e9b5d05be0801028...@sbs2011.opfingen.plc2.de/
>
>> This would not break non-ambiguously, rather it would add
>> ease of use.
>
> that is indeed a common caveat... I am not sure if any heuristic
> approach would provide a 'bullet proof' solution.  I might even prefer a
> hardcoded 'branch-name' to be listed/associated with each submodule
> within .gitmodules.

hardcoded as submodule.NAME.branch, maybe?
https://git-scm.com/docs/gitmodules

>  In the datalad case, detached HEAD is common

So you are accustomed to detached HEADs and would not
gain much from being back on a branch?  That's cool, too.


> whenever someone installs "outdated" (branch of which progressed
> forward) submodule.  In this case we just check if the branch after "git
> clone"  (but before git submodule update) includes the pointed by
> Subproject commit, and if so -- we announce that it must be the branch
> (so far it is always "master" branch anyways ;) )

heh, having just one branch. That is retro-style. :)

[PATCH/RFC] Cleanup Documentation

2017-06-19 Thread Kaartic Sivaraam

Make following changes to the git-submodule
documentation:

* Remove redundancy
* Remove unclear back reference
* Use more appropriate word
* Quote important word

Suggestions-by: Stefan Beller 
Signed-off-by: Kaartic Sivaraam 
---
 Currently used the word "canonical" instead of "humanish". If that word
 sounds more suitable then this is a [PATCH] and not a [PATCH/RFC].


 Documentation/git-submodule.txt | 37 +++--
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d..045fef417 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -63,14 +63,6 @@ add [-b ] [-f|--force] [--name ] [--reference 
] [--dep
to the changeset to be committed next to the current
project: the current project is termed the "superproject".
 +
-This requires at least one argument: . The optional
-argument  is the relative location for the cloned submodule
-to exist in the superproject. If  is not given, the
-"humanish" part of the source repository is used ("repo" for
-"/path/to/repo.git" and "foo" for "host.xz:foo/.git").
-The  is also used as the submodule's logical name in its
-configuration entries unless `--name` is used to specify a logical name.
-+
  is the URL of the new submodule's origin repository.
 This may be either an absolute URL, or (if it begins with ./
 or ../), the location relative to the superproject's default remote
@@ -87,21 +79,22 @@ If the superproject doesn't have a default remote configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
- is the relative location for the cloned submodule to
-exist in the superproject. If  does not exist, then the
-submodule is created by cloning from the named URL. If  does
-exist and is already a valid Git repository, then this is added
-to the changeset without cloning. This second form is provided
-to ease creating a new submodule from scratch, and presumes
-the user will later push the submodule to the given URL.
+The optional argument  is the relative location for the cloned
+submodule to exist in the superproject. If  is not given, the
+canonical part of the source repository is used ("repo" for
+"/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If 
+exists and is already a valid Git repository, then this is added
+to the changeset without cloning. The  is also used as the
+submodule's logical name in its configuration entries unless `--name`
+is used to specify a logical name.
 +
-In either case, the given URL is recorded into .gitmodules for
-use by subsequent users cloning the superproject. If the URL is
-given relative to the superproject's repository, the presumption
-is the superproject and submodule repositories will be kept
-together in the same relative location, and only the
-superproject's URL needs to be provided: git-submodule will correctly
-locate the submodule using the relative URL in .gitmodules.
+The given URL is recorded into `.gitmodules` for use by subsequent users
+cloning the superproject. If the URL is given relative to the
+superproject's repository, the presumption is the superproject and
+submodule repositories will be kept together in the same relative
+location, and only the superproject's URL needs to be provided.
+git-submodule will correctly locate the submodule using the relative
+URL in .gitmodules.
 
 status [--cached] [--recursive] [--] [...]::
Show the status of the submodules. This will print the SHA-1 of the
-- 
2.11.0

[PATCH 3/3] Add tests for the contextual initial status message

2017-06-19 Thread Kaartic Sivaraam

Also, fixed minor spacing issue

Helped-by: Ævar Arnfjörð Bjarmason 
Helped-by: Junio C Hamano 
Signed-off-by: Kaartic Sivaraam 
---
 t/t7508-status.sh | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index fb00e6d9b..6cf3af9d9 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1494,9 +1494,39 @@ EOF
 test_expect_success 'git commit -m will commit a staged but ignored submodule' 
'
git commit -uno -m message &&
git status -s --ignore-submodules=dirty >output &&
-test_i18ngrep ! "^M. sm" output &&
+   test_i18ngrep ! "^M. sm" output &&
git config --remove-section submodule.subname &&
git config -f .gitmodules  --remove-section submodule.subname
 '
 
+test_expect_success '"No commits yet" should be noted in status output' '
+   git checkout --orphan empty-branch-1 &&
+   git status >output &&
+   test_i18ngrep "No commits yet" output
+'
+
+test_expect_success '"No commits yet" should not be noted in status output' '
+   git checkout --orphan empty-branch-2 &&
+   test_commit test-commit-1 &&
+   git status >output &&
+   test_i18ngrep ! "No commits yet" output
+'
+
+test_expect_success '"Initial commit" should be noted in commit template' '
+   git checkout --orphan empty-branch-3 &&
+   touch to_be_committed_1 &&
+   git add to_be_committed_1 &&
+   git commit --dry-run >output &&
+   test_i18ngrep "Initial commit" output
+'
+
+test_expect_success '"Initial commit" should not be noted in commit template' '
+   git checkout --orphan empty-branch-4 &&
+   test_commit test-commit-2 &&
+   touch to_be_committed_2 &&
+   git add to_be_committed_2 &&
+   git commit --dry-run >output &&
+   test_i18ngrep ! "Initial commit" output
+'
+
 test_done
-- 
2.11.0

[PATCH 2/3] Update test(s) that used old status message

2017-06-19 Thread Kaartic Sivaraam

The tests that checked for old status message have been
updated to check for the new status message

Signed-off-by: Kaartic Sivaraam 
---
 t/t7501-commit.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 0b6da7ae1..fa61b1a4e 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -18,7 +18,7 @@ test_expect_success 'initial status' '
echo bongo bongo >file &&
git add file &&
git status >actual &&
-   test_i18ngrep "Initial commit" actual
+   test_i18ngrep "No commits yet" actual
 '
 
 test_expect_success 'fail initial amend' '
-- 
2.11.0

[PATCH 1/3] Contextually notify user about an initial commit

2017-06-19 Thread Kaartic Sivaraam

"git status" indicated "Initial commit" when HEAD points at
an unborn branch.  This message is shared with the commit
log template "git commit" prepares for the user when
creating a commit (i.e. "You are about to create the initial
commit"), and is OK as long as the reader is aware of the
nature of the message (i.e. it guides the user working
toward the next commit), but was confusing to new users,
especially the ones who do "git commit -m message" without
having a chance to pay attention to the commit log template.

The "Initial commit" indication wasn't an issue in the commit
template. Taking that into consideration, a good solution would
be to contextually use different messages to indicate the user
that there were no commits in this branch.

A few alternatives considered were,

* Waiting for initial commit
* Your current branch does not have any commits
* Current branch waiting for initial commit

The most succint one, "No commits yet", among the alternatives
was chosen.

Helped-by: Junio C Hamano 
Signed-off-by: Kaartic Sivaraam 
---
 builtin/commit.c | 1 +
 wt-status.c  | 5 -
 wt-status.h  | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 1d805f5da..0f36d2ac3 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1648,6 +1648,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
usage_with_options(builtin_commit_usage, 
builtin_commit_options);
 
status_init_config(&s, git_commit_config);
+   s.commit_template = 1;
status_format = STATUS_FORMAT_NONE; /* Ignore status.short */
s.colopts = 0;
 
diff --git a/wt-status.c b/wt-status.c
index 037548496..e6a9ee34f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1576,7 +1576,10 @@ static void wt_longstatus_print(struct wt_status *s)
 
if (s->is_initial) {
status_printf_ln(s, color(WT_STATUS_HEADER, s), "%s", "");
-   status_printf_ln(s, color(WT_STATUS_HEADER, s), _("Initial 
commit"));
+   status_printf_ln(s, color(WT_STATUS_HEADER, s),
+s->commit_template
+? _("Initial commit")
+: _("No commits yet"));
status_printf_ln(s, color(WT_STATUS_HEADER, s), "%s", "");
}
 
diff --git a/wt-status.h b/wt-status.h
index 6018c627b..782b2997f 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -76,6 +76,7 @@ struct wt_status {
char color_palette[WT_STATUS_MAXSLOT][COLOR_MAXLEN];
unsigned colopts;
int null_termination;
+   int commit_template;
int show_branch;
int hints;
 
-- 
2.11.0

Re: [showing-off RFC/PATCH 26/26] diff.c: have a "machine parseable" move coloring

2017-06-19 Thread Stefan Beller

+ Ævar, who was not part of the email where I copied all recipients
from for this series.

On Mon, Jun 19, 2017 at 7:48 PM, Stefan Beller  wrote:
> Ævar asked for it, this is how you would do it.
> (plus documentation, tests, CLI knobs, options)
>
> Signed-off-by: Stefan Beller 
> ---
>  diff.c | 15 +++
>  diff.h |  2 ++
>  2 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 7756f7610c..61caa057ff 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -997,6 +997,7 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
> static const char *nneof = " No newline at end of file\n";
> const char *context, *reset, *set, *meta, *fraginfo;
> struct strbuf sb = STRBUF_INIT;
> +   int sign;
>
> enum diff_symbol s = eds->s;
> const char *line = eds->line;
> @@ -1058,8 +1059,11 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
> default:
> set = diff_get_color_opt(o, DIFF_FILE_NEW);
> }
> +   sign = '+';
> +   if (flags & DIFF_SYMBOL_MOVED_LINE && 
> o->machine_readable_moves)
> +   sign = '*';
> reset = diff_get_color_opt(o, DIFF_RESET);
> -   emit_line_ws_markup(o, set, reset, line, len, '+',
> +   emit_line_ws_markup(o, set, reset, line, len, sign,
> flags & DIFF_SYMBOL_CONTENT_WS_MASK,
> flags & 
> DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
> break;
> @@ -1086,8 +1090,11 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
> default:
> set = diff_get_color_opt(o, DIFF_FILE_OLD);
> }
> +   sign = '-';
> +   if (flags & DIFF_SYMBOL_MOVED_LINE && 
> o->machine_readable_moves)
> +   sign = '_';
> reset = diff_get_color_opt(o, DIFF_RESET);
> -   emit_line_ws_markup(o, set, reset, line, len, '-',
> +   emit_line_ws_markup(o, set, reset, line, len, sign,
> flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
> break;
> case DIFF_SYMBOL_WORDS_PORCELAIN:
> @@ -5475,7 +5482,7 @@ static void diff_flush_patch_all_file_pairs(struct 
> diff_options *o)
> static struct emitted_diff_symbols esm = EMITTED_DIFF_SYMBOLS_INIT;
> struct diff_queue_struct *q = &diff_queued_diff;
>
> -   if (o->color_moved)
> +   if (o->color_moved || o->machine_readable_moves)
> o->emitted_symbols = &esm;
>
> for (i = 0; i < q->nr; i++) {
> @@ -5485,7 +5492,7 @@ static void diff_flush_patch_all_file_pairs(struct 
> diff_options *o)
> }
>
> if (o->emitted_symbols) {
> -   if (o->color_moved) {
> +   if (o->color_moved || o->machine_readable_moves) {
> struct hashmap add_lines, del_lines;
> unsigned ignore_ws = DIFF_XDL_TST(o, 
> IGNORE_WHITESPACE);
>
> diff --git a/diff.h b/diff.h
> index 98abd75521..b6c4d7ab0f 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -194,6 +194,8 @@ struct diff_options {
> COLOR_MOVED_ZEBRA = 2,
> COLOR_MOVED_ZEBRA_DIM = 3,
> } color_moved;
> +
> +   int machine_readable_moves;
>  };
>
>  void diff_emit_submodule_del(struct diff_options *o, const char *line);
> --
> 2.12.2.575.gb14f27f917
>

[PATCH 23/26] diff.c: color moved lines differently, plain mode

2017-06-19 Thread Stefan Beller

Add the 'plain' mode for move detection of code.
This omits the checking for adjacent blocks, so it is not as useful.
If you have a lot of the same blocks moved in the same patch, the 'Zebra'
would end up slow as it is O(n^2) (n is number of same blocks).
So this may be useful there and is generally easy to add

Signed-off-by: Stefan Beller 
---
 diff.c |  5 +
 diff.h |  1 +
 t/t4015-diff-whitespace.sh | 51 +-
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 20c1f9b99f..0d41a53b76 100644
--- a/diff.c
+++ b/diff.c
@@ -246,6 +246,8 @@ static int parse_color_moved(const char *arg)
 {
if (!strcmp(arg, "no"))
return COLOR_MOVED_NO;
+   else if (!strcmp(arg, "plain"))
+   return COLOR_MOVED_PLAIN;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
else
@@ -830,6 +832,9 @@ static void mark_color_as_moved(struct diff_options *o,
 
l->flags |= DIFF_SYMBOL_MOVED_LINE;
 
+   if (o->color_moved == COLOR_MOVED_PLAIN)
+   continue;
+
/* Check any potential block runs, advance each or nullify */
for (i = 0; i < pmb_nr; i++) {
struct moved_entry *p = pmb[i];
diff --git a/diff.h b/diff.h
index 7726ad255c..1aae8738ca 100644
--- a/diff.h
+++ b/diff.h
@@ -190,6 +190,7 @@ struct diff_options {
struct emitted_diff_symbols *emitted_symbols;
enum {
COLOR_MOVED_NO = 0,
+   COLOR_MOVED_PLAIN = 1,
COLOR_MOVED_ZEBRA = 2,
} color_moved;
 };
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4a03766f1f..1ca16435d6 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -986,7 +986,7 @@ test_expect_success 'detect moved code, complete file' '
git mv test.c main.c &&
test_config color.diff.oldMoved "normal red" &&
test_config color.diff.newMoved "normal green" &&
-   git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+   git diff HEAD --color-moved=zebra --no-renames | test_decode_color 
>actual &&
cat >expected <<-\EOF &&
diff --git a/main.c b/main.c
new file mode 100644
@@ -1130,6 +1130,55 @@ test_expect_success 'detect malicious moved code, inside 
file' '
test_cmp expected actual
 '
 
+test_expect_success 'plain moved code, inside file' '
+   test_config color.diff.oldMoved "normal red" &&
+   test_config color.diff.newMoved "normal green" &&
+   test_config color.diff.oldMovedAlternative "blue" &&
+   test_config color.diff.newMovedAlternative "yellow" &&
+   # needs previous test as setup
+   git diff HEAD --no-renames --color-moved=plain| test_decode_color 
>actual &&
+   cat <<-\EOF >expected &&
+   diff --git a/main.c b/main.c
+   index 27a619c..7cf9336 100644
+   --- a/main.c
+   +++ b/main.c
+   @@ -5,13 +5,6 @@ printf("Hello ");
+printf("World\n");
+}
+
+   -int secure_foo(struct user *u)
+   -{
+   -if (!u->is_allowed_foo)
+   -return;
+   -foo(u);
+   -}
+   -
+int main()
+{
+foo();
+   diff --git a/test.c b/test.c
+   index 1dc1d85..2bedec9 100644
+   --- a/test.c
+   +++ b/test.c
+   @@ -4,6 +4,13 @@ int bar()
+printf("Hello World, but different\n");
+}
+
+   +int secure_foo(struct user *u)
+   +{
+   +foo(u);
+   +if (!u->is_allowed_foo)
+   +return;
+   +}
+   +
+int another_function()
+{
+bar();
+   EOF
+
+   test_cmp expected actual
+'
+
 test_expect_success 'no effect from --color-moved with --word-diff' '
cat <<-\EOF >text.txt &&
Lorem Ipsum is simply dummy text of the printing and typesetting 
industry.
-- 
2.12.2.575.gb14f27f917

[PATCH 22/26] diff.c: color moved lines differently

2017-06-19 Thread Stefan Beller

When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OM]  -if (!is_authorized_user())
[OM]  -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OM]  -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NM]  +sensitive_stuff(spanning,
[NM]  +multiple,
[NM]  +lines);
[NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OMA] -if (!is_authorized_user())
[OMA] -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OMA] -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NMA] +sensitive_stuff(spanning,
[NMA] +multiple,
[NMA] +lines);
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

This patch implements the first mode:
* basic alternating 'Zebra' mode
  This conveys all information needed to the user.  Defer customization to
  later patches.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

Helped-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
---
 diff.c | 304 ++---
 diff.h |  10 +-
 t/t4015-diff-whitespace.sh | 196 +
 3 files changed, 495 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 35b5924ff2..20c1f9b99f 100644
--- a/diff.c
+++ b/diff.c
@@ -15,6 +15,7 @@
 #include "userdiff.h"
 #include "submodule-config.h"
 #include "submodule.h"
+#include "hashmap.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "argv-array.h"
@@ -31,6 +32,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +57,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_YELLOW,   /* COMMIT */
GIT_COLOR_BG_RED,   /* WHITESPACE */
GIT_COLOR_NORMAL,   /* FUNCINFO */
+   GIT_COLOR_MAGENTA,  /* OLD_MOVED */
+   GIT_COLOR_BLUE, /* OLD_MOVED ALTERNATIVE */
+   GIT_COLOR_CYAN, /* NEW_MOVED */
+   GIT_COLOR_YELLOW,   /* NEW_MOVED ALTERNATIVE */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +86,14 @@ static int parse_diff_color_slot(const char *var)
return DIFF_WHITESPACE;
if (!strcasecmp

[PATCH 13/26] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 95f11ea86f..e56962b844 100644
--- a/diff.c
+++ b/diff.c
@@ -572,6 +572,7 @@ enum diff_symbol {
DIFF_SYMBOL_WORDS,
DIFF_SYMBOL_FILEPAIR,
DIFF_SYMBOL_HEADER,
+   DIFF_SYMBOL_BINARY_FILES,
 };
 /*
  * Flags for content lines:
@@ -689,6 +690,10 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_HEADER:
fprintf(o->file, "%s", line);
break;
+   case DIFF_SYMBOL_BINARY_FILES:
+   fprintf(o->file, "%sBinary files %s differ\n",
+   diff_line_prefix(o), line);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -2537,6 +2542,7 @@ static void builtin_diff(const char *name_a,
} else if (!DIFF_OPT_TST(o, TEXT) &&
( (!textconv_one && diff_filespec_is_binary(one)) ||
  (!textconv_two && diff_filespec_is_binary(two)) )) {
+   struct strbuf sb = STRBUF_INIT;
if (!one->data && !two->data &&
S_ISREG(one->mode) && S_ISREG(two->mode) &&
!DIFF_OPT_TST(o, BINARY)) {
@@ -2549,8 +2555,10 @@ static void builtin_diff(const char *name_a,
}
emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
 header.buf, header.len, 0);
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   strbuf_addf(&sb, "%s and %s", lbl[0], lbl[1]);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_FILES,
+sb.buf, sb.len, 0);
+   strbuf_release(&sb);
goto free_ab_and_return;
}
if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2567,9 +2575,12 @@ static void builtin_diff(const char *name_a,
strbuf_reset(&header);
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
-   else
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   else {
+   strbuf_addf(&sb, "%s and %s", lbl[0], lbl[1]);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_FILES,
+sb.buf, sb.len, 0);
+   strbuf_release(&sb);
+   }
o->found_changes = 1;
} else {
/* Crazy xdl interfaces.. */
-- 
2.12.2.575.gb14f27f917

[PATCH 01/26] diff.c: readability fix

2017-06-19 Thread Stefan Beller

We already have dereferenced 'p->two' into a local variable 'two'.
Use that.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct 
diff_options *o)
const char *other;
const char *attr_path;
 
-   name  = p->one->path;
-   other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+   name  = one->path;
+   other = (strcmp(name, two->path) ? two->path : NULL);
attr_path = name;
if (o->prefix_length)
strip_prefix(o->prefix_length, &name, &other);
-- 
2.12.2.575.gb14f27f917

[PATCH 16/26] diff.c: convert emit_binary_diff_body to use emit_diff_symbol

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 66 +-
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 42a9020d95..e6ade5fde0 100644
--- a/diff.c
+++ b/diff.c
@@ -573,6 +573,11 @@ enum diff_symbol {
DIFF_SYMBOL_FILEPAIR,
DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_BINARY_FILES,
+   DIFF_SYMBOL_BINARY_DIFF_HEADER,
+   DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA,
+   DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL,
+   DIFF_SYMBOL_BINARY_DIFF_BODY,
+   DIFF_SYMBOL_BINARY_DIFF_FOOTER,
DIFF_SYMBOL_REWRITE_DIFF,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
@@ -581,6 +586,7 @@ enum diff_symbol {
DIFF_SYMBOL_SUBMODULE_HEADER,
DIFF_SYMBOL_SUBMODULE_ERROR,
DIFF_SYMBOL_SUBMODULE_PIPETHROUGH,
+
 };
 /*
  * Flags for content lines:
@@ -702,6 +708,22 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
fprintf(o->file, "%sBinary files %s differ\n",
diff_line_prefix(o), line);
break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER:
+   fprintf(o->file, "%sGIT binary patch\n", diff_line_prefix(o));
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA:
+   fprintf(o->file, "%sdelta %s\n", diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL:
+   fprintf(o->file, "%sliteral %s\n", diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_BODY:
+   emit_line(o, "", "", line, len);
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_FOOTER:
+   fputs(diff_line_prefix(o), o->file);
+   fputc('\n', o->file);
+   break;
case DIFF_SYMBOL_REWRITE_DIFF:
fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
reset = diff_get_color_opt(o, DIFF_RESET);
@@ -2394,8 +2416,8 @@ static unsigned char *deflate_it(char *data,
return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
- const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+ mmfile_t *one, mmfile_t *two)
 {
void *cp;
void *delta;
@@ -2424,13 +2446,18 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
}
 
if (delta && delta_size < deflate_size) {
-   fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+   char *s = xstrfmt("%lu", orig_size);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA,
+s, strlen(s), 0);
+   free(s);
free(deflated);
data = delta;
data_size = delta_size;
-   }
-   else {
-   fprintf(file, "%sliteral %lu\n", prefix, two->size);
+   } else {
+   char *s = xstrfmt("%lu", two->size);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL,
+s, strlen(s), 0);
+   free(s);
free(delta);
data = deflated;
data_size = deflate_size;
@@ -2439,8 +2466,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
/* emit data encoded in base85 */
cp = data;
while (data_size) {
+   int len;
int bytes = (52 < data_size) ? 52 : data_size;
-   char line[70];
+   char line[71];
data_size -= bytes;
if (bytes <= 26)
line[0] = bytes + 'A' - 1;
@@ -2448,20 +2476,24 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
line[0] = bytes - 26 + 'a' - 1;
encode_85(line + 1, cp, bytes);
cp = (char *) cp + bytes;
-   fprintf(file, "%s", prefix);
-   fputs(line, file);
-   fputc('\n', file);
+
+   len = strlen(line);
+   line[len++] = '\n';
+   line[len] = '\0';
+
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_BODY,
+line, len, 0);
}
-   fprintf(file, "%s\n", prefix);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_FOOTER, NULL, 0, 0);
free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+mmfile_t *one, mmfile_t *two)
 {
-   fprintf(file, "%sGIT binary patch\n", prefix);
-   emit_binary_diff_body(file, one, two, prefix);
-   emit_binary_diff_body(file, two, one, prefix);
+   emit_diff_symbol(o,

[PATCH 10/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 7096457c8a..e6029e8213 100644
--- a/diff.c
+++ b/diff.c
@@ -565,6 +565,7 @@ enum diff_symbol {
DIFF_SYMBOL_CONTEXT_FRAGINFO,
DIFF_SYMBOL_NO_LF_EOF,
DIFF_SYMBOL_CONTEXT,
+   DIFF_SYMBOL_CONTEXT_INCOMPLETE,
DIFF_SYMBOL_PLUS,
DIFF_SYMBOL_MINUS,
DIFF_SYMBOL_WORDS_PORCELAIN,
@@ -637,6 +638,11 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
emit_line_ws_markup(o, set, reset, line, len, ' ',
flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
break;
+   case DIFF_SYMBOL_CONTEXT_INCOMPLETE:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, context, reset, line, len);
+   break;
case DIFF_SYMBOL_PLUS:
set = diff_get_color_opt(o, DIFF_FILE_NEW);
reset = diff_get_color_opt(o, DIFF_RESET);
@@ -1448,8 +1454,8 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
default:
/* incomplete line at the end */
ecbdata->lno_in_preimage++;
-   emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
- reset, line, len);
+   emit_diff_symbol(o, DIFF_SYMBOL_CONTEXT_INCOMPLETE,
+line, len, 0);
break;
}
 }
-- 
2.12.2.575.gb14f27f917

[PATCH 09/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS{_PORCELAIN}

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/diff.c b/diff.c
index 42c9f48dc2..7096457c8a 100644
--- a/diff.c
+++ b/diff.c
@@ -567,6 +567,8 @@ enum diff_symbol {
DIFF_SYMBOL_CONTEXT,
DIFF_SYMBOL_PLUS,
DIFF_SYMBOL_MINUS,
+   DIFF_SYMBOL_WORDS_PORCELAIN,
+   DIFF_SYMBOL_WORDS,
 };
 /*
  * Flags for content lines:
@@ -648,6 +650,26 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
emit_line_ws_markup(o, set, reset, line, len, '-',
flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
break;
+   case DIFF_SYMBOL_WORDS_PORCELAIN:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, context, reset, line, len);
+   fputs("~\n", o->file);
+   break;
+   case DIFF_SYMBOL_WORDS:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   /*
+* Skip the prefix character, if any.  With
+* diff_suppress_blank_empty, there may be
+* none.
+*/
+   if (line[0] != '\n') {
+   line++;
+   len--;
+   }
+   emit_line(o, context, reset, line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1342,7 +1364,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
 {
struct emit_callback *ecbdata = priv;
const char *meta = diff_get_color(ecbdata->color_diff, DIFF_METAINFO);
-   const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
const char *line_prefix = diff_line_prefix(o);
@@ -1384,6 +1405,9 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 
if (ecbdata->diff_words) {
+   enum diff_symbol s =
+   ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN ?
+   DIFF_SYMBOL_WORDS_PORCELAIN : DIFF_SYMBOL_WORDS;
if (line[0] == '-') {
diff_words_append(line, len,
  &ecbdata->diff_words->minus);
@@ -1403,21 +1427,7 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
return;
}
diff_words_flush(ecbdata);
-   if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-   emit_line(o, context, reset, line, len);
-   fputs("~\n", o->file);
-   } else {
-   /*
-* Skip the prefix character, if any.  With
-* diff_suppress_blank_empty, there may be
-* none.
-*/
-   if (line[0] != '\n') {
- line++;
- len--;
-   }
-   emit_line(o, context, reset, line, len);
-   }
+   emit_diff_symbol(o, s, line, len, 0);
return;
}
 
-- 
2.12.2.575.gb14f27f917

[showing-off RFC/PATCH 26/26] diff.c: have a "machine parseable" move coloring

2017-06-19 Thread Stefan Beller

Ævar asked for it, this is how you would do it.
(plus documentation, tests, CLI knobs, options)

Signed-off-by: Stefan Beller 
---
 diff.c | 15 +++
 diff.h |  2 ++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 7756f7610c..61caa057ff 100644
--- a/diff.c
+++ b/diff.c
@@ -997,6 +997,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
static const char *nneof = " No newline at end of file\n";
const char *context, *reset, *set, *meta, *fraginfo;
struct strbuf sb = STRBUF_INIT;
+   int sign;
 
enum diff_symbol s = eds->s;
const char *line = eds->line;
@@ -1058,8 +1059,11 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
default:
set = diff_get_color_opt(o, DIFF_FILE_NEW);
}
+   sign = '+';
+   if (flags & DIFF_SYMBOL_MOVED_LINE && o->machine_readable_moves)
+   sign = '*';
reset = diff_get_color_opt(o, DIFF_RESET);
-   emit_line_ws_markup(o, set, reset, line, len, '+',
+   emit_line_ws_markup(o, set, reset, line, len, sign,
flags & DIFF_SYMBOL_CONTENT_WS_MASK,
flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
break;
@@ -1086,8 +1090,11 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
default:
set = diff_get_color_opt(o, DIFF_FILE_OLD);
}
+   sign = '-';
+   if (flags & DIFF_SYMBOL_MOVED_LINE && o->machine_readable_moves)
+   sign = '_';
reset = diff_get_color_opt(o, DIFF_RESET);
-   emit_line_ws_markup(o, set, reset, line, len, '-',
+   emit_line_ws_markup(o, set, reset, line, len, sign,
flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
break;
case DIFF_SYMBOL_WORDS_PORCELAIN:
@@ -5475,7 +5482,7 @@ static void diff_flush_patch_all_file_pairs(struct 
diff_options *o)
static struct emitted_diff_symbols esm = EMITTED_DIFF_SYMBOLS_INIT;
struct diff_queue_struct *q = &diff_queued_diff;
 
-   if (o->color_moved)
+   if (o->color_moved || o->machine_readable_moves)
o->emitted_symbols = &esm;
 
for (i = 0; i < q->nr; i++) {
@@ -5485,7 +5492,7 @@ static void diff_flush_patch_all_file_pairs(struct 
diff_options *o)
}
 
if (o->emitted_symbols) {
-   if (o->color_moved) {
+   if (o->color_moved || o->machine_readable_moves) {
struct hashmap add_lines, del_lines;
unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
 
diff --git a/diff.h b/diff.h
index 98abd75521..b6c4d7ab0f 100644
--- a/diff.h
+++ b/diff.h
@@ -194,6 +194,8 @@ struct diff_options {
COLOR_MOVED_ZEBRA = 2,
COLOR_MOVED_ZEBRA_DIM = 3,
} color_moved;
+
+   int machine_readable_moves;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
-- 
2.12.2.575.gb14f27f917

[PATCH 21/26] diff.c: buffer all output if asked to

2017-06-19 Thread Stefan Beller

Introduce a new option 'emitted_symbols' in the struct diff_options which
controls whether all output is buffered up until all output is available.
It is set internally in diff.c when necessary.

We'll have a new struct 'emitted_string' in diff.c which will be used to
buffer each line.  The emitted_string will duplicate the memory of the
line to buffer as that is easiest to reason about for now. In a future
patch we may want to decrease the memory usage by not duplicating all
output for buffering but rather we may want to store offsets into the
file or in case of hunk descriptions such as the similarity score, we
could just store the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_diff_symbol factors out the emission part and depending on the
diff_options->emitted_symbols the emission will be performed directly
when calling emit_diff_symbol or after the whole process is done, i.e.
by buffering we have add the possibility for a second pass over the
whole output before doing the actual output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_string for output.

Signed-off-by: Stefan Beller 
---
 diff.c | 109 +++--
 diff.h |   2 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index f2f7a4da79..35b5924ff2 100644
--- a/diff.c
+++ b/diff.c
@@ -603,6 +603,47 @@ enum diff_symbol {
 #define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
 #define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal 
line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could also
+ * enhance the state for emitting prefabricated lines, e.g. the similarity
+ * score line or hunk/file headers would only need to store a number or path
+ * and then the output can be constructed later on depending on state.
+ */
+struct emitted_diff_symbol {
+   const char *line;
+   int len;
+   int flags;
+   enum diff_symbol s;
+};
+#define EMITTED_DIFF_SYMBOL_INIT {NULL}
+
+struct emitted_diff_symbols {
+   struct emitted_diff_symbol *buf;
+   int nr, alloc;
+};
+#define EMITTED_DIFF_SYMBOLS_INIT {NULL, 0, 0}
+
+static void append_emitted_diff_symbol(struct diff_options *o,
+  struct emitted_diff_symbol *e)
+{
+   struct emitted_diff_symbol *f;
+
+   ALLOC_GROW(o->emitted_symbols->buf,
+  o->emitted_symbols->nr + 1,
+  o->emitted_symbols->alloc);
+   f = &o->emitted_symbols->buf[o->emitted_symbols->nr++];
+
+   memcpy(f, e, sizeof(struct emitted_diff_symbol));
+   f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+
 static void emit_line_ws_markup(struct diff_options *o,
const char *set, const char *reset,
const char *line, int len, char sign,
@@ -629,12 +670,18 @@ static void emit_line_ws_markup(struct diff_options *o,
}
 }
 
-static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
-const char *line, int len, unsigned flags)
+static void emit_diff_symbol_from_struct(struct diff_options *o,
+struct emitted_diff_symbol *eds)
 {
static const char *nneof = " No newline at end of file\n";
const char *context, *reset, *set, *meta, *fraginfo;
struct strbuf sb = STRBUF_INIT;
+
+   enum diff_symbol s = eds->s;
+   const char *line = eds->line;
+   int len = eds->len;
+   unsigned flags = eds->flags;
+
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
@@ -790,6 +837,17 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
strbuf_release(&sb);
 }
 
+static void emit_diff_symbol(struct di

[PATCH 05/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 89466018e5..3af07fa659 100644
--- a/diff.c
+++ b/diff.c
@@ -561,17 +561,24 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 
 enum diff_symbol {
DIFF_SYMBOL_SEPARATOR,
+   DIFF_SYMBOL_CONTEXT_MARKER,
 };
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 const char *line, int len)
 {
+   const char *context, *reset;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
diff_line_prefix(o),
o->line_termination);
break;
+   case DIFF_SYMBOL_CONTEXT_MARKER:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, context, reset, line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -661,7 +668,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
if (len < 10 ||
memcmp(line, atat, 2) ||
!(ep = memmem(line + 2, len - 2, atat, 2))) {
-   emit_line(ecbdata->opt, context, reset, line, len);
+   emit_diff_symbol(ecbdata->opt,
+DIFF_SYMBOL_CONTEXT_MARKER, line, len);
return;
}
ep += 2; /* skip over @@ */
-- 
2.12.2.575.gb14f27f917

[PATCH 20/26] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 73 +++---
 1 file changed, 43 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 18bfc4720b..f2f7a4da79 100644
--- a/diff.c
+++ b/diff.c
@@ -592,6 +592,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_LINE,
DIFF_SYMBOL_WORD_DIFF,
DIFF_SYMBOL_STAT_SEP,
+   DIFF_SYMBOL_SUMMARY,
 };
 /*
  * Flags for content lines:
@@ -780,6 +781,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_STAT_SEP:
fputs(o->stat_sep, o->file);
break;
+   case DIFF_SYMBOL_SUMMARY:
+   emit_line(o, "", "", line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -4727,67 +4731,76 @@ static void flush_one_pair(struct diff_filepair *p, 
struct diff_options *opt)
}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct 
diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char 
*newdelete, struct diff_filespec *fs)
 {
+   struct strbuf sb = STRBUF_INIT;
if (fs->mode)
-   fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+   strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
else
-   fprintf(file, " %s ", newdelete);
-   write_name_quoted(fs->path, file, '\n');
-}
+   strbuf_addf(&sb, " %s ", newdelete);
 
+   quote_c_style(fs->path, &sb, NULL, 0);
+   strbuf_addch(&sb, '\n');
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   strbuf_release(&sb);
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int 
show_name,
-   const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+   int show_name)
 {
if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-   fprintf(file, "%s mode change %06o => %06o%c", line_prefix, 
p->one->mode,
-   p->two->mode, show_name ? ' ' : '\n');
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(&sb, " mode change %06o => %06o",
+   p->one->mode, p->two->mode);
if (show_name) {
-   write_name_quoted(p->two->path, file, '\n');
+   strbuf_addch(&sb, ' ');
+   quote_c_style(p->two->path, &sb, NULL, 0);
}
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   strbuf_release(&sb);
}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct 
diff_filepair *p,
-   const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+   struct diff_filepair *p)
 {
+   struct strbuf sb = STRBUF_INIT;
char *names = pprint_rename(p->one->path, p->two->path);
-
-   fprintf(file, " %s %s (%d%%)\n", renamecopy, names, 
similarity_index(p));
+   strbuf_addf(&sb, " %s %s (%d%%)\n",
+   renamecopy, names, similarity_index(p));
free(names);
-   show_mode_change(file, p, 0, line_prefix);
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-   FILE *file = opt->file;
-   const char *line_prefix = diff_line_prefix(opt);
-
switch(p->status) {
case DIFF_STATUS_DELETED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "delete", p->one);
+   show_file_mode_name(opt, "delete", p->one);
break;
case DIFF_STATUS_ADDED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "create", p->two);
+   show_file_mode_name(opt, "create", p->two);
break;
case DIFF_STATUS_COPIED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "copy", p, line_prefix);
+   show_rename_copy(opt, "copy", p);
break;
case DIFF_STATUS_RENAMED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "rename", p, line_prefix);
+   show_rename_copy(opt, "rename", p);
break;
default:
if (p->score) {
-   fprintf(file, "%s rewrite ", line_prefix);
-   write_name_quoted(p->two->path, file, ' ');
-   fprintf(file, "(%d%%)\n", similarity_index(p));
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(&sb, " rewrite ");
+

[PATCH 24/26] diff.c: add dimming to moved line detection

2017-06-19 Thread Stefan Beller

Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.

Signed-off-by: Stefan Beller 
---
 color.h|   2 +
 diff.c | 139 -
 diff.h |   9 ++-
 t/t4015-diff-whitespace.sh | 124 
 4 files changed, 259 insertions(+), 15 deletions(-)

diff --git a/color.h b/color.h
index 90627650fc..0e091b0cf5 100644
--- a/color.h
+++ b/color.h
@@ -42,6 +42,8 @@ struct strbuf;
 #define GIT_COLOR_BG_BLUE  "\033[44m"
 #define GIT_COLOR_BG_MAGENTA   "\033[45m"
 #define GIT_COLOR_BG_CYAN  "\033[46m"
+#define GIT_COLOR_DI   "\033[2m"
+#define GIT_COLOR_DI_IT"\033[2;3m"
 
 /* A special value meaning "no color selected" */
 #define GIT_COLOR_NIL "NIL"
diff --git a/diff.c b/diff.c
index 0d41a53b76..7756f7610c 100644
--- a/diff.c
+++ b/diff.c
@@ -57,10 +57,14 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_YELLOW,   /* COMMIT */
GIT_COLOR_BG_RED,   /* WHITESPACE */
GIT_COLOR_NORMAL,   /* FUNCINFO */
-   GIT_COLOR_MAGENTA,  /* OLD_MOVED */
-   GIT_COLOR_BLUE, /* OLD_MOVED ALTERNATIVE */
-   GIT_COLOR_CYAN, /* NEW_MOVED */
-   GIT_COLOR_YELLOW,   /* NEW_MOVED ALTERNATIVE */
+   GIT_COLOR_BOLD_MAGENTA, /* OLD_MOVED */
+   GIT_COLOR_BOLD_BLUE,/* OLD_MOVED ALTERNATIVE */
+   GIT_COLOR_DI,   /* OLD_MOVED_DIM */
+   GIT_COLOR_DI_IT,/* OLD_MOVED_ALTERNATIVE_DIM */
+   GIT_COLOR_BOLD_CYAN,/* NEW_MOVED */
+   GIT_COLOR_BOLD_YELLOW,  /* NEW_MOVED ALTERNATIVE */
+   GIT_COLOR_DI,   /* NEW_MOVED_DIM */
+   GIT_COLOR_DI_IT,/* NEW_MOVED_ALTERNATIVE_DIM */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -90,10 +94,18 @@ static int parse_diff_color_slot(const char *var)
return DIFF_FILE_OLD_MOVED;
if (!strcasecmp(var, "oldmovedalternative"))
return DIFF_FILE_OLD_MOVED_ALT;
+   if (!strcasecmp(var, "oldmoveddimmed"))
+   return DIFF_FILE_OLD_MOVED_DIM;
+   if (!strcasecmp(var, "oldmovedalternativedimmed"))
+   return DIFF_FILE_OLD_MOVED_ALT_DIM;
if (!strcasecmp(var, "newmoved"))
return DIFF_FILE_NEW_MOVED;
if (!strcasecmp(var, "newmovedalternative"))
return DIFF_FILE_NEW_MOVED_ALT;
+   if (!strcasecmp(var, "newmoveddimmed"))
+   return DIFF_FILE_NEW_MOVED_DIM;
+   if (!strcasecmp(var, "newmovedalternativedimmed"))
+   return DIFF_FILE_NEW_MOVED_ALT_DIM;
return -1;
 }
 
@@ -250,6 +262,8 @@ static int parse_color_moved(const char *arg)
return COLOR_MOVED_PLAIN;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
+   else if (!strcmp(arg, "dimmed_zebra"))
+   return COLOR_MOVED_ZEBRA_DIM;
else
return -1;
 }
@@ -636,6 +650,7 @@ enum diff_symbol {
 #define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
 #define DIFF_SYMBOL_MOVED_LINE (1<<17)
 #define DIFF_SYMBOL_MOVED_LINE_ZEBRA   (1<<18)
+#define DIFF_SYMBOL_MOVED_LINE_UNINTERESTING   (1<<19)
 #define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
 
 /*
@@ -889,6 +904,67 @@ static void mark_color_as_moved(struct diff_options *o,
free(pmb);
 }
 
+#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
+  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ZEBRA)
+static void dim_moved_lines(struct diff_options *o)
+{
+   int n;
+   for (n = 0; n < o->emitted_symbols->nr; n++) {
+   struct emitted_diff_symbol *prev = (n != 0) ?
+   &o->emitted_symbols->buf[n - 1] : NULL;
+   struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+   struct emitted_diff_symbol *next =
+   (n < o->emitted_symbols->nr - 1) ?
+   &o->emitted_symbols->buf[n + 1] : NULL;
+
+   /* Not a plus or minus line? */
+   if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS)
+   continue;
+
+   /* Not a moved line? */
+   if (!(l->flags & DIFF_SYMBOL_MOVED_LINE))
+   continue;
+
+   /*
+* If prev or next are not a plus or minus line,
+* pretend they don't exist
+*/
+   if (prev && prev->s != DIFF_SYMBOL_PLUS &&
+   prev->s != DIFF_SYMBOL_MINUS)
+   prev = NULL;
+   if (next && next->s != DIFF_SYMBOL_PLUS &&
+   next->s != DIFF_SYMBOL_MINUS)
+   next = NULL;
+
+   /* Inside a block? */
+   if (

[PATCH 03/26] diff.c: factor out diff_flush_patch_all_file_pairs

2017-06-19 Thread Stefan Beller

In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int 
needed, int degraded_cc)
warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+   int i;
+   struct diff_queue_struct *q = &diff_queued_diff;
+   for (i = 0; i < q->nr; i++) {
+   struct diff_filepair *p = q->queue[i];
+   if (check_pair_status(p))
+   diff_flush_patch(p, o);
+   }
+}
+
 void diff_flush(struct diff_options *options)
 {
struct diff_queue_struct *q = &diff_queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
}
}
 
-   for (i = 0; i < q->nr; i++) {
-   struct diff_filepair *p = q->queue[i];
-   if (check_pair_status(p))
-   diff_flush_patch(p, options);
-   }
+   diff_flush_patch_all_file_pairs(options);
}
 
if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.12.2.575.gb14f27f917

[PATCH 11/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/diff.c b/diff.c
index e6029e8213..7c56150b56 100644
--- a/diff.c
+++ b/diff.c
@@ -570,6 +570,7 @@ enum diff_symbol {
DIFF_SYMBOL_MINUS,
DIFF_SYMBOL_WORDS_PORCELAIN,
DIFF_SYMBOL_WORDS,
+   DIFF_SYMBOL_FILEPAIR,
 };
 /*
  * Flags for content lines:
@@ -610,7 +611,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set;
+   const char *context, *reset, *set, *meta;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
@@ -676,6 +677,14 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
}
emit_line(o, context, reset, line, len);
break;
+   case DIFF_SYMBOL_FILEPAIR:
+   meta = diff_get_color_opt(o, DIFF_METAINFO);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   fprintf(o->file, "%s%s%s%s%s%s\n", diff_line_prefix(o), meta,
+   flags ? "+++ " : "--- ",
+   line, reset,
+   strchr(line, ' ') ? "\t" : "");
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -847,8 +856,6 @@ static void emit_rewrite_diff(const char *name_a,
  struct diff_options *o)
 {
int lc_a, lc_b;
-   const char *name_a_tab, *name_b_tab;
-   const char *metainfo = diff_get_color(o->use_color, DIFF_METAINFO);
const char *fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
const char *reset = diff_get_color(o->use_color, DIFF_RESET);
static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT;
@@ -868,8 +875,6 @@ static void emit_rewrite_diff(const char *name_a,
 
name_a += (*name_a == '/');
name_b += (*name_b == '/');
-   name_a_tab = strchr(name_a, ' ') ? "\t" : "";
-   name_b_tab = strchr(name_b, ' ') ? "\t" : "";
 
strbuf_reset(&a_name);
strbuf_reset(&b_name);
@@ -896,11 +901,11 @@ static void emit_rewrite_diff(const char *name_a,
 
lc_a = count_lines(data_one, size_one);
lc_b = count_lines(data_two, size_two);
-   fprintf(o->file,
-   "%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-   line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-   line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-   line_prefix, fraginfo);
+
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR, a_name.buf, a_name.len, 0);
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR, b_name.buf, b_name.len, 1);
+
+   fprintf(o->file, "%s%s@@ -", line_prefix, fraginfo);
if (!o->irreversible_delete)
print_line_count(o->file, lc_a);
else
@@ -1369,10 +1374,8 @@ static void find_lno(const char *line, struct 
emit_callback *ecbdata)
 static void fn_out_consume(void *priv, char *line, unsigned long len)
 {
struct emit_callback *ecbdata = priv;
-   const char *meta = diff_get_color(ecbdata->color_diff, DIFF_METAINFO);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
-   const char *line_prefix = diff_line_prefix(o);
 
o->found_changes = 1;
 
@@ -1383,15 +1386,12 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 
if (ecbdata->label_path[0]) {
-   const char *name_a_tab, *name_b_tab;
-
-   name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
-   name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-   fprintf(o->file, "%s%s--- %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[0], reset, 
name_a_tab);
-   fprintf(o->file, "%s%s+++ %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[1], reset, 
name_b_tab);
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR,
+ecbdata->label_path[0],
+strlen(ecbdata->label_path[0]), 0);
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR,
+ecbdata->label_path[1],
+strlen(ecbdata->label_path[1]), 1);
ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
}
 
-- 
2.12.2.575.gb14f27f917

[PATCH 00/26] reroll of sb/diff-color-moved

2017-06-19 Thread Stefan Beller

This is a complete rewrite of the series. Highlights:
* instead of buffering partial lines, we'll pretend all diff output
  follows a well defined grammar, and we emit symbols thereof.
  (The difference is mostly mental, though by this trick we reduce
  the memory footprint for storing one of these symbols from 7 variables
  (3 pointers, 3 ints, one state (also int) down to 4 variables
  (one pointer, 2 ints, one state).
* The algorithm for color painting was detangled:
  -> different functions for block detection and dimming
  -> The last patch (not to be applied) is an RFC that shows
 how we would approach non-colored, but machine parseable highlighting
 of moved lines.

Thanks,
Stefan

Stefan Beller (26):
  diff.c: readability fix
  diff.c: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: introduce emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
  diff.c: migrate emit_line_checked to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS{_PORCELAIN}
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
  submodule.c: migrate diff output to use emit_diff_symbol
  diff.c: convert emit_binary_diff_body to use emit_diff_symbol
  diff.c: convert show_stats to use emit_diff_symbol
  diff.c: convert word diffing to use emit_diff_symbol
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
  diff.c: buffer all output if asked to
  diff.c: color moved lines differently
  diff.c: color moved lines differently, plain mode
  diff.c: add dimming to moved line detection
  diff: document the new --color-moved setting
  WIP/RFC: diff.c: have a "machine parseable" move coloring

 Documentation/config.txt   |   12 +-
 Documentation/diff-options.txt |   27 +
 cache.h|1 +
 color.h|2 +
 diff.c | 1283 
 diff.h |   39 +-
 submodule.c|   85 ++-
 submodule.h|   13 +-
 t/t4015-diff-whitespace.sh |  369 
 9 files changed, 1515 insertions(+), 316 deletions(-)

-- 
2.12.2.575.gb14f27f917

[PATCH 15/26] submodule.c: migrate diff output to use emit_diff_symbol

2017-06-19 Thread Stefan Beller

As the submodule process is no longer attached to the same stdout as
the superprojects process, we need to pass coloring explicitly.

Remove the colors from the function signatures, as all the coloring
decisions will be made either inside the child process or the final
emit_diff_symbol.

Signed-off-by: Stefan Beller 
---
 diff.c  | 89 -
 diff.h  |  9 +++
 submodule.c | 85 ++
 submodule.h | 13 +++--
 4 files changed, 128 insertions(+), 68 deletions(-)

diff --git a/diff.c b/diff.c
index 96ce53c5cf..42a9020d95 100644
--- a/diff.c
+++ b/diff.c
@@ -574,6 +574,13 @@ enum diff_symbol {
DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_BINARY_FILES,
DIFF_SYMBOL_REWRITE_DIFF,
+   DIFF_SYMBOL_SUBMODULE_ADD,
+   DIFF_SYMBOL_SUBMODULE_DEL,
+   DIFF_SYMBOL_SUBMODULE_UNTRACKED,
+   DIFF_SYMBOL_SUBMODULE_MODIFIED,
+   DIFF_SYMBOL_SUBMODULE_HEADER,
+   DIFF_SYMBOL_SUBMODULE_ERROR,
+   DIFF_SYMBOL_SUBMODULE_PIPETHROUGH,
 };
 /*
  * Flags for content lines:
@@ -700,11 +707,77 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
reset = diff_get_color_opt(o, DIFF_RESET);
emit_line(o, fraginfo, reset, line, len);
break;
+   case DIFF_SYMBOL_SUBMODULE_ADD:
+   set = diff_get_color_opt(o, DIFF_FILE_NEW);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, set, reset, line, len);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_DEL:
+   set = diff_get_color_opt(o, DIFF_FILE_OLD);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, set, reset, line, len);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_UNTRACKED:
+   fprintf(o->file, "%sSubmodule %s contains untracked content\n",
+   diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_MODIFIED:
+   fprintf(o->file, "%sSubmodule %s contains modified content\n",
+   diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_HEADER:
+   fprintf(o->file, "%s%s", diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_ERROR:
+   emit_line(o, "", "", line, len);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
+   emit_line(o, "", "", line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
 }
 
+void diff_emit_submodule_del(struct diff_options *o, const char *line)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_DEL, line, strlen(line), 0);
+}
+
+void diff_emit_submodule_add(struct diff_options *o, const char *line)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_ADD, line, strlen(line), 0);
+}
+
+void diff_emit_submodule_untracked(struct diff_options *o, const char *path)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_UNTRACKED,
+path, strlen(path), 0);
+}
+
+void diff_emit_submodule_modified(struct diff_options *o, const char *path)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_MODIFIED,
+path, strlen(path), 0);
+}
+
+void diff_emit_submodule_header(struct diff_options *o, const char *header)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_HEADER,
+header, strlen(header), 0);
+}
+
+void diff_emit_submodule_error(struct diff_options *o, const char *err)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_ERROR, err, strlen(err), 0);
+}
+
+void diff_emit_submodule_pipethrough(struct diff_options *o,
+const char *line, int len)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_PIPETHROUGH, line, len, 0);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -2465,24 +2538,16 @@ static void builtin_diff(const char *name_a,
if (o->submodule_format == DIFF_SUBMODULE_LOG &&
(!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
-   const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
-   const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_summary(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_summary(o, one->path ? one->path : two->path,
&one->oid, &two->oid,
-   two->dirty_submodule,
-   meta, del, add, reset);
+   two->dirty_submodule);
return;
} else if (o->submodule_format == DIFF_SUBMODULE_INLINE_DIFF

[PATCH 18/26] diff.c: convert word diffing to use emit_diff_symbol

2017-06-19 Thread Stefan Beller

The word diffing is not line oriented and would need some serious
effort to be transformed into a line oriented approach, so
just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.

Signed-off-by: Stefan Beller 
---
 diff.c | 79 ++
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/diff.c b/diff.c
index 0a5ff2500f..5b46baa12a 100644
--- a/diff.c
+++ b/diff.c
@@ -590,6 +590,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_SUMMARY_ABBREV,
DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
DIFF_SYMBOL_STATS_LINE,
+   DIFF_SYMBOL_WORD_DIFF,
 };
 /*
  * Flags for content lines:
@@ -772,6 +773,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES:
emit_line(o, "", "", line, len);
break;
+   case DIFF_SYMBOL_WORD_DIFF:
+   fprintf(o->file, "%.*s", len, line);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1100,37 +1104,49 @@ struct diff_words_data {
struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
  struct diff_words_style_elem *st_el,
  const char *newline,
- size_t count, const char *buf,
- const char *line_prefix)
+ size_t count, const char *buf)
 {
int print = 0;
+   struct strbuf sb = STRBUF_INIT;
 
while (count) {
char *p = memchr(buf, '\n', count);
if (print)
-   fputs(line_prefix, fp);
+   strbuf_addstr(&sb, diff_line_prefix(o));
+
if (p != buf) {
-   if (st_el->color && fputs(st_el->color, fp) < 0)
-   return -1;
-   if (fputs(st_el->prefix, fp) < 0 ||
-   fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-   fputs(st_el->suffix, fp) < 0)
-   return -1;
-   if (st_el->color && *st_el->color
-   && fputs(GIT_COLOR_RESET, fp) < 0)
-   return -1;
+   const char *reset = st_el->color && *st_el->color ?
+   GIT_COLOR_RESET : NULL;
+   if (st_el->color && *st_el->color)
+   strbuf_addstr(&sb, st_el->color);
+   strbuf_addstr(&sb, st_el->prefix);
+   strbuf_add(&sb, buf, p ? p - buf : count);
+   strbuf_addstr(&sb, st_el->suffix);
+   if (reset)
+   strbuf_addstr(&sb, reset);
}
if (!p)
-   return 0;
-   if (fputs(newline, fp) < 0)
-   return -1;
+   goto out;
+
+   strbuf_addstr(&sb, newline);
count -= p + 1 - buf;
buf = p + 1;
print = 1;
+   if (count) {
+   emit_diff_symbol(o, DIFF_SYMBOL_WORD_DIFF,
+sb.buf, sb.len, 0);
+   strbuf_reset(&sb);
+   }
}
+
+out:
+   if (sb.len)
+   emit_diff_symbol(o, DIFF_SYMBOL_WORD_DIFF,
+sb.buf, sb.len, 0);
+   strbuf_release(&sb);
return 0;
 }
 
@@ -1212,24 +1228,20 @@ static void fn_out_diff_words_aux(void *priv, char 
*line, unsigned long len)
fputs(line_prefix, diff_words->opt->file);
}
if (diff_words->current_plus != plus_begin) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
&style->ctx, style->newline,
plus_begin - diff_words->current_plus,
-   diff_words->current_plus, line_prefix);
-   if (*(plus_begin - 1) == '\n')
-   fputs(line_prefix, diff_words->opt->file);
+   diff_words->current_plus);
}
if (minus_begin != minus_end) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
&style->old, style->newline,
-   minus_end - minus_begin, minus_begin,
-   line_prefix);
+   minus_end - minus_begin, minus_begin);
}
if (plus_begin !=

[PATCH 02/26] diff.c: move line ending check into emit_hunk_header

2017-06-19 Thread Stefan Beller

The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead make sure emit_hunk_header() to always send a
completed line to emit_line().

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
}
 
strbuf_add(&msgbuf, line + len, org_len - len);
+   strbuf_complete_line(&msgbuf);
+
emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
len = sane_truncate_line(ecbdata, line, len);
find_lno(line, ecbdata);
emit_hunk_header(ecbdata, line, len);
-   if (line[len-1] != '\n')
-   putc('\n', o->file);
return;
}
 
-- 
2.12.2.575.gb14f27f917

[PATCH 17/26] diff.c: convert show_stats to use emit_diff_symbol

2017-06-19 Thread Stefan Beller

We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_string machinery and
keep print_stat_summary with the same arguments around.

Signed-off-by: Stefan Beller 
---
 diff.c | 118 +
 diff.h |   4 +--
 2 files changed, 77 insertions(+), 45 deletions(-)

diff --git a/diff.c b/diff.c
index e6ade5fde0..0a5ff2500f 100644
--- a/diff.c
+++ b/diff.c
@@ -586,7 +586,10 @@ enum diff_symbol {
DIFF_SYMBOL_SUBMODULE_HEADER,
DIFF_SYMBOL_SUBMODULE_ERROR,
DIFF_SYMBOL_SUBMODULE_PIPETHROUGH,
-
+   DIFF_SYMBOL_STATS_SUMMARY_NO_FILES,
+   DIFF_SYMBOL_STATS_SUMMARY_ABBREV,
+   DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
+   DIFF_SYMBOL_STATS_LINE,
 };
 /*
  * Flags for content lines:
@@ -628,6 +631,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 {
static const char *nneof = " No newline at end of file\n";
const char *context, *reset, *set, *meta, *fraginfo;
+   struct strbuf sb = STRBUF_INIT;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
@@ -756,9 +760,22 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
emit_line(o, "", "", line, len);
break;
+   case DIFF_SYMBOL_STATS_SUMMARY_NO_FILES:
+   fprintf(o->file, " 0 files changed\n");
+   break;
+   case DIFF_SYMBOL_STATS_SUMMARY_ABBREV:
+   emit_line(o, "", "", " ...\n", strlen(" ...\n"));
+   break;
+   case DIFF_SYMBOL_STATS_LINE:
+   emit_line(o, "", "", line, len);
+   break;
+   case DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES:
+   emit_line(o, "", "", line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
+   strbuf_release(&sb);
 }
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line)
@@ -1712,20 +1729,14 @@ static int scale_linear(int it, int width, int 
max_change)
return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
- const char *prefix, const char *name, int len)
-{
-   fprintf(file, " %s%-*s |", prefix, len, name);
-}
-
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const 
char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt,
+  const char *set, const char *reset)
 {
if (cnt <= 0)
return;
-   fprintf(file, "%s", set);
-   while (cnt--)
-   putc(ch, file);
-   fprintf(file, "%s", reset);
+   strbuf_addstr(out, set);
+   strbuf_addchars(out, ch, cnt);
+   strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1749,14 +1760,16 @@ static void fill_print_name(struct diffstat_file *file)
file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+static void print_stat_summary_inserts_deletes(struct diff_options *options,
+   int files, int insertions, int deletions)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret;
 
if (!files) {
assert(insertions == 0 && deletions == 0);
-   return fprintf(fp, "%s\n", " 0 files changed");
+   emit_diff_symbol(options, DIFF_SYMBOL_STATS_SUMMARY_NO_FILES,
+NULL, 0, 0);
+   return;
}
 
strbuf_addf(&sb,
@@ -1783,9 +1796,19 @@ int print_stat_summary(FILE *fp, int files, int 
insertions, int deletions)
deletions);
}
strbuf_addch(&sb, '\n');
-   ret = fputs(sb.buf, fp);
+   emit_diff_symbol(options, DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
+sb.buf, sb.len, 0);
strbuf_release(&sb);
-   return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+   int insertions, int deletions)
+{
+   struct diff_options o;
+   memset(&o, 0, sizeof(o));
+   o.file = fp;
+
+   print_stat_summary_inserts_deletes(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1795,13 +1818,13 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
int total_files = data->nr, count;
int width, name_width, graph_width, number_width = 0, bin_width = 0;
const char *reset, *add_c, *del_c;
-   const char *line_prefix = "";
int extra_shown = 0;
+   const char *line_prefix = diff_line_prefix(options);
+   struct strbuf out = STRBUF_INIT;
 
if (data->nr == 0)
return;
 
-   line_prefix = diff_line_prefix(options);

[PATCH 25/26] diff: document the new --color-moved setting

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 Documentation/config.txt   | 12 ++--
 Documentation/diff-options.txt | 27 +++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..29e0b9fa69 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
 
+diff.colorMoved::
+   If set moved lines in a diff are colored differently,
+   for details see '--color-moved' in linkgit:git-diff[1].
+
 color.diff.::
Use customized color for diff colorization.  `` specifies
which part of the patch to use the specified color, and is one
of `context` (context text - `plain` is a historical synonym),
`meta` (metainformation), `frag`
(hunk header), 'func' (function in hunk header), `old` (removed lines),
-   `new` (added lines), `commit` (commit headers), or `whitespace`
-   (highlighting whitespace errors).
+   `new` (added lines), `commit` (commit headers), `whitespace`
+   (highlighting whitespace errors), `oldMoved` (deleted lines),
+   `newMoved` (added lines), `oldMovedDimmed`, `oldMovedAlternative`,
+   `oldMovedAlternativeDimmed`, `newMovedDimmed`, `newMovedAlternative`
+   and `newMovedAlternativeDimmed` (See the ''
+   setting of '--color-moved' in linkgit:git-diff[1] for details).
 
 color.decorate.::
Use customized color for 'git log --decorate' output.  `` is one
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f48de..058c8014ed 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -231,6 +231,33 @@ ifdef::git-diff[]
 endif::git-diff[]
It is the same as `--color=never`.
 
+--color-moved[=]::
+   Moved lines of code are colored differently.
+ifdef::git-diff[]
+   It can be changed by the `diff.colorMoved` configuration setting.
+endif::git-diff[]
+   The  defaults to 'no' if the option is not given
+   and to 'dimmed_zebra' if the option with no mode is given.
+   The mode must be one of:
++
+--
+no::
+   Moved lines are not highlighted.
+plain::
+   Any line that is added in one location and was removed
+   in another location will be colored with 'color.diff.newMoved'.
+   Similarly 'color.diff.oldMoved' will be used for removed lines
+   that are added somewhere else in the diff.
+zebra::
+   Blocks of moved code are detected. The detected blocks are
+   painted using the 'color.diff.{old,new}Moved' alternating with
+   'color.diff.{old,new}MovedAlternative'.
+dimmed_zebra::
+   Similar to 'zebra', but additional dimming of uninteresting parts
+   of moved code is performed. The bordering lines of two adjacent
+   blocks are considered interesting, the rest is uninteresting.
+--
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
-- 
2.12.2.575.gb14f27f917

[PATCH 04/26] diff.c: introduce emit_diff_symbol

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..89466018e5 100644
--- a/diff.c
+++ b/diff.c
@@ -559,6 +559,24 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
emit_line_0(o, set, reset, line[0], line+1, len-1);
 }
 
+enum diff_symbol {
+   DIFF_SYMBOL_SEPARATOR,
+};
+
+static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
+const char *line, int len)
+{
+   switch (s) {
+   case DIFF_SYMBOL_SEPARATOR:
+   fprintf(o->file, "%s%c",
+   diff_line_prefix(o),
+   o->line_termination);
+   break;
+   default:
+   die("BUG: unknown diff symbol");
+   }
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -4833,9 +4851,7 @@ void diff_flush(struct diff_options *options)
 
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
-   fprintf(options->file, "%s%c",
-   diff_line_prefix(options),
-   options->line_termination);
+   emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0);
if (options->stat_sep) {
/* attach patch instead of inline */
fputs(options->stat_sep, options->file);
-- 
2.12.2.575.gb14f27f917

[PATCH 14/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/diff.c b/diff.c
index e56962b844..96ce53c5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -573,6 +573,7 @@ enum diff_symbol {
DIFF_SYMBOL_FILEPAIR,
DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_BINARY_FILES,
+   DIFF_SYMBOL_REWRITE_DIFF,
 };
 /*
  * Flags for content lines:
@@ -613,7 +614,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set, *meta;
+   const char *context, *reset, *set, *meta, *fraginfo;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
@@ -694,6 +695,11 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
fprintf(o->file, "%sBinary files %s differ\n",
diff_line_prefix(o), line);
break;
+   case DIFF_SYMBOL_REWRITE_DIFF:
+   fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, fraginfo, reset, line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -816,17 +822,17 @@ static void remove_tempfile(void)
}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
switch (count) {
case 0:
-   fprintf(file, "0,0");
+   strbuf_addstr(out, "0,0");
break;
case 1:
-   fprintf(file, "1");
+   strbuf_addstr(out, "1");
break;
default:
-   fprintf(file, "1,%d", count);
+   strbuf_addf(out, "1,%d", count);
break;
}
 }
@@ -865,14 +871,12 @@ static void emit_rewrite_diff(const char *name_a,
  struct diff_options *o)
 {
int lc_a, lc_b;
-   const char *fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
-   const char *reset = diff_get_color(o->use_color, DIFF_RESET);
static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT;
const char *a_prefix, *b_prefix;
char *data_one, *data_two;
size_t size_one, size_two;
struct emit_callback ecbdata;
-   const char *line_prefix = diff_line_prefix(o);
+   struct strbuf out = STRBUF_INIT;
 
if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -914,14 +918,17 @@ static void emit_rewrite_diff(const char *name_a,
emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR, a_name.buf, a_name.len, 0);
emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR, b_name.buf, b_name.len, 1);
 
-   fprintf(o->file, "%s%s@@ -", line_prefix, fraginfo);
+   strbuf_addstr(&out, "@@ -");
if (!o->irreversible_delete)
-   print_line_count(o->file, lc_a);
+   add_line_count(&out, lc_a);
else
-   fprintf(o->file, "?,?");
-   fprintf(o->file, " +");
-   print_line_count(o->file, lc_b);
-   fprintf(o->file, " @@%s\n", reset);
+   strbuf_addstr(&out, "?,?");
+   strbuf_addstr(&out, " +");
+   add_line_count(&out, lc_b);
+   strbuf_addstr(&out, " @@\n");
+   emit_diff_symbol(o, DIFF_SYMBOL_REWRITE_DIFF, out.buf, out.len, 0);
+   strbuf_release(&out);
+
if (lc_a && !o->irreversible_delete)
emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
if (lc_b)
-- 
2.12.2.575.gb14f27f917

[PATCH 12/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 7c56150b56..95f11ea86f 100644
--- a/diff.c
+++ b/diff.c
@@ -571,6 +571,7 @@ enum diff_symbol {
DIFF_SYMBOL_WORDS_PORCELAIN,
DIFF_SYMBOL_WORDS,
DIFF_SYMBOL_FILEPAIR,
+   DIFF_SYMBOL_HEADER,
 };
 /*
  * Flags for content lines:
@@ -685,6 +686,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
line, reset,
strchr(line, ' ') ? "\t" : "");
break;
+   case DIFF_SYMBOL_HEADER:
+   fprintf(o->file, "%s", line);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1380,7 +1384,8 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
o->found_changes = 1;
 
if (ecbdata->header) {
-   fprintf(o->file, "%s", ecbdata->header->buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+ecbdata->header->buf, ecbdata->header->len, 0);
strbuf_reset(ecbdata->header);
ecbdata->header = NULL;
}
@@ -2514,7 +2519,8 @@ static void builtin_diff(const char *name_a,
if (complete_rewrite &&
(textconv_one || !diff_filespec_is_binary(one)) &&
(textconv_two || !diff_filespec_is_binary(two))) {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
strbuf_reset(&header);
emit_rewrite_diff(name_a, name_b, one, two,
textconv_one, textconv_two, o);
@@ -2524,7 +2530,8 @@ static void builtin_diff(const char *name_a,
}
 
if (o->irreversible_delete && lbl[1][0] == '/') {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER, header.buf,
+header.len, 0);
strbuf_reset(&header);
goto free_ab_and_return;
} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2535,10 +2542,13 @@ static void builtin_diff(const char *name_a,
!DIFF_OPT_TST(o, BINARY)) {
if (!oidcmp(&one->oid, &two->oid)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len,
+0);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
fprintf(o->file, "%sBinary files %s and %s differ\n",
line_prefix, lbl[0], lbl[1]);
goto free_ab_and_return;
@@ -2549,10 +2559,11 @@ static void builtin_diff(const char *name_a,
if (mf1.size == mf2.size &&
!memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER, header.buf, header.len, 
0);
strbuf_reset(&header);
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
@@ -2570,7 +2581,8 @@ static void builtin_diff(const char *name_a,
const struct userdiff_funcname *pe;
 
if (must_show_header) {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
strbuf_reset(&header);
}
 
-- 
2.12.2.575.gb14f27f917

[PATCH 07/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index a1804a7785..28be3226c2 100644
--- a/diff.c
+++ b/diff.c
@@ -563,11 +563,13 @@ enum diff_symbol {
DIFF_SYMBOL_SEPARATOR,
DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_CONTEXT_FRAGINFO,
+   DIFF_SYMBOL_NO_LF_EOF,
 };
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 const char *line, int len)
 {
+   static const char *nneof = " No newline at end of file\n";
const char *context, *reset;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
@@ -583,6 +585,13 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
break;
+   case DIFF_SYMBOL_NO_LF_EOF:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   putc('\n', o->file);
+   emit_line_0(o, context, reset, '\\',
+   nneof, strlen(nneof));
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -750,7 +759,6 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
   int prefix, const char *data, int size)
 {
const char *endp = NULL;
-   static const char *nneof = " No newline at end of file\n";
const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
 
while (0 < size) {
@@ -768,13 +776,8 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
size -= len;
data += len;
}
-   if (!endp) {
-   const char *context = diff_get_color(ecb->color_diff,
-DIFF_CONTEXT);
-   putc('\n', ecb->opt->file);
-   emit_line_0(ecb->opt, context, reset, '\\',
-   nneof, strlen(nneof));
-   }
+   if (!endp)
+   emit_diff_symbol(ecb->opt, DIFF_SYMBOL_NO_LF_EOF, NULL, 0);
 }
 
 static void emit_rewrite_diff(const char *name_a,
-- 
2.12.2.575.gb14f27f917

[PATCH 06/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3af07fa659..a1804a7785 100644
--- a/diff.c
+++ b/diff.c
@@ -562,6 +562,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 enum diff_symbol {
DIFF_SYMBOL_SEPARATOR,
DIFF_SYMBOL_CONTEXT_MARKER,
+   DIFF_SYMBOL_CONTEXT_FRAGINFO,
 };
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
@@ -579,6 +580,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
reset = diff_get_color_opt(o, DIFF_RESET);
emit_line(o, context, reset, line, len);
break;
+   case DIFF_SYMBOL_CONTEXT_FRAGINFO:
+   emit_line(o, "", "", line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -704,8 +708,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 
strbuf_add(&msgbuf, line + len, org_len - len);
strbuf_complete_line(&msgbuf);
-
-   emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+   emit_diff_symbol(ecbdata->opt,
+DIFF_SYMBOL_CONTEXT_FRAGINFO, msgbuf.buf, msgbuf.len);
strbuf_release(&msgbuf);
 }
 
-- 
2.12.2.575.gb14f27f917

[PATCH 08/26] diff.c: migrate emit_line_checked to use emit_diff_symbol

2017-06-19 Thread Stefan Beller

Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
* how the rules are evaluated (actually this double encodes the sign
  of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)


Signed-off-by: Stefan Beller 
---
 cache.h |   1 +
 diff.c  | 112 
 diff.h  |   6 ++--
 3 files changed, 75 insertions(+), 44 deletions(-)

diff --git a/cache.h b/cache.h
index e1f0e182ad..4d63c44f07 100644
--- a/cache.h
+++ b/cache.h
@@ -2168,6 +2168,7 @@ void shift_tree_by(const struct object_id *, const struct 
object_id *, struct ob
 #define WS_TRAILING_SPACE  (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
 #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
 #define WS_TAB_WIDTH_MASK077
+#define WS_RULE_MASK   0
 extern unsigned whitespace_rule_cfg;
 extern unsigned whitespace_rule(const char *);
 extern unsigned parse_whitespace_rule(const char *);
diff --git a/diff.c b/diff.c
index 28be3226c2..42c9f48dc2 100644
--- a/diff.c
+++ b/diff.c
@@ -564,13 +564,50 @@ enum diff_symbol {
DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_CONTEXT_FRAGINFO,
DIFF_SYMBOL_NO_LF_EOF,
+   DIFF_SYMBOL_CONTEXT,
+   DIFF_SYMBOL_PLUS,
+   DIFF_SYMBOL_MINUS,
 };
+/*
+ * Flags for content lines:
+ * 0..12 are whitespace rules
+ * 13-15 are WSEH_NEW | WSEH_OLD | WSEH_CONTEXT
+ * 16 is marking if the line is blank at EOF
+ */
+#define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
+#define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
+
+static void emit_line_ws_markup(struct diff_options *o,
+   const char *set, const char *reset,
+   const char *line, int len, char sign,
+   unsigned ws_rule, int blank_at_eof)
+{
+   const char *ws = NULL;
+
+   if (o->ws_error_highlight & ws_rule) {
+   ws = diff_get_color_opt(o, DIFF_WHITESPACE);
+   if (!*ws)
+   ws = NULL;
+   }
+
+   if (!ws)
+   emit_line_0(o, set, reset, sign, line, len);
+   else if (blank_at_eof)
+   /* Blank line at EOF - paint '+' as well */
+   emit_line_0(o, ws, reset, sign, line, len);
+   else {
+   /* Emit just the prefix, then the rest. */
+   emit_line_0(o, set, reset, sign, "", 0);
+   ws_check_emit(line, len, ws_rule,
+ o->file, set, reset, ws);
+   }
+}
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
-const char *line, int len)
+const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset;
+   const char *context, *reset, *set;
switch (s) {
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
@@ -592,6 +629,25 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
emit_line_0(o, context, reset, '\\',
nneof, strlen(nneof));
break;
+   case DIFF_SYMBOL_CONTEXT:
+   set = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, ' ',
+   flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
+   break;
+   case DIFF_SYMBOL_PLUS:
+   set = diff_get_color_opt(o, DIFF_FILE_NEW);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, '+',
+   flags & DIFF_SYMBOL_CONTENT_WS_MASK,
+   flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
+   break;
+   case DIFF_SYMBOL_MINUS:
+   set = diff_get_color_opt(o, DIFF_FILE_OLD);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, '-',
+   flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -608,57 +664,31 @@ static int new_blank_line_at_eof(struct emit_callback 
*ecbdata, const char *line
return ws_blank_line(line, len, ecbdata->ws_rule);
 }
 
-static void emit_line_checked(const char *reset,
- struct emit_callback *ecbdata,
- const char *line, int len,
- enum color_diff color,
- unsigned ws_error_highlight,
- char sign)
-{
-   const char *set = diff_get_color(ecbdata->colo

[PATCH 19/26] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP

2017-06-19 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 diff.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 5b46baa12a..18bfc4720b 100644
--- a/diff.c
+++ b/diff.c
@@ -591,6 +591,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
DIFF_SYMBOL_STATS_LINE,
DIFF_SYMBOL_WORD_DIFF,
+   DIFF_SYMBOL_STAT_SEP,
 };
 /*
  * Flags for content lines:
@@ -776,6 +777,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_WORD_DIFF:
fprintf(o->file, "%.*s", len, line);
break;
+   case DIFF_SYMBOL_STAT_SEP:
+   fputs(o->stat_sep, o->file);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -5085,10 +5089,10 @@ void diff_flush(struct diff_options *options)
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0, 0);
-   if (options->stat_sep) {
+   if (options->stat_sep)
/* attach patch instead of inline */
-   fputs(options->stat_sep, options->file);
-   }
+   emit_diff_symbol(options, DIFF_SYMBOL_STAT_SEP,
+NULL, 0, 0);
}
 
diff_flush_patch_all_file_pairs(options);
-- 
2.12.2.575.gb14f27f917

Re: [PATCH v4 5/5] stash: implement builtin stash

2017-06-19 Thread Joel Teichroeb

On Sun, Jun 11, 2017 at 2:27 PM, Thomas Gummerer  wrote:
>> +
>> +int cmd_stash(int argc, const char **argv, const char *prefix)
>> +{
>> + int result = 0;
>> + pid_t pid = getpid();
>> +
>> + struct option options[] = {
>> + OPT_END()
>> + };
>> +
>> + git_config(git_default_config, NULL);
>> +
>> + xsnprintf(stash_index_path, 64, ".git/index.stash.%d", pid);
>> +
>> + argc = parse_options(argc, argv, prefix, options, git_stash_usage,
>> + PARSE_OPT_KEEP_UNKNOWN|PARSE_OPT_KEEP_DASHDASH);
>> +
>> + if (argc < 1) {
>> + result = do_push_stash(NULL, prefix, 0, 0, 0, 0, NULL);
>> + } else if (!strcmp(argv[0], "list"))
>> + result = list_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "show"))
>> + result = show_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "save"))
>> + result = save_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "push"))
>> + result = push_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "apply"))
>> + result = apply_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "clear"))
>> + result = clear_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "create"))
>> + result = create_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "store"))
>> + result = store_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "drop"))
>> + result = drop_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "pop"))
>> + result = pop_stash(argc, argv, prefix);
>> + else if (!strcmp(argv[0], "branch"))
>> + result = branch_stash(argc, argv, prefix);
>> + else {
>> + if (argv[0][0] == '-') {
>> + struct argv_array args = ARGV_ARRAY_INIT;
>> + argv_array_push(&args, "push");
>> + argv_array_pushv(&args, argv);
>> + result = push_stash(args.argc, args.argv, prefix);
>
> This is a bit of a change in behaviour to what we currently have.
>
> The rules we decided on are as follows:
>
>  - "git stash -p" is an alias for "git stash push -p".
>  - "git stash" with only option arguments is an alias for "git stash
>push" with those same arguments.  non-option arguments can be
>specified after a "--" for disambiguation.
>
> The above makes "git stash -*" a alias for "git stash push -*".  This
> would result in a change of behaviour, for example in the case where
> someone would use "git stash -this is a test-".  In that case the
> current behaviour is to create a stash with the message "-this is a
> test-", while the above would end up making git stash error out.  The
> discussion on how we came up with those rules can be found at
> http://public-inbox.org/git/20170206161432.zvpsqegjspaa2...@sigill.intra.peff.net/.

I don't really like the "argv[0][0] == '-'" logic, but it doesn't seem
to have the flaw you pointed out:
$ ./git stash -this is a test-
error: unknown switch `t'
usage: git stash [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
[...]

I'm not sure this is the best possible error message, but it's just as
useful as the message from the old version.

>
>> + if (!result)
>> + printf_ln(_("To restore them type \"git stash 
>> apply\""));
>
> In the shell script this is only displayed when the stash_push in the
> case where git stash is invoked with no arguments, not in the push
> case if I read this correctly.  So the two lines above should go in
> the (argc < 1) case I think.

I think it's correct as is. One of the tests checks for this string to
be output, and if I move the line, the test fails.



I agreed with all the other points you raised, and they will be fixed
in my next revision.

Re: [PATCH v5 08/10] rebase -i: skip unnecessary picks using the rebase--helper

2017-06-19 Thread Liam Beguin



On 19/06/17 05:45 AM, Johannes Schindelin wrote:
> Hi Liam,
> 
> On Sat, 17 Jun 2017, Liam Beguin wrote:
> 
>> On 16/06/17 09:56 AM, Johannes Schindelin wrote:
>>
>>> On Thu, 15 Jun 2017, Liam Beguin wrote:
>>>
 On 14/06/17 09:08 AM, Johannes Schindelin wrote:
> diff --git a/sequencer.c b/sequencer.c
> index a697906d463..a0e020dab09 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2640,3 +2640,110 @@ int check_todo_list(void)
>  
>   return res;
>  }
> +
> +/* skip picking commits whose parents are unchanged */
> +int skip_unnecessary_picks(void)
> +{
> + const char *todo_file = rebase_path_todo();
> + struct strbuf buf = STRBUF_INIT;
> + struct todo_list todo_list = TODO_LIST_INIT;
> + struct object_id onto_oid, *oid = &onto_oid, *parent_oid;
> + int fd, i;
> +
> + if (!read_oneliner(&buf, rebase_path_onto(), 0))
> + return error(_("could not read 'onto'"));
> + if (get_sha1(buf.buf, onto_oid.hash)) {

 I missed this last time but we could also replace `get_sha1` with
 `get_oid`
>>>
>>> Good point!
>>>
>>> I replaced this locally and force-pushed, but there is actually little
>>> chance of this patch series being integrated in a form with which I
>>> would be comfortable.
>>
>> Is there any chance of this moving forward? I was hoping to resend my
>> path to abbreviate command names on top of this.
> 
> We are at an impasse right now.
> 
> Junio wants me to change this code:
> 
> revs.pretty_given = 1;
> git_config_get_string("rebase.instructionFormat", &format);
> if (!format || !*format) {
> free(format);
> format = xstrdup("%s");
> }
> get_commit_format(format, &revs);
> free(format);
> pp.fmt = revs.commit_format;
> pp.output_encoding = get_log_output_encoding();
> 
> if (setup_revisions(argc, argv, &revs, NULL) > 1)
>   ...
> 
> which is reasonably compile-time safe, to something like this:
> 
>   struct strbuf userformat = STRBUF_INIT;
>   struct argv_array args = ARGV_ARRAY_INIT;
>   ...
>   for (i = 0; i < argc; i++)
>   argv_array_push(&args, argv[i]);
> git_config_get_string("rebase.instructionFormat", &format);
> if (!format || !*format)
>   argv_array_push(&args, "--format=%s");
>   else {
>   strbuf_addf(&userformat, "--format=%s", format);
> argv_array_push(&args, userformat.buf);
> }
> 
> if (setup_revisions(args.argc, args.argv, &revs, NULL) > 1)
>   ...
>   argv_array_clear(&args);
>   strbuf_release(&userformat);
> 
> which is not compile-time safe, harder to read, sloppy coding in my book.
> 
> The reason for this suggestion is that one of the revision machinery's
> implementation details is an ugly little semi-secret: the pretty-printing
> machinery uses a global state, and that is why we need the "pretty_given"
> flag in the first place.
> 
> Junio thinks that it would be better to not use the pretty_given flag in
> this patch series. I disagree: It would be better to use the pretty_given
> flag, also as a good motivator to work on removing the global state in the
> pretty-printing machinery.
> 
> Junio wants to strong-arm me into accepting authorship for this sloppy
> coding, and I simply won't do it.
> 
> That's why we are at an impasse right now.
> 
> I am really, really sorry that this affects your patch series, as I had
> not foreseen Junio's insistence on the sloppy coding when I suggested that
> you rebase your work on top of my patch series. I just was really
> unprepared for this contention, and I am still surprised/shocked that this
> is even an issue up for discussion.
> 
> Be that as it may, I see that this is a very bad experience for a
> motivated contributor such as yourself. I am very sorry about that!

It's not such a bad experience, I've found the people on the list to 
be quite welcoming and supportive.

> 
> So it may actually be better for you to go forward with your original
> patch series targeting git-rebase--interactive.sh.

I'll wait a bit longer to see how this goes and if it doesn't move, I'll
try re-sending an update to that series.

> 
> My apologies,
> Dscho
> 

Thanks,
Liam

Re: [PATCH v4 5/5] stash: implement builtin stash

2017-06-19 Thread Joel Teichroeb

On Fri, Jun 16, 2017 at 3:47 PM, Junio C Hamano  wrote:
> Joel Teichroeb  writes:
>> +/*
>> + * Untracked files are stored by themselves in a parentless commit, for
>> + * ease of unpacking later.
>> + */
>> +static int save_untracked(struct stash_info *info, const char *message,
>> + int include_untracked, int include_ignored, const char **argv)
>> +{
>> + struct child_process cp = CHILD_PROCESS_INIT;
>> + struct strbuf out = STRBUF_INIT;
>> + struct object_id orig_tree;
>> + int ret;
>> + const char *index_file = get_index_file();
>> +
>> + set_alternate_index_output(stash_index_path);
>> + untracked_files(&out, include_untracked, include_ignored, argv);
>> +
>> + cp.git_cmd = 1;
>> + argv_array_pushl(&cp.args, "update-index", "-z", "--add", "--remove",
>> + "--stdin", NULL);
>> + argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", stash_index_path);
>> +
>> + if (pipe_command(&cp, out.buf, out.len, NULL, 0, NULL, 0)) {
>> + strbuf_release(&out);
>> + return 1;
>> + }
>> +
>
> OK, that's a very straight-forward way of doing this, and as we do
> not care too much about performance in this initial conversion to C,
> it is even sensible.  In a later update after this patch lands, you
> may want to use dir.c's fill_directory() API to find the untracked
> files and add them yourself internally, without running ls-files (in
> untracked_files()) or update-index (here) as subprocesses, but that
> is in the future.  Let's get this round finished.
>
>> + strbuf_reset(&out);
>> +
>> + discard_cache();
>> + read_cache_from(stash_index_path);
>> +
>> + write_index_as_tree(orig_tree.hash, &the_index, stash_index_path, 
>> 0,NULL);
>
> SP before "NULL".
>
>> + discard_cache();
>> +
>> + read_cache_from(stash_index_path);
>
> Hmph, what did anybody change in the on-disk stash_index (or
> contents in the_index) since you read_cache_from()?
>
>> + write_cache_as_tree(info->u_tree.hash, 0, NULL);
>
> Then you write exactly the same index contents again, this time to
> info->u_tree here.  I am not sure why you need to do this twice, and
> I do not see how orig_tree.hash you wrote earlier is used?
>

I'm not sure I understand what's happening here either. When I was
writing this, it was essentially a lot of trial and error in order to
get the index handling correct. Getting rid of any single one of these
lines makes the test fail. At some point I'd like to redo all the
index handling parts here, as I think I can do without an additional
index, but I'd need to make sure the error handling is perfect first.

[PATCH v4 8/8] sha1_file: refactor has_sha1_file_with_flags

2017-06-19 Thread Jonathan Tan

has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c  | 10 ++
 builtin/index-pack.c |  3 ++-
 cache.h  | 11 +++
 sha1_file.c  | 13 +++--
 4 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 47708451b..96d5146c4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -242,9 +242,11 @@ static void find_non_local_tags(struct transport 
*transport,
 */
if (ends_with(ref->name, "^{}")) {
if (item &&
-   !has_object_file_with_flags(&ref->old_oid, 
HAS_SHA1_QUICK) &&
+   !has_object_file_with_flags(&ref->old_oid,
+   OBJECT_INFO_QUICK) &&
!will_fetch(head, ref->old_oid.hash) &&
-   !has_sha1_file_with_flags(item->util, 
HAS_SHA1_QUICK) &&
+   !has_sha1_file_with_flags(item->util,
+ OBJECT_INFO_QUICK) &&
!will_fetch(head, item->util))
item->util = NULL;
item = NULL;
@@ -258,7 +260,7 @@ static void find_non_local_tags(struct transport *transport,
 * fetch.
 */
if (item &&
-   !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
+   !has_sha1_file_with_flags(item->util, OBJECT_INFO_QUICK) &&
!will_fetch(head, item->util))
item->util = NULL;
 
@@ -279,7 +281,7 @@ static void find_non_local_tags(struct transport *transport,
 * checked to see if it needs fetching.
 */
if (item &&
-   !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
+   !has_sha1_file_with_flags(item->util, OBJECT_INFO_QUICK) &&
!will_fetch(head, item->util))
item->util = NULL;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 04b9dcaf0..587bc80c9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -794,7 +794,8 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
 
if (startup_info->have_repository) {
read_lock();
-   collision_test_needed = has_sha1_file_with_flags(oid->hash, 
HAS_SHA1_QUICK);
+   collision_test_needed =
+   has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
read_unlock();
}
 
diff --git a/cache.h b/cache.h
index 2e1cc3fe2..387694b25 100644
--- a/cache.h
+++ b/cache.h
@@ -1268,15 +1268,10 @@ int read_loose_object(const char *path,
  void **contents);
 
 /*
- * Return true iff we have an object named sha1, whether local or in
- * an alternate object database, and whether packed or loose.  This
- * function does not respect replace references.
- *
- * If the QUICK flag is set, do not re-check the pack directory
- * when we cannot find the object (this means we may give a false
- * negative answer if another process is simultaneously repacking).
+ * Convenience for sha1_object_info_extended() with a blank struct
+ * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass
+ * nonzero flags to also set other flags.
  */
-#define HAS_SHA1_QUICK 0x1
 extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
 static inline int has_sha1_file(const unsigned char *sha1)
 {
diff --git a/sha1_file.c b/sha1_file.c
index 68e3a3400..20db9b510 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3491,18 +3491,11 @@ int has_sha1_pack(const unsigned char *sha1)
 
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
-   struct pack_entry e;
-
+   static struct object_info blank;
if (!startup_info->have_repository)
return 0;
-   if (find_pack_entry(sha1, &e))
-   return 1;
-   if (has_loose_object(sha1))
-   return 1;
-   if (flags & HAS_SHA1_QUICK)
-   return 0;
-   reprepare_packed_git();
-   return find_pack_entry(sha1, &e);
+   return !sha1_object_info_extended(sha1, &blank,
+ flags | OBJECT_INFO_SKIP_CACHED);
 }
 
 int has_object_file(const struct object_id *oid)
-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 6/8] sha1_file: improve sha1_object_info_extended

2017-06-19 Thread Jonathan Tan

Improve sha1_object_info_extended() by supporting additional flags. This
allows has_sha1_file_with_flags() to be modified to use
sha1_object_info_extended() in a subsequent patch.

Signed-off-by: Jonathan Tan 
---
 cache.h |  4 
 sha1_file.c | 43 ---
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/cache.h b/cache.h
index 48aea923b..7cf2ca466 100644
--- a/cache.h
+++ b/cache.h
@@ -1863,6 +1863,10 @@ struct object_info {
 #define OBJECT_INFO_LOOKUP_REPLACE 1
 /* Allow reading from a loose object file of unknown/bogus type */
 #define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
+/* Do not check cached storage */
+#define OBJECT_INFO_SKIP_CACHED 4
+/* Do not retry packed storage after checking packed and loose storage */
+#define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
 extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
 
diff --git a/sha1_file.c b/sha1_file.c
index 4d5033c48..24f7a146e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2977,29 +2977,30 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info 
*oi, unsigned flags)
 {
-   struct cached_object *co;
struct pack_entry e;
int rtype;
const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
lookup_replace_object(sha1) :
sha1;
 
-   co = find_cached_object(real);
-   if (co) {
-   if (oi->typep)
-   *(oi->typep) = co->type;
-   if (oi->sizep)
-   *(oi->sizep) = co->size;
-   if (oi->disk_sizep)
-   *(oi->disk_sizep) = 0;
-   if (oi->delta_base_sha1)
-   hashclr(oi->delta_base_sha1);
-   if (oi->typename)
-   strbuf_addstr(oi->typename, typename(co->type));
-   if (oi->contentp)
-   *oi->contentp = xmemdupz(co->buf, co->size);
-   oi->whence = OI_CACHED;
-   return 0;
+   if (!(flags & OBJECT_INFO_SKIP_CACHED)) {
+   struct cached_object *co = find_cached_object(real);
+   if (co) {
+   if (oi->typep)
+   *(oi->typep) = co->type;
+   if (oi->sizep)
+   *(oi->sizep) = co->size;
+   if (oi->disk_sizep)
+   *(oi->disk_sizep) = 0;
+   if (oi->delta_base_sha1)
+   hashclr(oi->delta_base_sha1);
+   if (oi->typename)
+   strbuf_addstr(oi->typename, typename(co->type));
+   if (oi->contentp)
+   *oi->contentp = xmemdupz(co->buf, co->size);
+   oi->whence = OI_CACHED;
+   return 0;
+   }
}
 
if (!find_pack_entry(real, &e)) {
@@ -3010,9 +3011,13 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
}
 
/* Not a loose object; someone else may have just packed it. */
-   reprepare_packed_git();
-   if (!find_pack_entry(real, &e))
+   if (flags & OBJECT_INFO_QUICK) {
return -1;
+   } else {
+   reprepare_packed_git();
+   if (!find_pack_entry(real, &e))
+   return -1;
+   }
}
 
rtype = packed_object_info(e.p, e.offset, oi);
-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 7/8] sha1_file: do not access pack if unneeded

2017-06-19 Thread Jonathan Tan

Add an option to struct object_info to suppress population of additional
information about a packed object if unneeded. This allows an
optimization in which sha1_object_info_extended() does not even need to
access the pack if no information besides provenance is requested. A
subsequent patch will make use of this optimization.

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 +
 sha1_file.c | 17 +
 streaming.c |  1 +
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 7cf2ca466..2e1cc3fe2 100644
--- a/cache.h
+++ b/cache.h
@@ -1828,6 +1828,7 @@ struct object_info {
unsigned char *delta_base_sha1;
struct strbuf *typename;
void **contentp;
+   unsigned populate_u : 1;
 
/* Response */
enum {
diff --git a/sha1_file.c b/sha1_file.c
index 24f7a146e..68e3a3400 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3020,6 +3020,13 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
}
}
 
+   if (!oi->typep && !oi->sizep && !oi->disk_sizep &&
+   !oi->delta_base_sha1 && !oi->typename && !oi->contentp &&
+   !oi->populate_u) {
+   oi->whence = OI_PACKED;
+   return 0;
+   }
+
rtype = packed_object_info(e.p, e.offset, oi);
if (rtype < 0) {
mark_bad_packed_object(e.p, real);
@@ -3028,10 +3035,12 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi,
oi->whence = OI_DBCACHED;
} else {
oi->whence = OI_PACKED;
-   oi->u.packed.offset = e.offset;
-   oi->u.packed.pack = e.p;
-   oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
-rtype == OBJ_OFS_DELTA);
+   if (oi->populate_u) {
+   oi->u.packed.offset = e.offset;
+   oi->u.packed.pack = e.p;
+   oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
+rtype == OBJ_OFS_DELTA);
+   }
}
 
return 0;
diff --git a/streaming.c b/streaming.c
index 9afa66b8b..deebc18a8 100644
--- a/streaming.c
+++ b/streaming.c
@@ -113,6 +113,7 @@ static enum input_source istream_source(const unsigned char 
*sha1,
 
oi->typep = type;
oi->sizep = &size;
+   oi->populate_u = 1;
status = sha1_object_info_extended(sha1, oi, 0);
if (status < 0)
return stream_error;
-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 4/8] sha1_file: move delta base cache code up

2017-06-19 Thread Jonathan Tan

In a subsequent patch, packed_object_info() will be modified to use the
delta base cache, so move the relevant code to before
packed_object_info().

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 220 ++--
 1 file changed, 110 insertions(+), 110 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index ae44b32f3..a7be45efe 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2239,116 +2239,6 @@ static enum object_type packed_to_object_type(struct 
packed_git *p,
goto out;
 }
 
-int packed_object_info(struct packed_git *p, off_t obj_offset,
-  struct object_info *oi)
-{
-   struct pack_window *w_curs = NULL;
-   unsigned long size;
-   off_t curpos = obj_offset;
-   enum object_type type;
-
-   /*
-* We always get the representation type, but only convert it to
-* a "real" type later if the caller is interested.
-*/
-   type = unpack_object_header(p, &w_curs, &curpos, &size);
-
-   if (oi->sizep) {
-   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
-   off_t tmp_pos = curpos;
-   off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
-  type, obj_offset);
-   if (!base_offset) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   *oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
-   if (*oi->sizep == 0) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   } else {
-   *oi->sizep = size;
-   }
-   }
-
-   if (oi->disk_sizep) {
-   struct revindex_entry *revidx = find_pack_revindex(p, 
obj_offset);
-   *oi->disk_sizep = revidx[1].offset - obj_offset;
-   }
-
-   if (oi->typep || oi->typename) {
-   enum object_type ptot;
-   ptot = packed_to_object_type(p, obj_offset, type, &w_curs,
-curpos);
-   if (oi->typep)
-   *oi->typep = ptot;
-   if (oi->typename) {
-   const char *tn = typename(ptot);
-   if (tn)
-   strbuf_addstr(oi->typename, tn);
-   }
-   if (ptot < 0) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   }
-
-   if (oi->delta_base_sha1) {
-   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
-   const unsigned char *base;
-
-   base = get_delta_base_sha1(p, &w_curs, curpos,
-  type, obj_offset);
-   if (!base) {
-   type = OBJ_BAD;
-   goto out;
-   }
-
-   hashcpy(oi->delta_base_sha1, base);
-   } else
-   hashclr(oi->delta_base_sha1);
-   }
-
-out:
-   unuse_pack(&w_curs);
-   return type;
-}
-
-static void *unpack_compressed_entry(struct packed_git *p,
-   struct pack_window **w_curs,
-   off_t curpos,
-   unsigned long size)
-{
-   int st;
-   git_zstream stream;
-   unsigned char *buffer, *in;
-
-   buffer = xmallocz_gently(size);
-   if (!buffer)
-   return NULL;
-   memset(&stream, 0, sizeof(stream));
-   stream.next_out = buffer;
-   stream.avail_out = size + 1;
-
-   git_inflate_init(&stream);
-   do {
-   in = use_pack(p, w_curs, curpos, &stream.avail_in);
-   stream.next_in = in;
-   st = git_inflate(&stream, Z_FINISH);
-   if (!stream.avail_out)
-   break; /* the payload is larger than it should be */
-   curpos += stream.next_in - in;
-   } while (st == Z_OK || st == Z_BUF_ERROR);
-   git_inflate_end(&stream);
-   if ((st != Z_STREAM_END) || stream.total_out != size) {
-   free(buffer);
-   return NULL;
-   }
-
-   return buffer;
-}
-
 static struct hashmap delta_base_cache;
 static size_t delta_base_cached;
 
@@ -2486,6 +2376,116 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
hashmap_add(&delta_base_cache, ent);
 }
 
+int packed_object_info(struct packed_git *p, off_t obj_offset,
+  struct object_info *oi)
+{
+   struct pack_window *w_curs = NULL;
+   unsigned long size;
+   off_t curpos = obj_offset;
+   enum object_type type;
+
+   /*
+* We always get the representation type, but only convert i

[PATCH v4 5/8] sha1_file: refactor read_object

2017-06-19 Thread Jonathan Tan

read_object() and sha1_object_info_extended() both implement mechanisms
such as object replacement, retrying the packed store after failing to
find the object in the packed store then the loose store, and being able
to mark a packed object as bad and then retrying the whole process.
Consolidating these mechanisms would be a great help to maintainability.

Therefore, consolidate them by extending sha1_object_info_extended() to
support the functionality needed, and then modifying read_object() to
use sha1_object_info_extended().

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 +
 sha1_file.c | 84 ++---
 2 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/cache.h b/cache.h
index a3631b237..48aea923b 100644
--- a/cache.h
+++ b/cache.h
@@ -1827,6 +1827,7 @@ struct object_info {
off_t *disk_sizep;
unsigned char *delta_base_sha1;
struct strbuf *typename;
+   void **contentp;
 
/* Response */
enum {
diff --git a/sha1_file.c b/sha1_file.c
index a7be45efe..4d5033c48 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2005,19 +2005,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, &oi, 0);
 }
 
-static void *unpack_sha1_file(void *map, unsigned long mapsize, enum 
object_type *type, unsigned long *size, const unsigned char *sha1)
-{
-   int ret;
-   git_zstream stream;
-   char hdr[8192];
-
-   ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
-   if (ret < Z_OK || (*type = parse_sha1_header(hdr, size)) < 0)
-   return NULL;
-
-   return unpack_sha1_rest(&stream, hdr, *size, sha1);
-}
-
 unsigned long get_size_from_delta(struct packed_git *p,
  struct pack_window **w_curs,
  off_t curpos)
@@ -2326,8 +2313,10 @@ static void *cache_or_unpack_entry(struct packed_git *p, 
off_t base_offset,
if (!ent)
return unpack_entry(p, base_offset, type, base_size);
 
-   *type = ent->type;
-   *base_size = ent->size;
+   if (type)
+   *type = ent->type;
+   if (base_size)
+   *base_size = ent->size;
return xmemdupz(ent->data, ent->size);
 }
 
@@ -2388,9 +2377,16 @@ int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 * We always get the representation type, but only convert it to
 * a "real" type later if the caller is interested.
 */
-   type = unpack_object_header(p, &w_curs, &curpos, &size);
+   if (oi->contentp) {
+   *oi->contentp = cache_or_unpack_entry(p, obj_offset, oi->sizep,
+ &type);
+   if (!*oi->contentp)
+   type = OBJ_BAD;
+   } else {
+   type = unpack_object_header(p, &w_curs, &curpos, &size);
+   }
 
-   if (oi->sizep) {
+   if (!oi->contentp && oi->sizep) {
if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t tmp_pos = curpos;
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
@@ -2679,8 +2675,10 @@ void *unpack_entry(struct packed_git *p, off_t 
obj_offset,
free(external_base);
}
 
-   *final_type = type;
-   *final_size = size;
+   if (final_type)
+   *final_type = type;
+   if (final_size)
+   *final_size = size;
 
unuse_pack(&w_curs);
 
@@ -2914,6 +2912,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
git_zstream stream;
char hdr[32];
struct strbuf hdrbuf = STRBUF_INIT;
+   unsigned long size_scratch;
 
if (oi->delta_base_sha1)
hashclr(oi->delta_base_sha1);
@@ -2926,7 +2925,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 * return value implicitly indicates whether the
 * object even exists.
 */
-   if (!oi->typep && !oi->typename && !oi->sizep) {
+   if (!oi->typep && !oi->typename && !oi->sizep && !oi->contentp) {
const char *path;
struct stat st;
if (stat_sha1_file(sha1, &st, &path) < 0)
@@ -2939,6 +2938,10 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
map = map_sha1_file(sha1, &mapsize);
if (!map)
return -1;
+
+   if (!oi->sizep)
+   oi->sizep = &size_scratch;
+
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
@@ -2956,10 +2959,18 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
   sha1_to_hex(sha1));
} else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0)
status = error("unable to parse %s header", sha1_to_hex(sha1));
-   git_inflate_end

[PATCH v4 0/8] Improvements to sha1_file

2017-06-19 Thread Jonathan Tan

Thanks, Peff and Junio for your comments. Here's an updated version and
some replies to comments.

> I also found this quite subtle. However, I don't think that
> has_sha1_file() actually freshens. It's a bit confusing because
> has_loose_object() reuses the check_and_freshen() function to do the
> lookup, but it actually sets the "freshen" flag to false.
> 
> That's why in 33d4221c7 (write_sha1_file: freshen existing objects,
> 2014-10-15), which introduced the freshening functions and converted
> has_loose_object(), the actual write_sha1_file() function switched to
> using the freshening functions directly (and obviously sets the freshen
> parameter to true).

Good catch.

> I actually think all of that infrastructure could become part of
> Jonathan's consolidated lookup, too. We would just need:
> 
>   1. A QUICK flag to avoid re-reading objects/pack when we don't find
>  anything (which it looks like he already has).
> 
>   2. A FRESHEN flag to update the mtime of any item that we do find.
> 
> I suspect we may also need something like ONLY_LOOSE and ONLY_NONLOCAL
> to meet all the callers (e.g., has_loose_object_nonlocal). Those should
> be easy to implement, I'd think.

For things like FRESHEN, ONLY_LOOSE, and ONLY_NONLOCAL, I was thinking
that I would like to restrict these patches to only handle the cases
that are agnostic to the type of storage (in preparation for missing
blob handling patches).

> I had the same thoughts (both on the name and the "vocabularies"). IMHO
> we should consider allocating the bits from the same set. There's only
> one HAS_SHA1 flag, and it has an exact match in OBJECT_INFO_QUICK.

Agreed - in this patch set, I have also consolidated the relevant flags,
including LOOKUP_REPLACE_OBJECT and LOOKUP_UNKNOWN_OBJECT.

In addition, Junio has mentioned the potential confusion in behavior
between a NULL and an empty struct passed to
sha1_object_info_extended(). In this patch set, I require non-NULL, and
have added an optimization that avoids accessing the pack in certain
situations, but this optimization requires checking a lot of fields. Let
me know what you think.

Jonathan Tan (8):
  sha1_file: teach packed_object_info about typename
  sha1_file: rename LOOKUP_UNKNOWN_OBJECT
  sha1_file: rename LOOKUP_REPLACE_OBJECT
  sha1_file: move delta base cache code up
  sha1_file: refactor read_object
  sha1_file: improve sha1_object_info_extended
  sha1_file: do not access pack if unneeded
  sha1_file: refactor has_sha1_file_with_flags

 builtin/cat-file.c   |   7 +-
 builtin/fetch.c  |  10 +-
 builtin/index-pack.c |   3 +-
 cache.h  |  37 +++--
 sha1_file.c  | 391 ++-
 streaming.c  |   1 +
 6 files changed, 228 insertions(+), 221 deletions(-)

-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 1/8] sha1_file: teach packed_object_info about typename

2017-06-19 Thread Jonathan Tan

In commit 46f0344 ("sha1_file: support reading from a loose object of
unknown type", 2015-05-06), "struct object_info" gained a "typename"
field that could represent a type name from a loose object file, whether
valid or invalid, as opposed to the existing "typep" which could only
represent valid types. Some relatively complex manipulations were added
to avoid breaking packed_object_info() without modifying it, but it is
much easier to just teach packed_object_info() about the new field.
Therefore, teach packed_object_info() as described above.

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 59a4ed2ed..a52b27541 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2277,9 +2277,18 @@ int packed_object_info(struct packed_git *p, off_t 
obj_offset,
*oi->disk_sizep = revidx[1].offset - obj_offset;
}
 
-   if (oi->typep) {
-   *oi->typep = packed_to_object_type(p, obj_offset, type, 
&w_curs, curpos);
-   if (*oi->typep < 0) {
+   if (oi->typep || oi->typename) {
+   enum object_type ptot;
+   ptot = packed_to_object_type(p, obj_offset, type, &w_curs,
+curpos);
+   if (oi->typep)
+   *oi->typep = ptot;
+   if (oi->typename) {
+   const char *tn = typename(ptot);
+   if (tn)
+   strbuf_addstr(oi->typename, tn);
+   }
+   if (ptot < 0) {
type = OBJ_BAD;
goto out;
}
@@ -2960,7 +2969,6 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
struct cached_object *co;
struct pack_entry e;
int rtype;
-   enum object_type real_type;
const unsigned char *real = lookup_replace_object_extended(sha1, flags);
 
co = find_cached_object(real);
@@ -2992,18 +3000,9 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
return -1;
}
 
-   /*
-* packed_object_info() does not follow the delta chain to
-* find out the real type, unless it is given oi->typep.
-*/
-   if (oi->typename && !oi->typep)
-   oi->typep = &real_type;
-
rtype = packed_object_info(e.p, e.offset, oi);
if (rtype < 0) {
mark_bad_packed_object(e.p, real);
-   if (oi->typep == &real_type)
-   oi->typep = NULL;
return sha1_object_info_extended(real, oi, 0);
} else if (in_delta_base_cache(e.p, e.offset)) {
oi->whence = OI_DBCACHED;
@@ -3014,10 +3013,6 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
 rtype == OBJ_OFS_DELTA);
}
-   if (oi->typename)
-   strbuf_addstr(oi->typename, typename(*oi->typep));
-   if (oi->typep == &real_type)
-   oi->typep = NULL;
 
return 0;
 }
-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 2/8] sha1_file: rename LOOKUP_UNKNOWN_OBJECT

2017-06-19 Thread Jonathan Tan

The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit 46f0344
("sha1_file: support reading from a loose object of unknown type",
2015-05-03) in order to support a feature in cat-file subsequently
introduced in commit 39e4ae3 ("cat-file: teach cat-file a
'--allow-unknown-type' option", 2015-05-03). Despite its name and
location in cache.h, this flag is used neither in
read_sha1_file_extended() nor in any of the lookup functions, but used
only in sha1_object_info_extended().

Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the
name of the cat-file flag that invokes this feature, and move it closer
to the declaration of sha1_object_info_extended(). Also add
documentation for this flag.

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c | 2 +-
 cache.h| 3 ++-
 sha1_file.c| 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4bffd7a2d..209374b3c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -60,7 +60,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
const char *path = force_path;
 
if (unknown_type)
-   flags |= LOOKUP_UNKNOWN_OBJECT;
+   flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
if (get_sha1_with_context(obj_name, GET_SHA1_RECORD_PATH,
  oid.hash, &obj_context))
diff --git a/cache.h b/cache.h
index 4d92aae0e..e2ec45dfe 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,7 +1207,6 @@ extern char *xdg_cache_home(const char *filename);
 
 /* object replacement */
 #define LOOKUP_REPLACE_OBJECT 1
-#define LOOKUP_UNKNOWN_OBJECT 2
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum 
object_type *type, unsigned long *size, unsigned flag);
 static inline void *read_sha1_file(const unsigned char *sha1, enum object_type 
*type, unsigned long *size)
 {
@@ -1866,6 +1865,8 @@ struct object_info {
  */
 #define OBJECT_INFO_INIT {NULL}
 
+/* Allow reading from a loose object file of unknown/bogus type */
+#define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
 extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
 
diff --git a/sha1_file.c b/sha1_file.c
index a52b27541..ad04ea8e0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1964,7 +1964,7 @@ static int parse_sha1_header_extended(const char *hdr, 
struct object_info *oi,
 * we're obtaining the type using '--allow-unknown-type'
 * option.
 */
-   if ((flags & LOOKUP_UNKNOWN_OBJECT) && (type < 0))
+   if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
type = 0;
else if (type < 0)
die("invalid object type");
@@ -2941,7 +2941,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
return -1;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
-   if ((flags & LOOKUP_UNKNOWN_OBJECT)) {
+   if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, 
sizeof(hdr), &hdrbuf) < 0)
status = error("unable to unpack %s header with 
--allow-unknown-type",
   sha1_to_hex(sha1));
-- 
2.13.1.611.g7e3b11ae1-goog

[PATCH v4 3/8] sha1_file: rename LOOKUP_REPLACE_OBJECT

2017-06-19 Thread Jonathan Tan

The LOOKUP_REPLACE_OBJECT flag controls whether the
lookup_replace_object() function is invoked by
sha1_object_info_extended(), read_sha1_file_extended(), and
lookup_replace_object_extended(), but it is not immediately clear which
functions accept that flag.

Therefore restrict this flag to only sha1_object_info_extended(),
renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some
documentation. Update read_sha1_file_extended() to have a boolean
parameter instead, and delete lookup_replace_object_extended().

parse_sha1_header() also passes this flag to
parse_sha1_header_extended() since commit 46f0344 ("sha1_file: support
reading from a loose object of unknown type", 2015-05-03), but that has
had no effect since that commit. Therefore this patch also removes this
flag from that invocation.

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c |  5 +++--
 cache.h| 17 ++---
 sha1_file.c| 13 -
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 209374b3c..923786c00 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -56,7 +56,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
struct object_context obj_context;
struct object_info oi = OBJECT_INFO_INIT;
struct strbuf sb = STRBUF_INIT;
-   unsigned flags = LOOKUP_REPLACE_OBJECT;
+   unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
const char *path = force_path;
 
if (unknown_type)
@@ -337,7 +337,8 @@ static void batch_object_write(const char *obj_name, struct 
batch_options *opt,
struct strbuf buf = STRBUF_INIT;
 
if (!data->skip_object_info &&
-   sha1_object_info_extended(data->oid.hash, &data->info, 
LOOKUP_REPLACE_OBJECT) < 0) {
+   sha1_object_info_extended(data->oid.hash, &data->info,
+ OBJECT_INFO_LOOKUP_REPLACE)) {
printf("%s missing\n",
   obj_name ? obj_name : oid_to_hex(&data->oid));
fflush(stdout);
diff --git a/cache.h b/cache.h
index e2ec45dfe..a3631b237 100644
--- a/cache.h
+++ b/cache.h
@@ -1205,12 +1205,12 @@ extern char *xdg_config_home(const char *filename);
  */
 extern char *xdg_cache_home(const char *filename);
 
-/* object replacement */
-#define LOOKUP_REPLACE_OBJECT 1
-extern void *read_sha1_file_extended(const unsigned char *sha1, enum 
object_type *type, unsigned long *size, unsigned flag);
+extern void *read_sha1_file_extended(const unsigned char *sha1,
+enum object_type *type,
+unsigned long *size, int lookup_replace);
 static inline void *read_sha1_file(const unsigned char *sha1, enum object_type 
*type, unsigned long *size)
 {
-   return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
+   return read_sha1_file_extended(sha1, type, size, 1);
 }
 
 /*
@@ -1232,13 +1232,6 @@ static inline const unsigned char 
*lookup_replace_object(const unsigned char *sh
return do_lookup_replace_object(sha1);
 }
 
-static inline const unsigned char *lookup_replace_object_extended(const 
unsigned char *sha1, unsigned flag)
-{
-   if (!(flag & LOOKUP_REPLACE_OBJECT))
-   return sha1;
-   return lookup_replace_object(sha1);
-}
-
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, unsigned long *);
 extern int hash_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *sha1);
@@ -1865,6 +1858,8 @@ struct object_info {
  */
 #define OBJECT_INFO_INIT {NULL}
 
+/* Invoke lookup_replace_object() on the given hash */
+#define OBJECT_INFO_LOOKUP_REPLACE 1
 /* Allow reading from a loose object file of unknown/bogus type */
 #define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
diff --git a/sha1_file.c b/sha1_file.c
index ad04ea8e0..ae44b32f3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2002,7 +2002,7 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
struct object_info oi = OBJECT_INFO_INIT;
 
oi.sizep = sizep;
-   return parse_sha1_header_extended(hdr, &oi, LOOKUP_REPLACE_OBJECT);
+   return parse_sha1_header_extended(hdr, &oi, 0);
 }
 
 static void *unpack_sha1_file(void *map, unsigned long mapsize, enum 
object_type *type, unsigned long *size, const unsigned char *sha1)
@@ -2969,7 +2969,9 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
struct cached_object *co;
struct pack_entry e;
int rtype;
-   const unsigned char *real = lookup_replace_object_extended(sha1, flags);
+   const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
+   lookup_replace_object(sha1) :
+

Re: [GSoC] Update: Week 5

2017-06-19 Thread Brandon Williams

On 06/20, Andrew Ardill wrote:
> On 20 June 2017 at 07:41, Prathamesh Chavan  wrote:
> 
> >But as communicating between child_process is still an issue
> >and so there was no simple was to current carry out the
> >porting. And hence, a hack was used instead. But after
> >discussing it, instead using the repository-object patch
> >series will help to resolve these issues in this situation.
> 
> Just wondering, does that mean that your patch series is dependent on
> the repository-object one? I saw some discussion around it recently
> but couldn't see it in the latest whats cooking so maybe I missed what
> has happened to it.
> 

Due to some of the discussion (and finding a bug with how git_path works
with worktrees) I decided to break the series up into smaller bits since
the original series was 30+ patches.  Once 'bw/config-h' and
'bw/ls-files-sans-the-index' have been merged into next I'll probably
send out v3 of the repository object series.

-- 
Brandon Williams

Re: [GSoC] Update: Week 5

2017-06-19 Thread Andrew Ardill

On 20 June 2017 at 07:41, Prathamesh Chavan  wrote:

>But as communicating between child_process is still an issue
>and so there was no simple was to current carry out the
>porting. And hence, a hack was used instead. But after
>discussing it, instead using the repository-object patch
>series will help to resolve these issues in this situation.

Just wondering, does that mean that your patch series is dependent on
the repository-object one? I saw some discussion around it recently
but couldn't see it in the latest whats cooking so maybe I missed what
has happened to it.

Really enjoying your updates, by the way, they are very clear and show
what looks like great progress!

Regards,

Andrew Ardill

Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 3:32 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Mon, Jun 19 2017, Stefan Beller jotted:
>
>>> Now, git-grep could make use of the pluggable error facility added in
>>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>>> 2013-04-16).
>>
>> I think we should do that instead (though I have not looked at the downsides
>> of this), because...
>
> It makes sense to do that in addition to what I'm doing here (or if the
> approach I'm taking doesn't make sense, some other patch to fix the
> general issue in the default handler).
>
> I'm going to try to get around to fixing the grep behavior in a
> follow-up patch, this is a fix for the overzealous recursion detection
> in the default handler needlessly causing other issues.
>
>>>
>>> So let's just set the recursion limit to a number higher than the
>>> number of threads we're ever likely to spawn. Now we won't lose
>>> errors, and if we have a recursing die handler we'll still die within
>>> microseconds.
>>
>> how are we handling access to that global variable?
>> Do we need to hold a mutex to be correct? or rather hope that
>> it works across threads, not counting on it, because each thread
>> individually would count up to 1024?
>
> It's not guarded by a mutex and the ++ here and the reads from it are
> both racy.
>
> However, for its stated purpose that's fine, even if we're racily
> incrementing it and losing some updates some will get through, which is
> good enough for an infinite recursion detection. We don't really care if
> we die at exactly 1 or exactly 1024.
>
>> I would prefer if we kept the number as low as "at most
>> one screen of lines".
>
> In practice this is the case in git, because the programs that would
> encounter this are going to be spawning less than screenfull of threads,
> assuming (as is the case) that each thread might print out one error.
>
> The semantics of that aren't changed with this patch, the difference is
> that you're going to get e.g. N repeats of a meaningful error instead of
> N repeats of either the meaningful error OR "recursion detected in die
> handler", depending on your luck.
>
> I.e. in current git (after a few runs to get an unlucky one):
>
> $ git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
> fatal: recursion detected in die handler
> fatal: recursion detected in die handler
> fatal: recursion detected in die handler
>
> Or if you're lucky at least one of these will be the actual error:
>
> $ git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
> fatal: recursion detected in die handler
> fatal: pcre_exec failed with error code -8
> fatal: recursion detected in die handler
> fatal: recursion detected in die handler
> fatal: recursion detected in die handler
>
> But with this change:
>
> $ ~/g/git/git-grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
> fatal: pcre2_jit_match failed with error code -47: match limit exceeded
> fatal: pcre2_jit_match failed with error code -47: match limit exceeded
> fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>
> (The error message is different because I compiled with PCRE v2 locally,
> instead of the system PCRE v1, but that doesn't matter for this example)
>
> Over 1000 runs thish is how that breaks down on my machine, without this
> patch. I've replaced the recursion error with just "R" and the PCRE
> error with "P", and shown them in descending order by occurrence, lines
> without a "P" only printed out the recursion error:
>
> $ (for i in {1..1000}; do git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
> r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort 
> -nr|head -n 10
> 247 R R
> 136 P R R
> 122 P R
> 112 R R R
> 108 R
>  59 R P R
>  54 R P
>  54 P
>  31 P R R R
>  21 R R P
>
> There's a long tail I've omitted there of alterations to that. As this
> shows in >10% of cases we don't print out any meaningful error at
> all. But with this change:
>
> $ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
> r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort 
> -nr|head -n 10
> 377 P P
> 358 P P P
> 192 P
>  63 P P P P
>   8 P P P P P
>   2 P P P P P P
>
> We will always show a meaningful error, but may of course do so multiple
> times, which is a subject for a fix in git-grep in particular, but the
> point is again, to fix the general case for the default handler.
>
> Something something sorry about the long mail didn't have time to write
> a shorter one :)
>

Actually this convinced me (and it would be lovely to have such observations
in the commit message).

Thanks,
Stefan

Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-19 Thread Ævar Arnfjörð Bjarmason


On Mon, Jun 19 2017, Stefan Beller jotted:

>> Now, git-grep could make use of the pluggable error facility added in
>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>> 2013-04-16).
>
> I think we should do that instead (though I have not looked at the downsides
> of this), because...

It makes sense to do that in addition to what I'm doing here (or if the
approach I'm taking doesn't make sense, some other patch to fix the
general issue in the default handler).

I'm going to try to get around to fixing the grep behavior in a
follow-up patch, this is a fix for the overzealous recursion detection
in the default handler needlessly causing other issues.

>>
>> So let's just set the recursion limit to a number higher than the
>> number of threads we're ever likely to spawn. Now we won't lose
>> errors, and if we have a recursing die handler we'll still die within
>> microseconds.
>
> how are we handling access to that global variable?
> Do we need to hold a mutex to be correct? or rather hope that
> it works across threads, not counting on it, because each thread
> individually would count up to 1024?

It's not guarded by a mutex and the ++ here and the reads from it are
both racy.

However, for its stated purpose that's fine, even if we're racily
incrementing it and losing some updates some will get through, which is
good enough for an infinite recursion detection. We don't really care if
we die at exactly 1 or exactly 1024.

> I would prefer if we kept the number as low as "at most
> one screen of lines".

In practice this is the case in git, because the programs that would
encounter this are going to be spawning less than screenfull of threads,
assuming (as is the case) that each thread might print out one error.

The semantics of that aren't changed with this patch, the difference is
that you're going to get e.g. N repeats of a meaningful error instead of
N repeats of either the meaningful error OR "recursion detected in die
handler", depending on your luck.

I.e. in current git (after a few runs to get an unlucky one):

$ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
fatal: recursion detected in die handler
fatal: recursion detected in die handler
fatal: recursion detected in die handler

Or if you're lucky at least one of these will be the actual error:

$ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
fatal: recursion detected in die handler
fatal: pcre_exec failed with error code -8
fatal: recursion detected in die handler
fatal: recursion detected in die handler
fatal: recursion detected in die handler

But with this change:

$ ~/g/git/git-grep -P --threads=10 
'(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
fatal: pcre2_jit_match failed with error code -47: match limit exceeded
fatal: pcre2_jit_match failed with error code -47: match limit exceeded
fatal: pcre2_jit_match failed with error code -47: match limit exceeded

(The error message is different because I compiled with PCRE v2 locally,
instead of the system PCRE v1, but that doesn't matter for this example)

Over 1000 runs thish is how that breaks down on my machine, without this
patch. I've replaced the recursion error with just "R" and the PCRE
error with "P", and shown them in descending order by occurrence, lines
without a "P" only printed out the recursion error:

$ (for i in {1..1000}; do git grep -P --threads=10 
'(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head 
-n 10
247 R R
136 P R R
122 P R
112 R R R
108 R
 59 R P R
 54 R P
 54 P
 31 P R R R
 21 R R P

There's a long tail I've omitted there of alterations to that. As this
shows in >10% of cases we don't print out any meaningful error at
all. But with this change:

$ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 
'(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head 
-n 10
377 P P
358 P P P
192 P
 63 P P P P
  8 P P P P P
  2 P P P P P P

We will always show a meaningful error, but may of course do so multiple
times, which is a subject for a fix in git-grep in particular, but the
point is again, to fix the general case for the default handler.

Something something sorry about the long mail didn't have time to write
a shorter one :)

>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  usage.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/usage.c b/usage.c
>> index 2f87ca69a8..1c198d4882 100644
>> --- a/usage.c
>> +++ b/usage.c
>> @@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
>>  static int die_is_recursing_builtin(void)
>>  {
>> static i

Re: Behavior of 'git fetch' for commit hashes

2017-06-19 Thread Jonathan Tan

On Mon, 19 Jun 2017 10:49:36 -0700
Jonathan Tan  wrote:

> On Mon, 19 Jun 2017 12:09:28 +
>  wrote:
> 
> > For version 2.13.3

Firstly, exactly which version of Git doesn't work? I'm assuming 2.13.1
(as written elsewhere in your e-mail), since 2.13.3 doesn't exist.

> > However, the workaround for descbibed abot for git version 2.7.4 no
> > longer works. The result is always
> > fatal: Couldn't find remote ref
> 
> I'll take a look into this.

I tried to reproduce with this script, but it seems to pass even at
2.13.1:

#!/bin/bash
rm -rf ~/tmp/x &&
make --quiet &&
./git init ~/tmp/x &&
./git -C ~/tmp/x fetch --quiet ~/gitpristine/git master:foo &&
./git -C ~/tmp/x fetch ~/gitpristine/git "$(git -C ~/gitpristine/git 
rev-parse master^)"
exit $?

Commenting out the first fetch line produces, as expected:

error: Server does not allow request for unadvertised object 

And I have not seen the "fatal: Couldn't find remote ref" error you
describe.

As Stefan alluded to in his earlier e-mail [1], I made a change recently
to how "git fetch" handles literal SHA-1s in commit fdb69d3
("fetch-pack: always allow fetching of literal SHA1s", 2017-05-15). In
this case, I don't think that's the cause of this issue (since that
change, if anything, makes things more permissive, not less) but that
might be a point of interest in an investigation.

[1] 
https://public-inbox.org/git/CAGZ79kbFeptKOfpaZ23yK=zw9mj0_evqpsthkkd1hscap_p...@mail.gmail.com/

Re: Git Strange behaviour with remote deleted branch

2017-06-19 Thread Paul Smith

On Mon, 2017-06-19 at 23:33 +0200, Reda Lyazidi wrote:
> with git I noticed when I removed a remote branch with git push origin 
> --delete 
> in my clone when I used git branch -a I don't the deleted branch
> but my colleagues still see it.
> 
> I tried with two clones in my PC, with the first one delete branch and 
> the other still sees it
> despite git pull .

Normally Git will not remove a remote's branch refs on a fetch (or
pull) operation.  This is a safety measure, since you might still have
a use for that branch (for example to run diff against or similar).

In order to remove branches that have been deleted in your remote, you
must use the --prune (or -p) option to git fetch or git pull:

  git pull -p

will get rid of them.

Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-19 Thread Stefan Beller

> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16).

I think we should do that instead (though I have not looked at the downsides
of this), because...
>
> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.

how are we handling access to that global variable?
Do we need to hold a mutex to be correct? or rather hope that
it works across threads, not counting on it, because each thread
individually would count up to 1024?

I would prefer if we kept the number as low as "at most
one screen of lines".

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  usage.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/usage.c b/usage.c
> index 2f87ca69a8..1c198d4882 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
>  static int die_is_recursing_builtin(void)
>  {
> static int dying;
> -   return dying++;
> +   static int recursion_limit = 1024;
> +
> +   return dying++ > recursion_limit;
>  }
>
>  /* If we are in a dlopen()ed .so write to a global variable would segfault
> --
> 2.13.1.518.g3df882009
>

[PATCH] grep: fix erroneously copy/pasted variable in check/assert pattern

2017-06-19 Thread Ævar Arnfjörð Bjarmason

Fix an erroneously copy/pasted check for the pcre2_jit_stack variable
to check pcre2_match_context instead. The former was already checked
in the preceding "if" statement.

This is a trivial and obvious error introduced in my commit
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).

In practice if pcre2_match_context_create() returned NULL we were
likely in a situation where malloc() was returning NULL, and were thus
screwed anyway, but if only the pcre2_match_context_create() call
returned NULL (through some transitory bug) PCRE v2 would just
allocate and supply its own context object when matching, and we'd run
normally at the trivial expense of not getting a slight speedup by
sharing the context object between successive matches.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index d7ef21358e..06c4c24926 100644
--- a/grep.c
+++ b/grep.c
@@ -508,7 +508,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const 
struct grep_opt *opt
if (!p->pcre2_jit_stack)
die("Couldn't allocate PCRE2 JIT stack");
p->pcre2_match_context = pcre2_match_context_create(NULL);
-   if (!p->pcre2_jit_stack)
+   if (!p->pcre2_match_context)
die("Couldn't allocate PCRE2 match context");
pcre2_jit_stack_assign(p->pcre2_match_context, NULL, 
p->pcre2_jit_stack);
} else if (p->pcre2_jit_on != 0) {
-- 
2.13.1.518.g3df882009

[PATCH] die routine: change recursion limit from 1 to 1024

2017-06-19 Thread Ævar Arnfjörð Bjarmason

Change the recursion limit for the default die routine from a *very*
low 1 to 1024. This ensures that infinite recursions are broken, but
doesn't lose error messages.

The intent of the existing code, as explained in commit
cd163d4b4e ("usage.c: detect recursion in die routines and bail out
immediately", 2012-11-14), is to break infinite recursion in cases
where the die routine itself dies.

However, doing that very aggressively by immediately printing out
"recursion detected in die handler" if we've already called die() once
means that threaded invocations of git can go through the following
flow:

 1. Start a bunch of threads

 2. The threads start invoking die(), pretty much at the same time.

 3. The first die() invocation will be in the middle of trying to
print out its message by the time another thread dies, that other
thread then runs into the recursion limit and dies with "recursion
detected in die handler".

 4. Due to a race condition the initial error may never get printed
before the "recursion detected" thread calls exit() and aborts the
program.

An example of this is running a threaded grep against e.g. linux.git:

git grep -P --threads=4 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'

With the current version of git this will print some combination of
multiple PCRE failures that caused the abort and multiple "recursion
detected", some invocations will print out multiple "recursion
detected" errors with no PCRE error at all!

Now, git-grep could make use of the pluggable error facility added in
commit c19a490e37 ("usage: allow pluggable die-recursion checks",
2013-04-16).

That should be done for git-grep in particular because before this
change (and after) it'll potentially print out the exact same error
from the N threads it starts, that should be de-duplicated.

But let's start by improving the default behavior shared across all of
git. Right now the common case is not an infinite recursion in the
handler, but us losing error messages by default because we're overly
paranoid about our recursion check.

So let's just set the recursion limit to a number higher than the
number of threads we're ever likely to spawn. Now we won't lose
errors, and if we have a recursing die handler we'll still die within
microseconds.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 usage.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/usage.c b/usage.c
index 2f87ca69a8..1c198d4882 100644
--- a/usage.c
+++ b/usage.c
@@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
 static int die_is_recursing_builtin(void)
 {
static int dying;
-   return dying++;
+   static int recursion_limit = 1024;
+
+   return dying++ > recursion_limit;
 }
 
 /* If we are in a dlopen()ed .so write to a global variable would segfault
-- 
2.13.1.518.g3df882009

[GSoC][PATCH 6/6] submodule: port submodule subcommand 'deinit' from shell to C

2017-06-19 Thread Prathamesh Chavan

The same mechanism is used even for porting this submodule
subcommand, as used in the ported subcommands till now.
The function cmd_deinit in split up after porting into three
functions: module_deinit, for_each_submodule_list and
deinit_submodule.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 140 
 git-submodule.sh|  55 +
 2 files changed, 141 insertions(+), 54 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e10cac462..f029f5fae 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -750,6 +750,145 @@ static int module_status(int argc, const char **argv, 
const char *prefix)
return 0;
 }
 
+struct deinit_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int force: 1;
+   unsigned int all: 1;
+};
+#define DEINIT_CB_INIT { NULL, 0, 0, 0 }
+
+static void deinit_submodule(const struct cache_entry *list_item,
+void *cb_data)
+{
+   struct deinit_cb *info = cb_data;
+   const struct submodule *sub;
+   char *displaypath = NULL;
+   struct child_process cp_config = CHILD_PROCESS_INIT;
+   struct strbuf sb_config = STRBUF_INIT;
+   char *sm_path = xstrdup(list_item->name);
+   char *sub_git_dir = xstrfmt("%s/.git", sm_path);
+
+   sub = submodule_from_path(null_sha1, sm_path);
+
+   if (!sub->name)
+   goto cleanup;
+
+   displaypath = get_submodule_displaypath(sm_path, info->prefix);
+
+   /* remove the submodule work tree (unless the user already did it) */
+   if (is_directory(sm_path)) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   /* protect submodules containing a .git directory */
+   if (is_git_directory(sub_git_dir))
+   die(_("Submodule work tree '%s' contains a .git "
+ "directory use 'rm -rf' if you really want "
+ "to remove it including all of its history"),
+ displaypath);
+
+   if (!info->force) {
+   struct child_process cp_rm = CHILD_PROCESS_INIT;
+   cp_rm.git_cmd = 1;
+   argv_array_pushl(&cp_rm.args, "rm", "-qn", sm_path,
+NULL);
+
+   /* list_item->name is changed by cmd_rm() below */
+   if (run_command(&cp_rm))
+   die(_("Submodule work tree '%s' contains local "
+ "modifications; use '-f' to discard 
them"),
+ displaypath);
+   }
+
+   cp.use_shell = 1;
+   argv_array_pushl(&cp.args, "rm", "-rf", sm_path, NULL);
+   if (!run_command(&cp)) {
+   if (!info->quiet)
+   printf(_("Cleared directory '%s'\n"),
+displaypath);
+   } else {
+   if (!info->quiet)
+   printf(_("Could not remove submodule work tree 
'%s'\n"),
+displaypath);
+   }
+   }
+
+   if (mkdir(sm_path, 0700))
+   die(_("could not create empty submodule directory %s"),
+ displaypath);
+
+   cp_config.git_cmd = 1;
+   argv_array_pushl(&cp_config.args, "config", "--get-regexp", NULL);
+   argv_array_pushf(&cp_config.args, "submodule.%s\\.", sub->name);
+
+   /* remove the .git/config entries (unless the user already did it) */
+   if (!capture_command(&cp_config, &sb_config, 0) && sb_config.len) {
+   char *sub_key = xstrfmt("submodule.%s", sub->name);
+   /*
+* remove the whole section so we have a clean state when
+* the user later decides to init this submodule again
+*/
+   git_config_rename_section_in_file(NULL, sub_key, NULL);
+   if (!info->quiet)
+   printf(_("Submodule '%s' (%s) unregistered for path 
'%s'\n"),
+sub->name, sub->url, displaypath);
+   free(sub_key);
+   }
+
+cleanup:
+   free(displaypath);
+   free(sub_git_dir);
+   free(sm_path);
+   strbuf_release(&sb_config);
+}
+
+static int module_deinit(int argc, const char **argv, const char *prefix)
+{
+   struct deinit_cb info = DEINIT_CB_INIT;
+   struct pathspec pathspec;
+   struct module_list list = MODULE_LIST_INIT;
+   int quiet = 0;
+   int force = 0;
+   int all = 0;
+
+   struct option module_deinit_options[] = {
+   OPT__QUIET(&quiet, N_("Suppress submodule status output")),
+   OPT__

[GSoC][PATCH 2/6] submodule--helper: introduce get_submodule_displaypath and for_each_submodule_list

2017-06-19 Thread Prathamesh Chavan

Functions get_submodule_displaypath and for_each_submodule_list
for using them in the later patches, related to porting submodule
subcommands from shell to C.
These new functions are also used in ported submodule subcommand
init

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 69 -
 1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 8cc648d85..f7adca95b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -13,6 +13,9 @@
 #include "refs.h"
 #include "connect.h"
 
+typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
+ void *cb_data);
+
 static char *get_default_remote(void)
 {
char *dest = NULL, *ret;
@@ -219,6 +222,27 @@ static int resolve_relative_url_test(int argc, const char 
**argv, const char *pr
return 0;
 }
 
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+   const char *super_prefix = get_super_prefix();
+
+   if (prefix && super_prefix) {
+   BUG("cannot have prefix '%s' and superprefix '%s'",
+   prefix, super_prefix);
+   } else if (prefix) {
+   struct strbuf sb = STRBUF_INIT;
+   char *displaypath = xstrdup(relative_path(path, prefix, &sb));
+   strbuf_release(&sb);
+   return displaypath;
+   } else if (super_prefix) {
+   int len = strlen(super_prefix);
+   const char *format = is_dir_sep(super_prefix[len-1]) ? "%s%s" : 
"%s/%s";
+   return xstrfmt(format, super_prefix, path);
+   } else {
+   return xstrdup(path);
+   }
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -330,26 +354,30 @@ static int module_list(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
-static void init_submodule(const char *path, const char *prefix, int quiet)
+static void for_each_submodule_list(const struct module_list list,
+   submodule_list_func_t fn, void *cb_data)
 {
+   int i;
+   for (i = 0; i < list.nr; i++)
+   fn(list.entries[i], cb_data);
+}
+
+struct init_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+};
+#define INIT_CB_INIT { NULL, 0 }
+
+static void init_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct init_cb *info = cb_data;
const struct submodule *sub;
struct strbuf sb = STRBUF_INIT;
char *upd = NULL, *url = NULL, *displaypath;
 
-   /* Only loads from .gitmodules, no overlay with .git/config */
-   gitmodules_config();
-
-   if (prefix && get_super_prefix())
-   die("BUG: cannot have prefix and superprefix");
-   else if (prefix)
-   displaypath = xstrdup(relative_path(path, prefix, &sb));
-   else if (get_super_prefix()) {
-   strbuf_addf(&sb, "%s%s", get_super_prefix(), path);
-   displaypath = strbuf_detach(&sb, NULL);
-   } else
-   displaypath = xstrdup(path);
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
 
-   sub = submodule_from_path(null_sha1, path);
+   sub = submodule_from_path(null_sha1, list_item->name);
 
if (!sub)
die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -361,7 +389,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 *
 * Set active flag for the submodule being initialized
 */
-   if (!is_submodule_initialized(path)) {
+   if (!is_submodule_initialized(list_item->name)) {
strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
@@ -404,7 +432,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
if (git_config_set_gently(sb.buf, url))
die(_("Failed to register url for submodule path '%s'"),
displaypath);
-   if (!quiet)
+   if (!info->quiet)
fprintf(stderr,
_("Submodule '%s' (%s) registered for path 
'%s'\n"),
sub->name, url, displaypath);
@@ -433,10 +461,10 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 
 static int module_init(int argc, const char **argv, const char *prefix)
 {
+   struct init_cb info = INIT_CB_INIT;
struct pathspec pathspec;
struct module_list list = MODULE_LIST_INIT;
int quiet = 0;
-   int i;
 
struct option module_init_options[] = {
OPT__QUIET(&quiet, N_("Suppress output

[GSoC][PATCH 5/6] submodule: port submodule subcommand sync from shell to C

2017-06-19 Thread Prathamesh Chavan

The mechanism used for porting the submodule subcommand 'sync' is
similar to that of 'foreach', where we split the function cmd_sync
from shell into three functions in C, module_sync,
for_each_submodule_list and sync_submodule.

print_default_remote is introduced as a submodule--helper
subcommand for getting the default remote as stdout.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 180 
 git-submodule.sh|  56 +-
 2 files changed, 181 insertions(+), 55 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 78b21ab22..e10cac462 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -43,6 +43,20 @@ static char *get_default_remote(void)
return ret;
 }
 
+static int print_default_remote(int argc, const char **argv, const char 
*prefix)
+{
+   const char *remote;
+
+   if (argc != 1)
+   die(_("submodule--helper print-default-remote takes no 
arguments"));
+
+   remote = get_default_remote();
+   if (remote)
+   puts(remote);
+
+   return 0;
+}
+
 static int starts_with_dot_slash(const char *str)
 {
return str[0] == '.' && is_dir_sep(str[1]);
@@ -311,6 +325,25 @@ static int print_name_rev(int argc, const char **argv, 
const char *prefix)
return 0;
 }
 
+static char *get_up_path(const char *path)
+{
+   int i = count_slashes(path);
+   struct strbuf sb = STRBUF_INIT;
+
+   while (i--)
+   strbuf_addstr(&sb, "../");
+
+   /*
+*Check if 'path' ends with slash or not
+*for having the same output for dir/sub_dir
+*and dir/sub_dir/
+*/
+   if (!is_dir_sep(path[i - 1]))
+   strbuf_addstr(&sb, "../");
+
+   return strbuf_detach(&sb, NULL);
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -736,6 +769,151 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct sync_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+};
+#define SYNC_CB_INIT { NULL, 0, 0 }
+
+static void sync_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct sync_cb *info = cb_data;
+   const struct submodule *sub;
+   char *sub_key, *remote_key;
+   char *url, *sub_origin_url, *super_config_url, *displaypath;
+   struct strbuf sb = STRBUF_INIT;
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   if (!is_submodule_initialized(list_item->name))
+   return;
+
+   sub = submodule_from_path(null_sha1, list_item->name);
+
+   if (!sub->url)
+   die(_("no url found for submodule path '%s' in .gitmodules"),
+ list_item->name);
+
+   url = xstrdup(sub->url);
+
+   if (starts_with_dot_dot_slash(url) || starts_with_dot_slash(url)) {
+   char *remote_url, *up_path;
+   char *remote = get_default_remote();
+   char *remote_key = xstrfmt("remote.%s.url", remote);
+   free(remote);
+
+   if (git_config_get_string(remote_key, &remote_url))
+   remote_url = xgetcwd();
+   up_path = get_up_path(list_item->name);
+   sub_origin_url = relative_url(remote_url, url, up_path);
+   super_config_url = relative_url(remote_url, url, NULL);
+   free(remote_key);
+   free(up_path);
+   free(remote_url);
+   } else {
+   sub_origin_url = xstrdup(url);
+   super_config_url = xstrdup(url);
+   }
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   if (!info->quiet)
+   printf(_("Synchronizing submodule url for '%s'\n"),
+displaypath);
+
+   sub_key = xstrfmt("submodule.%s.url", sub->name);
+   if (git_config_set_gently(sub_key, super_config_url))
+   die(_("failed to register url for submodule path '%s'"),
+ displaypath);
+
+   if (!is_submodule_populated_gently(list_item->name, NULL))
+   goto cleanup;
+
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.git_cmd = 1;
+   cp.dir = list_item->name;
+   argv_array_pushl(&cp.args, "submodule--helper",
+"print-default-remote", NULL);
+   if (capture_command(&cp, &sb, 0))
+   die(_("failed to get the default remote for submodule '%s'"),
+ list_item->name);
+
+   strbuf_strip_suffix(&sb, "\n");
+   remote_key = xstrfmt("remote.%s.url", sb.buf);
+   strbuf_release(&sb);
+
+   child_process_init(&cp);
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.git_cmd = 1;
+   cp.dir = list_item->name;
+   argv_array_pus

[GSoC][PATCH 4/6] submodule: port submodule subcommand status

2017-06-19 Thread Prathamesh Chavan

The mechanism used for porting submodule subcommand 'status'
is similar to that used for subcommand 'foreach'.
The function cmd_status from git-submodule is ported to three
functions in the builtin submodule--helper namely: module_status,
for_each_submodule_list and status_submodule.

print_status is also introduced for handling the output of
the subcommand and also to reduce the code size.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 152 
 git-submodule.sh|  49 +-
 2 files changed, 153 insertions(+), 48 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6fd861e42..78b21ab22 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -566,6 +566,157 @@ static int module_init(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct status_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+   unsigned int cached: 1;
+};
+#define STATUS_CB_INIT { NULL, 0, 0, 0 }
+
+static void print_status(struct status_cb *info, char state, const char *path,
+char *sub_sha1, char *displaypath)
+{
+   if (info->quiet)
+   return;
+
+   printf("%c%s %s", state, sub_sha1, displaypath);
+
+   if (state == ' ' || state == '+') {
+   struct argv_array name_rev_args = ARGV_ARRAY_INIT;
+
+   argv_array_pushl(&name_rev_args, "print-name-rev",
+path, sub_sha1, NULL);
+   print_name_rev(name_rev_args.argc, name_rev_args.argv,
+  info->prefix);
+   } else {
+   printf("\n");
+   }
+}
+
+static void status_submodule(const struct cache_entry *list_item, void 
*cb_data)
+{
+   struct status_cb *info = cb_data;
+   char *sub_sha1 = xstrdup(oid_to_hex(&list_item->oid));
+   char *displaypath;
+   struct argv_array diff_files_args = ARGV_ARRAY_INIT;
+
+   if (!submodule_from_path(null_sha1, list_item->name))
+   die(_("no submodule mapping found in .gitmodules for path 
'%s'"),
+ list_item->name);
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   if (list_item->ce_flags) {
+   print_status(info, 'U', list_item->name,
+sha1_to_hex(null_sha1), displaypath);
+   goto cleanup;
+   }
+
+   if (!is_submodule_initialized(list_item->name)) {
+   print_status(info, '-', list_item->name, sub_sha1, displaypath);
+   goto cleanup;
+   }
+
+   argv_array_pushl(&diff_files_args, "diff-files",
+"--ignore-submodules=dirty", "--quiet", "--",
+list_item->name, NULL);
+
+   if (!cmd_diff_files(diff_files_args.argc, diff_files_args.argv,
+   info->prefix)) {
+   print_status(info, ' ', list_item->name, sub_sha1, displaypath);
+   } else {
+   if (!info->cached) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf sb = STRBUF_INIT;
+
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.git_cmd = 1;
+   cp.dir = list_item->name;
+
+   argv_array_pushl(&cp.args, "rev-parse",
+"--verify", "HEAD", NULL);
+
+   if (capture_command(&cp, &sb, 0))
+   die(_("could not run 'git rev-parse --verify"
+ "HEAD' in submodule %s"),
+ list_item->name);
+
+   strbuf_strip_suffix(&sb, "\n");
+   print_status(info, '+', list_item->name, sb.buf,
+displaypath);
+   strbuf_release(&sb);
+   } else {
+   print_status(info, '+', list_item->name, sub_sha1,
+displaypath);
+   }
+   }
+
+   if (info->recursive) {
+   struct child_process cpr = CHILD_PROCESS_INIT;
+
+   cpr.git_cmd = 1;
+   cpr.dir = list_item->name;
+   prepare_submodule_repo_env(&cpr.env_array);
+
+   argv_array_pushl(&cpr.args, "--super-prefix", displaypath,
+"submodule--helper", "status", "--recursive",
+NULL);
+
+   if (info->cached)
+   argv_array_push(&cpr.args, "--cached");
+
+   if (info->quiet)
+   argv_array_push(&cpr.args, "--quiet");
+
+   if (run_command(&cpr))
+   die(_("failed to recurse into s

[GSoC][PATCH 3/6] submodule: port set_name_rev from shell to C

2017-06-19 Thread Prathamesh Chavan

Since later on we want to port submodule subcommand status, and since
set_name_rev is part of cmd_status, hence this function is ported. It
has been ported to function print_name_rev in C, which calls get_name_rev
to get the revname, and after formatting it, print_name_rev prints it.
And hence in this way, the command `git submodule--helper print-name-rev
"sm_path" "sha1"` sets value of revname in git-submodule.sh

The function get_name_rev returns the stdout of the git describe
commands. Since there are four different git-describe commands used for
generating the name rev, four child_process are introduced, each successive
child process running only when previous has no stdout. The order of these
four git-describe commands is maintained the same as it was in the function
set_name_rev() in shell script.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 69 +
 git-submodule.sh| 16 ++-
 2 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f7adca95b..6fd861e42 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -243,6 +243,74 @@ static char *get_submodule_displaypath(const char *path, 
const char *prefix)
}
 }
 
+enum describe_step {
+   step_bare,
+   step_tags,
+   step_contains,
+   step_all_always,
+   step_end
+};
+
+static char *get_name_rev(const char *sub_path, const char* object_id)
+{
+   struct strbuf sb = STRBUF_INIT;
+   enum describe_step cur_step;
+
+   for (cur_step = step_bare; cur_step < step_end; cur_step++) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env(&cp.env_array);
+   cp.dir = sub_path;
+   cp.git_cmd = 1;
+   cp.no_stderr = 1;
+
+   switch (cur_step) {
+   case step_bare:
+   argv_array_pushl(&cp.args, "describe",
+object_id, NULL);
+   break;
+   case step_tags: 
+   argv_array_pushl(&cp.args, "describe",
+"--tags", object_id, NULL);
+   break;
+   case step_contains:
+   argv_array_pushl(&cp.args, "describe",
+"--contains", object_id,
+NULL);
+   break;
+   case step_all_always:
+   argv_array_pushl(&cp.args, "describe",
+"--all", "--always",
+object_id, NULL);
+   break;
+   default:
+   BUG("unknown describe step '%d'", cur_step);
+   }
+
+   if (!capture_command(&cp, &sb, 0) && sb.len) {
+   strbuf_strip_suffix(&sb, "\n");
+   return strbuf_detach(&sb, NULL);
+   }
+
+   }
+
+   strbuf_release(&sb);
+   return NULL;
+}
+
+static int print_name_rev(int argc, const char **argv, const char *prefix)
+{
+   char *namerev;
+   if (argc != 3)
+   die("print-name-rev only accepts two arguments:  ");
+
+   namerev = get_name_rev(argv[1], argv[2]);
+   if (namerev && namerev[0])
+   printf(" (%s)", namerev);
+   printf("\n");
+
+   return 0;
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -1242,6 +1310,7 @@ static struct cmd_struct commands[] = {
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
{"resolve-relative-url-test", resolve_relative_url_test, 0},
+   {"print-name-rev", print_name_rev, 0},
{"init", module_init, SUPPORT_SUPER_PREFIX},
{"remote-branch", resolve_remote_submodule_branch, 0},
{"push-check", push_check, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index c0d0e9a4c..091051891 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -758,18 +758,6 @@ cmd_update()
}
 }
 
-set_name_rev () {
-   revname=$( (
-   sanitize_submodule_env
-   cd "$1" && {
-   git describe "$2" 2>/dev/null ||
-   git describe --tags "$2" 2>/dev/null ||
-   git describe --contains "$2" 2>/dev/null ||
-   git describe --all --always "$2"
-   }
-   ) )
-   test -z "$revname" || revname=" ($revname)"
-}
 #
 # Show commit summary for submodules in index or working tree
 #

[GSoC][PATCH 1/6] dir: create function count_slashes

2017-06-19 Thread Prathamesh Chavan

Similar functions exist in apply.c and builtin/show-branch.c for
counting the number of slashes in a string. Also in the later
patches, we introduce a third caller for the same. Hence, we unify
it now by cleaning the existing functions and declaring a common
function count_slashes in dir.h and implementing it in dir.c to
remove this code duplication.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Junio C Hamano 
---
The complete build report of this is available at:
https://travis-ci.org/pratham-pc/git/builds/
Branch: All-patch-series
Build #111

 apply.c   | 11 ---
 builtin/show-branch.c | 13 +++--
 dir.c |  9 +
 dir.h |  3 +++
 4 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/apply.c b/apply.c
index c49cef063..121e53406 100644
--- a/apply.c
+++ b/apply.c
@@ -762,17 +762,6 @@ static char *find_name_traditional(struct apply_state 
*state,
return find_name_common(state, line, def, p_value, line + len, 0);
 }
 
-static int count_slashes(const char *cp)
-{
-   int cnt = 0;
-   char ch;
-
-   while ((ch = *cp++))
-   if (ch == '/')
-   cnt++;
-   return cnt;
-}
-
 /*
  * Given the string after "--- " or "+++ ", guess the appropriate
  * p_value for the given patch.
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 4a6cc6f49..3636a0559 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -5,6 +5,7 @@
 #include "color.h"
 #include "argv-array.h"
 #include "parse-options.h"
+#include "dir.h"
 
 static const char* show_branch_usage[] = {
 N_("git show-branch [-a | --all] [-r | --remotes] [--topo-order | 
--date-order]\n"
@@ -421,14 +422,6 @@ static int append_tag_ref(const char *refname, const 
struct object_id *oid,
 
 static const char *match_ref_pattern = NULL;
 static int match_ref_slash = 0;
-static int count_slash(const char *s)
-{
-   int cnt = 0;
-   while (*s)
-   if (*s++ == '/')
-   cnt++;
-   return cnt;
-}
 
 static int append_matching_ref(const char *refname, const struct object_id 
*oid,
   int flag, void *cb_data)
@@ -438,7 +431,7 @@ static int append_matching_ref(const char *refname, const 
struct object_id *oid,
 * refs/tags/v0.99.9a and friends.
 */
const char *tail;
-   int slash = count_slash(refname);
+   int slash = count_slashes(refname);
for (tail = refname; *tail && match_ref_slash < slash; )
if (*tail++ == '/')
slash--;
@@ -529,7 +522,7 @@ static void append_one_rev(const char *av)
int saved_matches = ref_name_cnt;
 
match_ref_pattern = av;
-   match_ref_slash = count_slash(av);
+   match_ref_slash = count_slashes(av);
for_each_ref(append_matching_ref, NULL);
if (saved_matches == ref_name_cnt &&
ref_name_cnt < MAX_REVS)
diff --git a/dir.c b/dir.c
index 9efcf1eab..4a953c16a 100644
--- a/dir.c
+++ b/dir.c
@@ -52,6 +52,15 @@ static enum path_treatment read_directory_recursive(struct 
dir_struct *dir,
 static int get_dtype(struct dirent *de, struct index_state *istate,
 const char *path, int len);
 
+int count_slashes(const char *s)
+{
+   int cnt = 0;
+   while (*s)
+   if (*s++ == '/')
+   cnt++;
+   return cnt;
+}
+
 int fspathcmp(const char *a, const char *b)
 {
return ignore_case ? strcasecmp(a, b) : strcmp(a, b);
diff --git a/dir.h b/dir.h
index a89c13e27..e3717055d 100644
--- a/dir.h
+++ b/dir.h
@@ -197,6 +197,9 @@ struct dir_struct {
unsigned unmanaged_exclude_files;
 };
 
+/*Count the number of slashes for string s*/
+extern int count_slashes(const char *s);
+
 /*
  * The ordering of these constants is significant, with
  * higher-numbered match types signifying "closer" (i.e. more
-- 
2.13.0

[GSoC] Update: Week 5

2017-06-19 Thread Prathamesh Chavan

SUMMARY OF MY PROJECT:

Git submodule subcommands are currently implemented by using shell script
'git-submodule.sh'. There are several reasons why we'll prefer not to
use the shell script. My project intends to convert the subcommands into
C code, thus making them builtins. This will increase Git's portability
and hence the efficiency of working with the git-submodule commands.
Link to the complete proposal: [1]

Mentors:
Stefan Beller 
Christian Couder 

UPDATES:

Following are the updates about my ongoing project:

1. sync and status: The patches were discussed with the mentors
   and after that, are being posted with this patch.

2. deinit: The patch is finally debugged, and is ready to be
   discussed. It is also attached with this update.

3. summary: While porting the subcommand, I underwent certain
   issues. After getting them clarified from my mentors, I
   have resumed working on it. I'm aware of the time I have
   taken for porting this subcommand is more than the previous
   ones. Hence will try my best to finish this in this week.

4. foreach: As stated in the previous update, the subcommand was
   ported without resolving the bug, and simply translating the
   present code, and adding a NEEDSWORK tag to the comment for
   mentioning the reported bug as well.
   But as communicating between child_process is still an issue
   and so there was no simple was to current carry out the
   porting. And hence, a hack was used instead. But after
   discussing it, instead using the repository-object patch
   series will help to resolve these issues in this situation.

PLAN FOR WEEK-6 (20 June 2017 to 26 June 2017):

1. summary: Mostly I'll be working on this and post the patch
   for discussion as soon as possible.

2. foreach: As it was decided that unblock the conversion of
   this submodule subcommand, the original cmd_foreach was
   ported without including the BUG-FIX patch here.
   Hence, for this week I will try to utilize the
   'repository-object' series by Brandon Williams.

3. deinit: I will be working on improvising this patch as it was
   recently debugged and posted for discussion.

[1]: 
https://docs.google.com/document/d/1krxVLooWl--75Pot3dazhfygR3wCUUWZWzTXtK1L-xU/

Thanks,
Prathamesh Chavan

Git Strange behaviour with remote deleted branch

2017-06-19 Thread Reda Lyazidi


Hello,

with git I noticed when I removed a remote branch with git push origin 
--delete 

in my clone when I used git branch -a I don't the deleted branch
but my colleagues still see it.

I tried with two clones in my PC, with the first one delete branch and 
the other still sees it

despite git pull .

I use git version 2.9.4

Kind regards.

Re: What's cooking in git.git (Jun 2017, #05; Mon, 19)

2017-06-19 Thread Stefan Beller

>
> * sb/submodule-doc (2017-06-13) 1 commit
>  - submodules: overhaul documentation
>
>  Doc update.
>
>  Waiting for discussion to settle.

Please hold back, this definitely needs
another version.

> * sb/diff-color-move (2017-06-01) 17 commits
>  - diff.c: color moved lines differently
> ...
>  "git diff" has been taught to optionally paint new lines that are
>  the same as deleted lines elsewhere differently from genuinely new
>  lines.
>
>  Is any more update coming?

Yes.

I do have a series locally that replaces "diff_line" with
"emitted_string" (but the same structure) and also
changed the algorithm to have 8 colors configurable.

But then I got dragged away in "doing it right", which will be
presented shortly with a different approach for the
first patches and how they are refactored.

So the refactoring of that is done and I need to apply the
patches that bring in the new functionality on top.

Thanks,
Stefan

Re: Better usability of stash refs

2017-06-19 Thread Jeff King

On Mon, Jun 19, 2017 at 03:32:54PM -0500, Robert Dailey wrote:

> To drop a stash, I have to do this (example):
> 
> $ git stash drop stash@{3}
> 
> Using the full "stash@{N}" seems superfluous since the documentation
> states it must be a stash in the first place. It would make more sense
> (and be quicker to type) to do:
> 
> $ git stash drop 3
> 
> Is there a trick I can use to make this shorthand possible? I thought
> about creating a "s" script for "stash" that intercepted the
> parameters for only a couple of stash sub-commands and created the
> ref, but that seems a lot of work.
> 
> Any productivity tips here? Thanks in advance.

Junio mentioned that this is already possible. I suspect the problem may
be that your Git is not recent enough. It was added in a56c8f5aa (stash:
allow stashes to be referenced by index only, 2016-10-24), which is in
v2.11.0.

-Peff

Re: Better usability of stash refs

2017-06-19 Thread Junio C Hamano

Robert Dailey  writes:

> To drop a stash, I have to do this (example):
>
> $ git stash drop stash@{3}
>
> Using the full "stash@{N}" seems superfluous since the documentation
> states it must be a stash in the first place. It would make more sense
> (and be quicker to type) to do:
>
> $ git stash drop 3
>
> Is there a trick I can use to make this shorthand possible? I thought
> about creating a "s" script for "stash" that intercepted the
> parameters for only a couple of stash sub-commands and created the
> ref, but that seems a lot of work.
>
> Any productivity tips here? Thanks in advance.

Isn't that already the case?

git-stash.sh::drop_stash gives "$@" to assert_stash_ref, which in
turn calls is_stash_ref, which in turn calls is_stash_like, and this
callchain eventually cals into parse_flags_and_rev.  Which has this
bit (may be hard to read because it is incorrectly indented):

case "$1" in
*[!0-9]*)
:
;;
*)
set -- "${ref_stash}@{$1}"
;;
esac

REV=$(git rev-parse --symbolic --verify --quiet "$1") || {
reference="$1"
die "$(eval_gettext "\$reference is not a valid reference")"
}

Better usability of stash refs

2017-06-19 Thread Robert Dailey

To drop a stash, I have to do this (example):

$ git stash drop stash@{3}

Using the full "stash@{N}" seems superfluous since the documentation
states it must be a stash in the first place. It would make more sense
(and be quicker to type) to do:

$ git stash drop 3

Is there a trick I can use to make this shorthand possible? I thought
about creating a "s" script for "stash" that intercepted the
parameters for only a couple of stash sub-commands and created the
ref, but that seems a lot of work.

Any productivity tips here? Thanks in advance.

Re: in case you want a use-case with lots of submodules

2017-06-19 Thread Yaroslav Halchenko

On Mon, 19 Jun 2017, Stefan Beller wrote:

> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko  
> wrote:
> > Hi All,

> > On a recent trip I've listened to the git minutes podcast episode and
> > got excited to hear  Stefan Beller (CCed just in case) describing
> > ongoing work on submodules mechanism.  I got excited, since e.g.
> > performance improvements would be of great benefit to us too.

> If you're mostly interested in performance improvements of the status
> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
> Chavan (cc'd) might be more interesting to you than what I do.
> He is porting git-submodule (which is mostly a shell script nowadays)
> to C, such that we can save a lot of process invocations and can do
> processing within one process.

ah -- cool.  I would be eager to test it out, thanks!  would be
interesting to see if it positively affects our overall performance.
Pointers to that development would be welcome!

> > http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
> > repositories, up to 4 levels deep) hierarchy of git/git-annex
> > repositories all tied together via git submodules mechanism.  And as the
> > collection grows, interactions with it become slower, so additional
> > options (such as --ignore-submodules=dirty  to status) become our
> > friends.

> I am not as much concerned about the 370 number than about the
> 4 layers of nesting. In my experience the nested submodule case
> is a little bit error prone and the bug reports are not as frequent as
> there are not as many users of nesting, yet(?)

well -- part of the story here is that we are forced to use/have full
blown .git/ directories (for git-annex symlinks to content files to
work) within submodules instead of .git file with a reference under
parent's .git/modules.   So we can 'slice' at any level and I
guess that is why may be avoiding some possibly issues due to nesting
and the "parent has all .git/modules" approach.

> In a neighboring thread on the mailing list we have a discussion
> on the usefulness of being on branches than in detached HEAD
> in the submodules.
> https://public-inbox.org/git/0092cdd27c5f9d418b0f3e9b5d05be0801028...@sbs2011.opfingen.plc2.de/

> This would not break non-ambiguously, rather it would add
> ease of use.

that is indeed a common caveat... I am not sure if any heuristic
approach would provide a 'bullet proof' solution.  I might even prefer a
hardcoded 'branch-name' to be listed/associated with each submodule
within .gitmodules.  In the datalad case, detached HEAD is common
whenever someone installs "outdated" (branch of which progressed
forward) submodule.  In this case we just check if the branch after "git
clone"  (but before git submodule update) includes the pointed by
Subproject commit, and if so -- we announce that it must be the branch
(so far it is always "master" branch anyways ;) )

> > So I thought to share this as a use-case happen you need more
> > motivation or just a real-case test-bed for your work.  And thank
> > you again for making Git even Greater.

> Thanks for the motivation. :)

the least I could do ;)

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik

AW: Restoring detached HEADs after Git operations

2017-06-19 Thread Patrick Lehmann

Hello Stefan,

I never have tapped into the DLL Hell trap. That's maybe I never did C++ 
development or I started with VB .NET / C# as .NET solved major parts of the 
DLL Hell :). That doesn't mean my new beloved language Python doesn't have a 
similar problem ...


Thinking about DLL Hell is a thinking in big version numbers like 1.0, 2.0 oder 
even 2.1, 2.2, ...
We are here talking about revisions in the build numbers which need to be 
synchronized between the parent repository and the sub modules (IP cores). Both 
sides are under heavy development and interfaces evolving from day to day 
because hardware design can't be planned as easy as software design.

So by using Git submodules a developer - responsible for a submodule / IP core 
- can after he finished interface level 1 now go on and implement interface 
level 2. The parent project can finish it's integration and testing of the 
level 1 interface before proceeding with level 2. More over if the same IP core 
is used multiple time in different sub IP cores, it's possible to update one 
usage place to interface level 2 by a second developer so he can finish his IP 
core at level 2, which other usage places can still use the level 1 interface.

Start situation:
--
TOPLEVEL (developer A)
  o- IP_1 @level1 (developer B)
   o- IP_2 @level1 (developer C)
  o- IP_3 @level1 (developer D)
   o- IP_2 @level1


Developer C creates interface level 2, but all instances use level1 of IP_2:
--
TOPLEVEL (developer A)
  o- IP_1 @level1 (developer B)
   o- IP_2 @level1 (developer C)
  o- IP_3 @level1 (developer D)
   o- IP_2 @level1


Developer D updates instance of IP_2 to level 2 and completes level 2 of IP_3:
--
TOPLEVEL (developer A)
  o- IP_1 @level1 (developer B)
   o- IP_2 @level1 (developer C)
  o- IP_3 @level1 (developer D)
   o- IP_2 @level2

Developer A updates instance of IP_3 to level 2:
--
TOPLEVEL (developer A)
  o- IP_1 @level1 (developer B)
   o- IP_2 @level1 (developer C)
  o- IP_3 @level2 (developer D)
   o- IP_2 @level2

Developer B has finished his testing for IP_1 and can now update the instance 
if IP_2:
--
TOPLEVEL (developer A)
  o- IP_1 @level1 (developer B)
   o- IP_2 @level2 (developer C)
  o- IP_3 @level2 (developer D)
   o- IP_2 @level2


So now imaging 8 developers, whereof 6 are working remote on the project. There 
is one responsible developer per IP core (maintainer) and an overall maintainer 
overseeing all integration merges and test results (CI).


Kind regards
Patrick


Von: Stefan Beller [sbel...@google.com]
Gesendet: Montag, 19. Juni 2017 21:21
Bis: Patrick Lehmann
Cc: Lars Schneider; Git Mailinglist
Betreff: Re: Restoring detached HEADs after Git operations

On Mon, Jun 19, 2017 at 11:09 AM, Patrick Lehmann
 wrote:
> Hello Stefan,
>
> the use case is as follows:
>
> The projects consists of circa 18 IP cores. Each IP core is represented by a 
> Git repository. Think of an IP core as of a lonestanding DLL or SO file 
> project. Each IP core references 2 submodules, which bring the verification 
> environments for testing the IP core standalone.

So phrased differently: You are using submodules to avoid "DLL hell"
(sharing a lib, with ease of versioning as the submodules in the different IP
cores may be pointing at different versions).

>
> These 18 IP cores are grouped to bigger IP cores, referencing the low-level 
> IP cores and each again the 2 verification submodules. Finally, the main 
> project references the bigger IP cores and again the 2 verification cores.
>
> TOPLEVEL
>   o- IP1
>o- UVVM
>o- VUnit
>   o- IP2
>o- UVVM
>o- VUnit
>   o- IP3
>o- UVVM
>o- VUnit
>   o- IP4
>o- UVVM
>o- VUnit
>o- IP5
>o- UVVM
>o- VUnit
>o- IP6
>o- UVVM
>o- VUnit
>o- IP7
>o- UVVM
>o- VUnit
>   o- IP8
>o- UVVM
>o- VUnit
>o- IP9
>o- UVVM
>o- VUnit
>o- IP10
>o- UVVM
>o- VUnit
>   o- IP11
>o- UVVM
>o- VUnit
>o- IP9
>o- UVVM
>o- VUnit
>o- IP12
>o- UVVM
>o- VUnit
>o- UVVM
>o- VUnit
>
> That's the simplified structure. I can't write more, because it's a closed 
> source project. You can find other usecases e.g. in my other open source 
> projects. E.g. The PoC-Library or The PicoBlaze-Library and the corresponding 
> PoC-Examples repository.
>
> Example: PoC
> Pile of Cores includes 4 Git submodules and is itself an IP core library.
> So PoC-Examples again references PoC. This looks like this tree:
>
> PoC-Examples
>   |- lib/
>o- PoC
> |- lib
>

What's cooking in git.git (Jun 2017, #05; Mon, 19)

2017-06-19 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/pcre-v2 (2017-06-02) 8 commits
  (merged to 'next' on 2017-06-13 at 34bf49ee44)
 + grep: add support for PCRE v2
 + grep: un-break building with PCRE >= 8.32 without --enable-jit
 + grep: un-break building with PCRE < 8.20
 + grep: un-break building with PCRE < 8.32
 + grep: add support for the PCRE v1 JIT API
 + log: add -P as a synonym for --perl-regexp
 + grep: skip pthreads overhead when using one thread
 + grep: don't redundantly compile throwaway patterns under threading

 Update "perl-compatible regular expression" support to enable JIT
 and also allow linking with the newer PCRE v2 library.


* ab/perf-remove-index-lock (2017-06-05) 1 commit
  (merged to 'next' on 2017-06-13 at c532e59233)
 + perf: work around the tested repo having an index.lock

 When an existing repository is used for t/perf testing, we first
 create bit-for-bit copy of it, which may grab a transient state of
 the repository and freeze it into the repository used for testing,
 which then may cause Git operations to fail.  Single out "the index
 being locked" case and forcibly drop the lock from the copy.


* bw/object-id (2017-06-05) 33 commits
  (merged to 'next' on 2017-06-13 at 0582278759)
 + diff: rename diff_fill_sha1_info to diff_fill_oid_info
 + diffcore-rename: use is_empty_blob_oid
 + tree-diff: convert path_appendnew to object_id
 + tree-diff: convert diff_tree_paths to struct object_id
 + tree-diff: convert try_to_follow_renames to struct object_id
 + builtin/diff-tree: cleanup references to sha1
 + diff-tree: convert diff_tree_sha1 to struct object_id
 + notes-merge: convert write_note_to_worktree to struct object_id
 + notes-merge: convert verify_notes_filepair to struct object_id
 + notes-merge: convert find_notes_merge_pair_ps to struct object_id
 + notes-merge: convert merge_from_diffs to struct object_id
 + notes-merge: convert notes_merge* to struct object_id
 + tree-diff: convert diff_root_tree_sha1 to struct object_id
 + combine-diff: convert find_paths_* to struct object_id
 + combine-diff: convert diff_tree_combined to struct object_id
 + diff: convert diff_flush_patch_id to struct object_id
 + patch-ids: convert to struct object_id
 + diff: finish conversion for prepare_temp_file to struct object_id
 + diff: convert reuse_worktree_file to struct object_id
 + diff: convert fill_filespec to struct object_id
 + diff: convert diff_change to struct object_id
 + diff: convert run_diff_files to struct object_id
 + diff: convert diff_addremove to struct object_id
 + diff: convert diff_index_show_file to struct object_id
 + diff: convert get_stat_data to struct object_id
 + grep: convert to struct object_id
 + notes: convert some accessor functions to struct object_id
 + builtin/notes: convert to struct object_id
 + notes: convert format_display_notes to struct object_id
 + notes: make get_note return pointer to struct object_id
 + notes: convert for_each_note to struct object_id
 + notes: convert internal parts to struct object_id
 + notes: convert internal structures to struct object_id

 Conversion from uchar[20] to struct object_id continues.


* jk/consistent-h (2017-06-05) 8 commits
  (merged to 'next' on 2017-06-13 at e09c1fe968)
 + t0012: test "-h" with builtins
 + git: add hidden --list-builtins option
 + version: convert to parse-options
 + diff- and log- family: handle "git cmd -h" early
 + submodule--helper: show usage for "-h"
 + remote-{ext,fd}: print usage message on invalid arguments
 + upload-archive: handle "-h" option early
 + credential: handle invalid arguments earlier

 "git $cmd -h" for builtin commands calls the implementation of the
 command (i.e. cmd_$cmd() function) without doing any repository
 set-up, and the commands that expect RUN_SETUP is done by the Git
 potty needs to be prepared to show the help text without barfing.


* jk/pathspec-magic-disambiguation (2017-05-29) 6 commits
  (merged to 'next' on 2017-06-13 at 088987f033)
 + verify_filename(): flip order of checks
 + verify_filename(): treat ":(magic)" as a pathspec
 + check_filename(): handle ":^" path magic
 + check_filename(): use skip_prefix
 + check_filename(): refactor ":/" handling
 + t4208: add check for ":/" without matching file

 The convention for a command line is to follow "git cmdname
 --options" with revisions followed by an optional "--"
 disambiguator and then finally pathspecs.  When "--" is not there,
 we make sure early ones are all interpretable as revs (and do not
 look like paths) and later ones are the

Re: [RFC/PATCH 0/5] remote: eliminate remote->{fetch,push}_refspec and lazy parsing of refspecs

2017-06-19 Thread Junio C Hamano

Jeff King  writes:

> If we forget the "storing it twice" argument, would it make sense to
> convert the parallel arrays of items into a single array-of-struct?
> I.e.:
>
>   struct configured_refspec {
>   const char *string;
>   struct refspec refspec;
>   unsigned parsed:1;
>   }
>
> I guess that may run into problems where we really need an
> array-of-refspec to pass into sub-functions. So going further, could we
> just have "struct refspec" store the text form it was parsed from?

I find this a sensible suggestion.  

I think the original "parallel" structure was anticipating that a
textual input we get from the user (either from the command line or
from the configuration) could expand to multiple parsed refspecs for
easier use by the code, but it appears that we hadn't seen a need
for that, so I think it is safe to unify them into a single struct.

Of course, that still anticipates that a parsed refspec may not be
unambiguously turned back to textual original input form; if that
will never be the case, then the approach taken by the posted
patches should also be OK.

Re: [PATCH v2 0/3] Add regression tests for rectent rebase -i fixes

2017-06-19 Thread Junio C Hamano

Johannes Sixt  writes:

> Am 16.06.2017 um 20:43 schrieb Johannes Sixt:
>> Am 16.06.2017 um 15:49 schrieb Johannes Schindelin:
>>> On Thu, 15 Jun 2017, Junio C Hamano wrote:
 diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
 index 325ec75353..801bce25da 100755
 --- a/t/t3420-rebase-autostash.sh
 +++ b/t/t3420-rebase-autostash.sh
 @@ -45,7 +45,7 @@ create_expected_success_am() {
   }
   create_expected_success_interactive() {
 -cr=$'\r' &&
 +cr=$(echo . | tr '.' '\015') &&
   cat >expected <<-EOF
   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
   HEAD is now at $(git rev-parse --short feature-branch) third
 commit
>>>
>>> This is still incorrect, as the \r\n (which $(echo . | tr \.\ '\015')
>>> would emit) is interpreted correctly as a line break on Windows, meaning
>>> that cr is now *empty*. Not what we want.
>>>
>>> What I did is to replace the `cat` by `q_to_cr` (we have that lovely
>>> function, might just as well use it), replace `${cr}` by `Q` and skip the
>>> cr variable altogether.
>>
>> You beat me to it. I came up with the identical q_to_cr changes, but
>> haven't dug the remaining failure regarding the swapped output
>> lines. You seem to have nailed it. Will test your proposed changes
>> tomorrow.
>
> As expected, the patches fix the observed test failures for me, too,
> if that's still relevant.

Thanks for double-checking.

Re: [PATCH 00/28] Create a reference backend for packed refs

2017-06-19 Thread Jeff King

On Mon, Jun 19, 2017 at 03:43:15PM -0400, Jeff King wrote:

> > > Is the iterator over packed-refs correctly skipping over what are
> > > covered by loose refs?  The entries in the packed-refs file that are
> > > superseded by loose refs should be allowed to point at an already
> > > expired object.
> > 
> > Here it is in a test form for easier diagnosis.
> 
> Thanks, I was just starting to do that myself. The problem is in
> ca6b06eb7 (packed_ref_store: support iteration, 2017-05-15) and is
> pretty obvious: the packed_ref iterator checks whether the entry
> resolves.
> 
> I think that _neither_ of the loose and packed iterators should be
> checking this. It's only the merged result (where loose trumps packed)
> that should bother checking.

It looks like this is mostly already the case. files_ref_iterator
combines the two and does check. The loose iteration is done by
cache_ref_iterator[1], and does not. So it's just this new packed_refs
iterator that is wrong, and we just need:

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 311fd014c..ad1143e64 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -416,12 +416,6 @@ static int packed_ref_iterator_advance(struct ref_iterator 
*ref_iterator)
ref_type(iter->iter0->refname) != REF_TYPE_PER_WORKTREE)
continue;
 
-   if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-   !ref_resolves_to_object(iter->iter0->refname,
-   iter->iter0->oid,
-   iter->iter0->flags))
-   continue;
-
iter->base.refname = iter->iter0->refname;
iter->base.oid = iter->iter0->oid;
iter->base.flags = iter->iter0->flags;
@@ -473,8 +467,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
struct ref_iterator *ref_iterator;
unsigned int required_flags = REF_STORE_READ;
 
-   if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
-   required_flags |= REF_STORE_ODB;
refs = packed_downcast(ref_store, required_flags, "ref_iterator_begin");
 
iter = xcalloc(1, sizeof(*iter));

At least that makes sense to me and passes the tests (including the new
one). I haven't actually reviewed the patches yet.

-Peff

Re: [PATCH v2 0/3] Add regression tests for rectent rebase -i fixes

2017-06-19 Thread Johannes Sixt


Am 16.06.2017 um 20:43 schrieb Johannes Sixt:

Am 16.06.2017 um 15:49 schrieb Johannes Schindelin:

On Thu, 15 Jun 2017, Junio C Hamano wrote:

diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 325ec75353..801bce25da 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -45,7 +45,7 @@ create_expected_success_am() {
  }
  create_expected_success_interactive() {
-cr=$'\r' &&
+cr=$(echo . | tr '.' '\015') &&
  cat >expected <<-EOF
  $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
  HEAD is now at $(git rev-parse --short feature-branch) third 
commit


This is still incorrect, as the \r\n (which $(echo . | tr \.\ '\015')
would emit) is interpreted correctly as a line break on Windows, meaning
that cr is now *empty*. Not what we want.

What I did is to replace the `cat` by `q_to_cr` (we have that lovely
function, might just as well use it), replace `${cr}` by `Q` and skip the
cr variable altogether.


You beat me to it. I came up with the identical q_to_cr changes, but 
haven't dug the remaining failure regarding the swapped output lines. 
You seem to have nailed it. Will test your proposed changes tomorrow.


As expected, the patches fix the observed test failures for me, too, if 
that's still relevant.


-- Hannes

Re: [PATCH 2/2] Add test for the new status message

2017-06-19 Thread Kaartic Sivaraam


On Mon, 2017-06-19 at 14:04 -0400, Jeff King wrote:
> On Mon, Jun 19, 2017 at 11:29:49PM +0530, Kaartic Sivaraam wrote:
> 
> > Is there a way to test for the "Initial commit" message in the
> > commit
> > template?
> 
> You can do "git commit --dry-run", which produces the template on
> stdout.
Thanks for the help :)

> 
> That should be good enough for our purposes here, as it's the same
> code
> that produces the text that goes into the editor template. If you
> really
> wanted to test what the editor sees, you could do something like:
> 
>   GIT_EDITOR='cat >editor-input' git commit
> 
> but I don't think it's worth the trouble.
> 
That's right. I thinks it's not worth the trouble too.

Re: [PATCH 00/28] Create a reference backend for packed refs

2017-06-19 Thread Jeff King

On Mon, Jun 19, 2017 at 12:25:07PM -0700, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Is the iterator over packed-refs correctly skipping over what are
> > covered by loose refs?  The entries in the packed-refs file that are
> > superseded by loose refs should be allowed to point at an already
> > expired object.
> 
> Here it is in a test form for easier diagnosis.

Thanks, I was just starting to do that myself. The problem is in
ca6b06eb7 (packed_ref_store: support iteration, 2017-05-15) and is
pretty obvious: the packed_ref iterator checks whether the entry
resolves.

I think that _neither_ of the loose and packed iterators should be
checking this. It's only the merged result (where loose trumps packed)
that should bother checking.

-Peff

Re: in case you want a use-case with lots of submodules

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko  wrote:
> Hi All,
>
> On a recent trip I've listened to the git minutes podcast episode and
> got excited to hear  Stefan Beller (CCed just in case) describing
> ongoing work on submodules mechanism.  I got excited, since e.g.
> performance improvements would be of great benefit to us too.

If you're mostly interested in performance improvements of the status
quo (i.e. "make git-submodule fast"), then the work of Prathamesh
Chavan (cc'd) might be more interesting to you than what I do.
He is porting git-submodule (which is mostly a shell script nowadays)
to C, such that we can save a lot of process invocations and can do
processing within one process.

> In our project, http://datalad.org, git submodules is the basic
> mechanism to bring multiple "datasets" (mix of git and git-annex'ed
> repositories)  under the same roof so we could non-ambiguously
> version them all at any level.

Cool, glad to here submodules being useful. :)

> http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
> repositories, up to 4 levels deep) hierarchy of git/git-annex
> repositories all tied together via git submodules mechanism.  And as the
> collection grows, interactions with it become slower, so additional
> options (such as --ignore-submodules=dirty  to status) become our
> friends.

I am not as much concerned about the 370 number than about the
4 layers of nesting. In my experience the nested submodule case
is a little bit error prone and the bug reports are not as frequent as
there are not as many users of nesting, yet(?)

In a neighboring thread on the mailing list we have a discussion
on the usefulness of being on branches than in detached HEAD
in the submodules.
https://public-inbox.org/git/0092cdd27c5f9d418b0f3e9b5d05be0801028...@sbs2011.opfingen.plc2.de/

This would not break non-ambiguously, rather it would add
ease of use.

> So I thought to share this as a use-case happen you need more
> motivation or just a real-case test-bed for your work.  And thank
> you again for making Git even Greater.

Thanks for the motivation. :)

> P.S. Please CCme in your replies (if any), I am not on the list
>
> With best regards,

Cheers,
Stefan

Re: [PATCH 00/28] Create a reference backend for packed refs

2017-06-19 Thread Junio C Hamano

Junio C Hamano  writes:

> Is the iterator over packed-refs correctly skipping over what are
> covered by loose refs?  The entries in the packed-refs file that are
> superseded by loose refs should be allowed to point at an already
> expired object.

Here it is in a test form for easier diagnosis.

 t/t1408-packed-refs.sh | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/t/t1408-packed-refs.sh b/t/t1408-packed-refs.sh
new file mode 100755
index 00..35533c8593
--- /dev/null
+++ b/t/t1408-packed-refs.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='packed-refs entries are covered by loose refs'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   test_tick &&
+   git commit --allow-empty -m one &&
+   one=$(git rev-parse HEAD) && 
+   git for-each-ref >actual &&
+   echo "$one commit   refs/heads/master" >expect &&
+   test_cmp expect actual &&
+
+   git pack-refs --all &&
+   git for-each-ref >actual &&
+   echo "$one commit   refs/heads/master" >expect &&
+   test_cmp expect actual &&
+
+   cat .git/packed-refs &&
+
+   git checkout --orphan another &&
+   test_tick &&
+   git commit --allow-empty -m two &&
+   two=$(git rev-parse HEAD) && 
+   git checkout -B master &&
+   git branch -D another &&
+
+   cat .git/packed-refs &&
+
+   git for-each-ref >actual &&
+   echo "$two commit   refs/heads/master" >expect &&
+   test_cmp expect actual &&
+
+   git reflog expire --expire=now --all &&
+   git prune &&
+   git tag -m v1.0 v1.0 master
+'
+
+test_expect_success 'no error from stale entry in packed-refs' '
+   git describe master >actual 2>&1 &&
+   echo "v1.0" >expect &&
+   test_cmp expect actual
+'
+
+test_done

Re: Restoring detached HEADs after Git operations

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 11:09 AM, Patrick Lehmann
 wrote:
> Hello Stefan,
>
> the use case is as follows:
>
> The projects consists of circa 18 IP cores. Each IP core is represented by a 
> Git repository. Think of an IP core as of a lonestanding DLL or SO file 
> project. Each IP core references 2 submodules, which bring the verification 
> environments for testing the IP core standalone.

So phrased differently: You are using submodules to avoid "DLL hell"
(sharing a lib, with ease of versioning as the submodules in the different IP
cores may be pointing at different versions).

>
> These 18 IP cores are grouped to bigger IP cores, referencing the low-level 
> IP cores and each again the 2 verification submodules. Finally, the main 
> project references the bigger IP cores and again the 2 verification cores.
>
> TOPLEVEL
>   o- IP1
>o- UVVM
>o- VUnit
>   o- IP2
>o- UVVM
>o- VUnit
>   o- IP3
>o- UVVM
>o- VUnit
>   o- IP4
>o- UVVM
>o- VUnit
>o- IP5
>o- UVVM
>o- VUnit
>o- IP6
>o- UVVM
>o- VUnit
>o- IP7
>o- UVVM
>o- VUnit
>   o- IP8
>o- UVVM
>o- VUnit
>o- IP9
>o- UVVM
>o- VUnit
>o- IP10
>o- UVVM
>o- VUnit
>   o- IP11
>o- UVVM
>o- VUnit
>o- IP9
>o- UVVM
>o- VUnit
>o- IP12
>o- UVVM
>o- VUnit
>o- UVVM
>o- VUnit
>
> That's the simplified structure. I can't write more, because it's a closed 
> source project. You can find other usecases e.g. in my other open source 
> projects. E.g. The PoC-Library or The PicoBlaze-Library and the corresponding 
> PoC-Examples repository.
>
> Example: PoC
> Pile of Cores includes 4 Git submodules and is itself an IP core library.
> So PoC-Examples again references PoC. This looks like this tree:
>
> PoC-Examples
>   |- lib/
>o- PoC
> |- lib
> o- Cocotb
> o- OSVVM
> o- VUnit
>  o-  OSVVM
> o- UVVM
>
> The library VUnit itself already includes OSVVM as a library.
>
> --
> Forcast:
> I'll write a new question / idea about multiple equal submodules and the 
> memory footprint soon...
> Here is my original question posted on StackOverflow: 
> https://stackoverflow.com/questions/44585425/how-to-reduce-the-memory-footprint-for-multiple-submodules-of-the-same-source
> --
>
> Do you need more use cases?
>

Well this use case points out a different issue than I hoped for. ;)
>From the stackoverflow post and from looking at the layout here,
one of the major questions is how to deduplicate the submodule
object store for example.

By use case I rather meant a sales pitch for your initial email:

I use this bash script because it fits in my workflow because
I need branches instead of detached HEADS, because $REASONS

and I'd be interested in these $REASONS, which I assumed to be
* easier to work with branches than detached HEADS (it aids the workflow)
* we're not challenging the underlying mental model of tracking sha1s in
  the superproject rather than branches.

At least I gave these reasons in the "reattach HEAD" stuff that I wrote,
but maybe there are others? (I know the code base of submodules very
well, but I do not work with submodules on a day-to-day basis myself...)

Re: Restoring detached HEADs after Git operations

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 10:55 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Mon, Jun 19, 2017 at 2:52 AM, Patrick Lehmann
>>  wrote:
>>> Hello Lars,
>>>
>>> for your questions:
 If there are multiple branches with the same hash then your script would 
 pick the first one. Can you imagine a situation where this would be a 
 problem?
>>>
>>> I can't think of a good solution to resolve it automatically. Maybe a 
>>> script could print that there are multiple possibilities and it choose the 
>>> first branch in the list.
>>>
>>>
 Plus, you are looking only at local branches. Wouldn't it make sense to 
 look at remote branches, too?
>>>
>>> This is also related to restoring tags. If we go this way, we should have 
>>> this priority list:
>>> - local branches
>>> - remote branches
>>
>> For remote branches you would create a local branch of the same name
>> (if such a branch would not exist, possibly setting it up to track that 
>> remote
>> branch)?
>>
>>> - tags
>>
>> as said in the other email and similar to remote branches, we'd not want to 
>> have
>> HEAD pointing to them directly but somehow have a local branch.
>
> Let's step back a bit.  We detach the HEAD for a good reason, no?

And the 'good reason' being that at the time git-submodule was written
we did not know what would be best, and having a detached HEAD
would be (a) easy to implement, and (b) removing one moving thing
from the whole construction, hence making it a bit safer,
(c) it sort of follows the mental model:

the superproject said it had the submodule at X
(and not at branch Y!)
the submodule itself is a whole repo on its own
(it doesn't need to be aware of the superproject)

so in this world detaching at X is the best we can do.

> Why is it a good idea to move them back on to a branch picked among
> multiple ones that all happen to be pointing at the same commit?

This (rhetorical?) question reads like 2 questions actually:
(a) "Why is it a good idea to move them back on to a branch?"
It makes working easier as the submodule is not detached,
but on a proper branch
(b) "picked among multiple ones that all ..."
I think this is a bad idea and we'd rather want to follow
some configuration instead of wild-guessing by Git.

> The user may build on a history of a submodule, and then may push
> the result out to a particular branch at the other side; that is
> when being on a named branch in the submodule becomes useful, but
> even then I do not think randomly picking one branch and be on it
> is a good thing to do.

so you provide one reason why it is useful, but then claiming it is
'not a good thing' (yet). Can you give a reason why this is a 'bad thing'?

> I would understand the workflow would go more like so:
>
>  - You do something at the superproject (e.g. create a new branch X
>from an existing commit and check it out), which results in
>submodules' HEADs getting detached at the commits bound to the
>superproject's tree.

And here we'd want to discuss if we *really* want to detach the HEADs
or rather have a symbolic ref "following the superproject".

>  - Because you want to make changes to both submodules and the
>superproject in a consistent way, you'd want to commit changes to
>all of these repositories and the push the result out in an
>atomic way.

Committing and pushing are different things. You should not care if
I commit atomically as you (in the general "upstream" sense)
cannot observe my local commits.

For pushing we would want to have an atomic push, but that is
not the scope of this discussion. (As a Gerrit user, we implemented
the submodule atomicity serverside, but in plain Git server you'd
not need the atomicity either:

git commit -a -m "update submodule pointers"
git submodule foreach git push
git push

should be fine w.r.t. any non-atomic race condition.)

>  - Hence you tell "Hey, Git, I want all the submodules that I
>modified to be on branch X" from the superproject.
>
>- This may succeed in a submodule where X is a new name, or the
>  current tip of branch X is an ancestor of the detached HEAD.

so we'd allow fast forward for X. This seems arbitrary to me. I could
also say "If X exists I allow a merge to be made between old X and
the object name given by the superproject". (maybe as a config option)

>- This may fail in a submodule where there is branch X that does
>  not want to move to the detached HEAD's state.  In this latter
>  case, the user needs to deal with the situation (perhaps the
>  old X is expendable; perhaps the HEAD's commit may need to be
>  merged to old X; perhaps there are other cases).

makes sense.

>
> though.

Re: [PATCH 00/28] Create a reference backend for packed refs

2017-06-19 Thread Junio C Hamano

Michael Haggerty  writes:

> I've developed these patches on top of master plus the following
> patches, which are followups to mh/packed-refs-store-prep:
>
> * lock_packed_refs(): fix cache validity check
> * for_each_bisect_ref(): don't trim refnames
>
> The patches can also be obtained from my GitHub fork [2] as branch
> "packed-ref-store".
>
> Michael
>
> [1] http://public-inbox.org/git/cover.1490026594.git.mhag...@alum.mit.edu/
> [2] https://github.com/mhagger/git

Thanks.

Both in the version I queued and a fresh fetching of c13b2fad
("read_packed_refs(): die if `packed-refs` contains bogus data",
2017-06-12) from your https://github.com/mhagger/git repository
seems to exhibit an annoying error message in my local repository I
use for the primary work:

$ git fetch https://github.com/mhagger/git packed-ref-store
$ git checkout FETCH_HEAD
$ make
$ ./git describe next
error: refs/notes/amlog does not point to a valid object!
v2.13.1-611-g7e3b11ae1b
$ grep refs/notes/amlog .git/packed-refs
ed07e83cff8e407464fb2f5e84bd311da9c87565 refs/notes/amlog
$ git rev-parse refs/notes/amlog
b3079212325398e406078585c785c892d6e572f0
$ git cat-file -t ed07e83cff8e407464fb2f5e84bd311da9c87565
fatal: git cat-file: could not get object info
$ git cat-file -t b3079212325398e406078585c785c892d6e572f0
commit

Is the iterator over packed-refs correctly skipping over what are
covered by loose refs?  The entries in the packed-refs file that are
superseded by loose refs should be allowed to point at an already
expired object.

Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-19 Thread Brandon Williams

On 06/13, Stefan Beller wrote:
> Adding two native speakers as we start word smithing.
> 
> On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamano  wrote:
> 
> >> +
> >> +A submodule is another Git repository tracked in a subdirectory of your
> >> +repository. The tracked repository has its own history, which does not
> >> +interfere with the history of the current repository.
> >
> > "tracked in a subdirectory" sounds as if your top-level superproject
> > has a dedicated submodules/ directory and in it there live a bunch
> > of submodules.  Which obviously is not what you meant.  If phrased
> > "tracked as a subdirectory", I think the sentence makes sense.
> 
> Given this explanation "as a" also sounds wrong[1], maybe we need to
> separate (1) where it is put/mounted and (2) the fact that is tracked,
> i.e. the superproject has an idea of what should be there at a given
> revision. (I shortly thought about /s/as a/using/ in the above, but):
> 
>   A submodule is another Git repository at an arbitrary place inside
>   the working tree, and also tracked. The tracked repository has its
>   own history, which does not interfere with the history of the current
>   repository.

I would probably change the first sentence to:

  A submodule is another Git repository tracked at an arbitrary place
  inside the working tree.

> 
> [1] http://www.thesaurus.com/browse/as
> 
> >
> > While "which does not interfere" may be technically correct, I am
> > not sure what the value of saying that is.
> 
> I think we can drop it here. When writing I wanted to separate it from
> subtrees, but this is the wrong place for that.
> 
> >
> >> +Submodules are composed from a so-called `gitlink` tree entry
> >> +in the main repository that refers to a particular commit object
> >> +within the inner repository.
> >
> > Correct, but it may be unclear to the readers why we do so.  Perhaps
> >
> > ... and this way, the tree of each commit in the main repository
> > "knows" which commit from the submodule's history is "tied" to it.
> >
> > or something like that?
> 
> sounds good to me.
> 
> >
> >> +Additionally to the gitlink entry the `.gitmodules` file (see
> >> +linkgit:gitmodules[5]) at the root of the source tree contains
> >> +information needed for submodules.
> >
> > Is that really true?  Each submodule do not *need* what is in
> > .gitmodules; the top-level superproject needs to learn about
> > its submodules from the contents of that file, though.
> 
> Ha! The ediled words in my mind were:
> 
>  ... information needed for submodules [to work in the superproject].
> 
> But maybe we need to reword that as
> 
>   Additionally to the gitlink entry the `.gitmodules` file (see
>   linkgit:gitmodules[5]) at the root of the source tree contains
>   information on how to handle submodules.

This sounds slightly awkward.  Maybe:

In addition to the gitlink entry, the `.gitmodules` file (see
linkgit:gitmodules[5]) at the root of the source tree contains
information on how to handle submodules.


-- 
Brandon Williams

AW: Restoring detached HEADs after Git operations

2017-06-19 Thread Patrick Lehmann

Hello Stefan,

the use case is as follows:

The projects consists of circa 18 IP cores. Each IP core is represented by a 
Git repository. Think of an IP core as of a lonestanding DLL or SO file 
project. Each IP core references 2 submodules, which bring the verification 
environments for testing the IP core standalone.

These 18 IP cores are grouped to bigger IP cores, referencing the low-level IP 
cores and each again the 2 verification submodules. Finally, the main project 
references the bigger IP cores and again the 2 verification cores.

TOPLEVEL
  o- IP1
   o- UVVM
   o- VUnit
  o- IP2
   o- UVVM
   o- VUnit
  o- IP3
   o- UVVM
   o- VUnit
  o- IP4
   o- UVVM
   o- VUnit
   o- IP5
   o- UVVM
   o- VUnit
   o- IP6
   o- UVVM
   o- VUnit
   o- IP7
   o- UVVM
   o- VUnit
  o- IP8
   o- UVVM
   o- VUnit
   o- IP9
   o- UVVM
   o- VUnit
   o- IP10
   o- UVVM
   o- VUnit
  o- IP11
   o- UVVM
   o- VUnit
   o- IP9
   o- UVVM
   o- VUnit
   o- IP12
   o- UVVM
   o- VUnit
   o- UVVM
   o- VUnit

That's the simplified structure. I can't write more, because it's a closed 
source project. You can find other usecases e.g. in my other open source 
projects. E.g. The PoC-Library or The PicoBlaze-Library and the corresponding 
PoC-Examples repository.

Example: PoC
Pile of Cores includes 4 Git submodules and is itself an IP core library.
So PoC-Examples again references PoC. This looks like this tree:

PoC-Examples
  |- lib/
   o- PoC
|- lib
o- Cocotb
o- OSVVM
o- VUnit
 o-  OSVVM
o- UVVM

The library VUnit itself already includes OSVVM as a library.

--
Forcast:
I'll write a new question / idea about multiple equal submodules and the memory 
footprint soon...
Here is my original question posted on StackOverflow: 
https://stackoverflow.com/questions/44585425/how-to-reduce-the-memory-footprint-for-multiple-submodules-of-the-same-source
--

Do you need more use cases?

Kind regards
Patrick

Von: git-ow...@vger.kernel.org [git-ow...@vger.kernel.org]" im Auftrag von 
"Stefan Beller [sbel...@google.com]
Gesendet: Montag, 19. Juni 2017 19:47
Bis: Patrick Lehmann
Cc: Lars Schneider; Git Mailinglist
Betreff: Re: Restoring detached HEADs after Git operations

On Mon, Jun 19, 2017 at 10:34 AM, Patrick Lehmann
 wrote:
> Hello,
>
> I'm just an advanced Git user, not a Git developer. So I might find some time 
> to improve the suggested script, which I provided with the hints given on the 
> mailing list, but I have no time to do a complete feature release in your 
> patch based Git flow.

ok, thanks for letting us know. I may re-prioritize the "reattach
HEAD" patches that I referenced earlier.
I would have hoped that additionally to the shell lines you'd have
given a good use case/summary.

> I have no experience with other shells then Bash. So if you rely on a Bash 
> with less features, please port the syntax to such a shell system. (I 
> personally do not support legacy programs or out-date programs).
>
> --
> We are talking about circa 50 submodules in total with a maximum depth of 4. 
> The platforms are:
> - Mint OS with Git in Bash
> - Windows 7 with Git-Bash
> - Windows 10 with Git-Bash
> - Windows 10 with Posh-Git

Thanks,
Stefan

Re: [PATCH 2/2] Add test for the new status message

2017-06-19 Thread Jeff King

On Mon, Jun 19, 2017 at 11:29:49PM +0530, Kaartic Sivaraam wrote:

> Is there a way to test for the "Initial commit" message in the commit
> template?

You can do "git commit --dry-run", which produces the template on
stdout.

That should be good enough for our purposes here, as it's the same code
that produces the text that goes into the editor template. If you really
wanted to test what the editor sees, you could do something like:

  GIT_EDITOR='cat >editor-input' git commit

but I don't think it's worth the trouble.

-Peff

Re: [PATCH 2/2] Add test for the new status message

2017-06-19 Thread Kaartic Sivaraam

On Sun, 2017-06-18 at 21:32 -0700, Junio C Hamano wrote:
> Kaartic Sivaraam  writes:
> 
> > +test_expect_success 'No commits yet should be noted in status
> > output' '
> > +   git init initial &&
> > +   cd initial &&
> > +   git status >output &&
> > +   test_i18ngrep "No commits yet" output &&
> > +   test_commit initial &&
> > +   git status >output &&
> > +   test_i18ngrep ! "No commits yet" output &&
> > +   test_i18ngrep "nothing.*to commit" output
> > +'
> > +
> 
> Do not "cd" in a test, without being in a subshell.  When other
> people in the future want to add new tests to the end of this
> script, the new test will end up running in the new subdirectory,
> which is not something they should have to worry about.
> 
>   git checkout --orphan empty-branch &&
>   git status >output &&
>   test_i18ngrep "No commits yet" output &&
>   ...
> 
> perhaps?
> 
> 
>  test_done
Fixed it. I wasn't aware of the guide lines for writing tests when I
used the patch from the thread blindly. I'll be careful to avoid that
in future.

Is there a way to test for the "Initial commit" message in the commit
template?

-- 
Regards,
Kaartic Sivaraam

Re: How to git push mirror local refs only?

2017-06-19 Thread Junio C Hamano

Robert Dailey  writes:

> On Fri, Jun 9, 2017 at 8:53 PM, Junio C Hamano  wrote:
>> Robert Dailey  writes:
>>
>>> So I want to update my remote fork with all my local branches.
>>> Normally I'd do this:
>>>
>>> $ git push --mirror fork
>>> ...
>> Something along this line in your .git/config:
>>
>> [remote "fork"]
>> url = ...
>> push = refs/heads/*:refs/heads/*
>>
>> and then
>>
>> $ git push --prune --follow-tags fork
>
> Sorry for the late reply, I was out of town all last week. Thanks for
> your help. Does this serve as a good general default? I can't imagine
> a case where I'd ever want to push something inside refs/remotes.

It may, as a good general default to something other than --mirror,
perhaps.  When I ask for mirroring the state of my repository, I'd
expect it to preserve as much state as possible, including the tips
of histories I received from elsewhere.

[PATCH v3 4/4] rebase: Add more regression tests for console output

2017-06-19 Thread Phillip Wood

From: Phillip Wood 

Check the console output when using --autostash and the stash does not
apply is what we expect. The test is quite strict but should catch any
changes to the console output from the various rebase flavors.

Thanks-to: Johannes Schindelin 
Signed-off-by: Phillip Wood 
---
 t/t3420-rebase-autostash.sh | 71 +++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 
cd1012798cb300f4f1ddeba6fdcad544ca9ea1d9..2c01ae6ad2a104940e388013984e7fa2a0d5aed5
 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -88,6 +88,67 @@ create_expected_success_merge() {
EOF
 }
 
+create_expected_failure_am() {
+   cat >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   First, rewinding head to replay your work on top of it...
+   Applying: second commit
+   Applying: third commit
+   Applying autostash resulted in conflicts.
+   Your changes are safe in the stash.
+   You can run "git stash pop" or "git stash drop" at any time.
+   EOF
+}
+
+create_expected_failure_interactive() {
+   q_to_cr >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   Rebasing (1/2)QRebasing (2/2)QApplying autostash resulted in conflicts.
+   Your changes are safe in the stash.
+   You can run "git stash pop" or "git stash drop" at any time.
+   Successfully rebased and updated refs/heads/rebased-feature-branch.
+   EOF
+}
+
+create_expected_failure_merge() {
+   cat >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   First, rewinding head to replay your work on top of it...
+   Merging unrelated-onto-branch with HEAD~1
+   Merging:
+   $(git rev-parse --short unrelated-onto-branch) unrelated commit
+   $(git rev-parse --short feature-branch^) second commit
+   found 1 common ancestor:
+   $(git rev-parse --short feature-branch~2) initial commit
+   [detached HEAD $(git rev-parse --short rebased-feature-branch~1)] 
second commit
+Author: A U Thor 
+Date: Thu Apr 7 15:14:13 2005 -0700
+2 files changed, 2 insertions(+)
+create mode 100644 file1
+create mode 100644 file2
+   Committed: 0001 second commit
+   Merging unrelated-onto-branch with HEAD~0
+   Merging:
+   $(git rev-parse --short rebased-feature-branch~1) second commit
+   $(git rev-parse --short feature-branch) third commit
+   found 1 common ancestor:
+   $(git rev-parse --short feature-branch~1) second commit
+   [detached HEAD $(git rev-parse --short rebased-feature-branch)] third 
commit
+Author: A U Thor 
+Date: Thu Apr 7 15:15:13 2005 -0700
+1 file changed, 1 insertion(+)
+create mode 100644 file3
+   Committed: 0002 third commit
+   All done.
+   Applying autostash resulted in conflicts.
+   Your changes are safe in the stash.
+   You can run "git stash pop" or "git stash drop" at any time.
+   EOF
+}
+
 testrebase() {
type=$1
dotest=$2
@@ -198,10 +259,9 @@ testrebase() {
test_config rebase.autostash true &&
git reset --hard &&
git checkout -b rebased-feature-branch feature-branch &&
-   test_when_finished git branch -D rebased-feature-branch &&
echo dirty >file4 &&
git add file4 &&
-   git rebase$type unrelated-onto-branch &&
+   git rebase$type unrelated-onto-branch >actual 2>&1 &&
test_path_is_missing $dotest &&
git reset --hard &&
grep unrelated file4 &&
@@ -210,6 +270,13 @@ testrebase() {
git stash pop &&
grep dirty file4
'
+
+   test_expect_success "rebase$type: check output with conflicting stash" '
+   test_when_finished git branch -D rebased-feature-branch &&
+   suffix=${type#\ --} && suffix=${suffix:-am} &&
+   create_expected_failure_$suffix &&
+   test_cmp expected actual
+   '
 }
 
 test_expect_success "rebase: fast-forward rebase" '
-- 
2.13.0

[PATCH v3 2/4] rebase -i: Add test for reflog message

2017-06-19 Thread Phillip Wood

From: Phillip Wood 

Check that the reflog message written to the branch reflog when the
rebase is completed is correct

Signed-off-by: Phillip Wood 
---
 t/t3404-rebase-interactive.sh | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 
5bd0275930b715c25507fc9744c8946e7f73900b..37821d245433f757fa13f0a3e27da0312bebb7db
 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -169,6 +169,13 @@ test_expect_success 'reflog for the branch shows state 
before rebase' '
test $(git rev-parse branch1@{1}) = $(git rev-parse original-branch1)
 '
 
+test_expect_success 'reflog for the branch shows correct finish message' '
+   printf "rebase -i (finish): refs/heads/branch1 onto %s\n" \
+   "$(git rev-parse branch2)" >expected &&
+   git log -g --pretty=%gs -1 refs/heads/branch1 >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'exchange two commits' '
set_fake_editor &&
FAKE_LINES="2 1" git rebase -i HEAD~2 &&
-- 
2.13.0

[PATCH v3 3/4] rebase: Add regression tests for console output

2017-06-19 Thread Phillip Wood

From: Phillip Wood 

Check the console output when using --autostash and the stash applies
cleanly is what we expect. The test is quite strict but should catch
any changes to the console output from the various rebase flavors.

Thanks-to: Johannes Schindelin 
Signed-off-by: Phillip Wood 
---
 t/t3420-rebase-autostash.sh | 65 +++--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 
ab8a63e8d6dc643b28eb0c74ba3f032b7532226f..cd1012798cb300f4f1ddeba6fdcad544ca9ea1d9
 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -33,6 +33,61 @@ test_expect_success setup '
git commit -m "related commit"
 '
 
+create_expected_success_am() {
+   cat >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   First, rewinding head to replay your work on top of it...
+   Applying: second commit
+   Applying: third commit
+   Applied autostash.
+   EOF
+}
+
+create_expected_success_interactive() {
+   q_to_cr >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   Rebasing (1/2)QRebasing (2/2)QApplied autostash.
+   Successfully rebased and updated refs/heads/rebased-feature-branch.
+   EOF
+}
+
+create_expected_success_merge() {
+   cat >expected <<-EOF
+   $(grep "^Created autostash: [0-9a-f][0-9a-f]*\$" actual)
+   HEAD is now at $(git rev-parse --short feature-branch) third commit
+   First, rewinding head to replay your work on top of it...
+   Merging unrelated-onto-branch with HEAD~1
+   Merging:
+   $(git rev-parse --short unrelated-onto-branch) unrelated commit
+   $(git rev-parse --short feature-branch^) second commit
+   found 1 common ancestor:
+   $(git rev-parse --short feature-branch~2) initial commit
+   [detached HEAD $(git rev-parse --short rebased-feature-branch~1)] 
second commit
+Author: A U Thor 
+Date: Thu Apr 7 15:14:13 2005 -0700
+2 files changed, 2 insertions(+)
+create mode 100644 file1
+create mode 100644 file2
+   Committed: 0001 second commit
+   Merging unrelated-onto-branch with HEAD~0
+   Merging:
+   $(git rev-parse --short rebased-feature-branch~1) second commit
+   $(git rev-parse --short feature-branch) third commit
+   found 1 common ancestor:
+   $(git rev-parse --short feature-branch~1) second commit
+   [detached HEAD $(git rev-parse --short rebased-feature-branch)] third 
commit
+Author: A U Thor 
+Date: Thu Apr 7 15:15:13 2005 -0700
+1 file changed, 1 insertion(+)
+create mode 100644 file3
+   Committed: 0002 third commit
+   All done.
+   Applied autostash.
+   EOF
+}
+
 testrebase() {
type=$1
dotest=$2
@@ -51,14 +106,20 @@ testrebase() {
test_config rebase.autostash true &&
git reset --hard &&
git checkout -b rebased-feature-branch feature-branch &&
-   test_when_finished git branch -D rebased-feature-branch &&
echo dirty >>file3 &&
-   git rebase$type unrelated-onto-branch &&
+   git rebase$type unrelated-onto-branch >actual 2>&1 &&
grep unrelated file4 &&
grep dirty file3 &&
git checkout feature-branch
'
 
+   test_expect_success "rebase$type --autostash: check output" '
+   test_when_finished git branch -D rebased-feature-branch &&
+   suffix=${type#\ --} && suffix=${suffix:-am} &&
+   create_expected_success_$suffix &&
+   test_cmp expected actual
+   '
+
test_expect_success "rebase$type: dirty index, non-conflicting rebase" '
test_config rebase.autostash true &&
git reset --hard &&
-- 
2.13.0

[PATCH v3 1/4] sequencer: print autostash messages to stderr

2017-06-19 Thread Phillip Wood

From: Johannes Schindelin 

The rebase messages are printed to stderr traditionally. However due
to a bug introduced in 587947750bd (rebase: implement --[no-]autostash
and rebase.autostash, 2013-05-12) which was faithfully copied when
reimplementing parts of the interactive rebase in the sequencer the
autostash messages are printed to stdout instead.

It is time to fix that: let's print the autostash messages to stderr
instead of stdout.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Phillip Wood 
---
 git-rebase.sh |  4 ++--
 sequencer.c   | 11 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 
db1deed8464f0643763ed6e3c5e54221cad8c985..2cf73b88e8e83ca34b9eb319dbc2b0a220139b0f
 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -166,14 +166,14 @@ apply_autostash () {
stash_sha1=$(cat "$state_dir/autostash")
if git stash apply $stash_sha1 2>&1 >/dev/null
then
-   echo "$(gettext 'Applied autostash.')"
+   echo "$(gettext 'Applied autostash.')" >&2
else
git stash store -m "autostash" -q $stash_sha1 ||
die "$(eval_gettext "Cannot store \$stash_sha1")"
gettext 'Applying autostash resulted in conflicts.
 Your changes are safe in the stash.
 You can run "git stash pop" or "git stash drop" at any time.
-'
+' >&2
fi
fi
 }
diff --git a/sequencer.c b/sequencer.c
index 
a23b948ac148304dbebfe38955ec8b40cab3e1e5..606750b1e0c900a9fb43cea224d27ab36ca29ab9
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1921,7 +1921,7 @@ static int apply_autostash(struct replay_opts *opts)
argv_array_push(&child.args, "apply");
argv_array_push(&child.args, stash_sha1.buf);
if (!run_command(&child))
-   printf(_("Applied autostash.\n"));
+   fprintf(stderr, _("Applied autostash.\n"));
else {
struct child_process store = CHILD_PROCESS_INIT;
 
@@ -1935,10 +1935,11 @@ static int apply_autostash(struct replay_opts *opts)
if (run_command(&store))
ret = error(_("cannot store %s"), stash_sha1.buf);
else
-   printf(_("Applying autostash resulted in conflicts.\n"
-   "Your changes are safe in the stash.\n"
-   "You can run \"git stash pop\" or"
-   " \"git stash drop\" at any time.\n"));
+   fprintf(stderr,
+   _("Applying autostash resulted in conflicts.\n"
+ "Your changes are safe in the stash.\n"
+ "You can run \"git stash pop\" or"
+ " \"git stash drop\" at any time.\n"));
}
 
strbuf_release(&stash_sha1);
-- 
2.13.0

Re: Restoring detached HEADs after Git operations

2017-06-19 Thread Ævar Arnfjörð Bjarmason


On Mon, Jun 19 2017, Patrick Lehmann jotted:

> Hello,
>
> I wrote a Bash script to recover branch names after Git operations have 
> create detached HEADs in a Git repository containing lots of Git submodules. 
> The script works recursively.
>
> I would like to see:
> a) that script or algorithm being integrated into Git by default
> b) that as a default behavior for all Git operations creating detached HEADs
>
> That's the command:
> 
> git submodule foreach --recursive  'HEAD=$(git branch --list | head -n 1); if 
> [[ "$HEAD" == *HEAD* ]]; then REF=$(git rev-parse HEAD); FOUND=0; for Branch 
> in $(git branch --list | grep "^  " | sed -e "s/  //" ); do if [[ "$(git 
> rev-parse "$Branch")" == $REF ]]; then echo -e "  \e[36mCheckout 
> $Branch...\e[0m"; git checkout $Branch; FOUND=1; break; fi done; if [[ $FOUND 
> -eq 0 ]]; then echo -e "  \e[31mNo matching branch found.\e[0m"; fi else echo 
> -e "  \e[36mNothing to do.\e[0m"; fi'
> 
>
> How does it work:
> 1. It uses git submodule foreach to dive into each Git submodule and execute 
> a series of Bash commands.
> 2. It's reading the list of branches and checks if the submodule is in 
> detached mode. The first line contains the string HEAD.
> 3. Retrieve the hash of the detached HEAD
> 4. Iterate all local branches and get their hashes
> 5. Compare the branch hashes with the detached HEAD's hash. If it matches do 
> a checkout.
> 6. Report if no branch name was found or if a HEAD was not in detached mode.
>
> The Bash code with line breaks and indentation:
> 
> HEAD=$(git branch --list | head -n 1)
> if [[ "$HEAD" == *HEAD* ]]; then
>   REF=$(git rev-parse HEAD)
>   FOUND=0
>   for Branch in $(git branch --list | grep "^  " | sed -e "s/  //" ); do
> if [[ "$(git rev-parse "$Branch")" == $REF ]]; then
>   echo -e "  \e[36mCheckout $Branch...\e[0m"
>   git checkout $Branch
>   FOUND=1
>   break
> fi
>   done
>   if [[ $FOUND -eq 0 ]]; then
> echo -e "  \e[31mNo matching branch found.\e[0m"
>   fi
> else
>   echo -e "  \e[36mNothing to do.\e[0m"
> fi
> 
>
> Are their any chances to get it integrated into Git?
>
> I tried to register that code as a Git alias, but git config complains about 
> quote problem not showing where. It neither specifies if it's a single or 
> double quote problem. Any advice on how to register that piece of code as an 
> alias?
>
> If wished, I think I could expand the script to also recover hash values to 
> Git tags if no branch was found.

I have something similar to this, this written before git-submodule
learned --branch (or at least before I knew about it):

$ git config alias.sm-pull-all
!git submodule foreach 'git checkout $(NAME=$name git sm-mainbranch) && git 
pull'
$ git config alias.sm-mainbranch
!git config --file ../.gitmodules submodule.$NAME.branch || git describe 
--all --always | sed 's!^heads/!!'

So with this I can run `git sm-pull-all` to update all the submodules in
a superproject to update them all. This relies on the branch name being
in the .gitmodules config.a

Now if you add a module with e.g. 'git submodule add -b master ...'
you'll have it checked out at the master branch, but I'm not familiar
enough with all the workflows around them to say if there are any holes
in that implementation.

But that seems like a fundimentally better approach to me than what
you're suggesting. Why would we try to work our way back from the SHA-1
and guess what branch it lives on when we can just encode the branch
name we'd like to be on in the superproject?

[PATCH v3 0/4] Add regression tests for recent rebase -i fixes

2017-06-19 Thread Phillip Wood

From: Phillip Wood 

I've updated the second two tests to be portable using q_to_cr() as
Johannes suggested and added his patch to fix the autostash messages
going to stdout rather than stderr. The reflog message test is
unchanged. Thanks to Johannes for his help and to Junio for picking up
the bashism in the last iteration.

Johannes Schindelin (1):
  sequencer: print autostash messages to stderr

Phillip Wood (3):
  rebase -i: Add test for reflog message
  rebase: Add regression tests for console output
  rebase: Add more regression tests for console output

 git-rebase.sh |   4 +-
 sequencer.c   |  11 ++--
 t/t3404-rebase-interactive.sh |   7 +++
 t/t3420-rebase-autostash.sh   | 136 --
 4 files changed, 147 insertions(+), 11 deletions(-)

-- 
2.13.0

Re: [PATCH/RFC] Cleanup Documentation

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 10:33 AM, Kaartic Sivaraam
 wrote:

>
>> Please markup the '.gitmodules' either via single quotes or `.
>> (or even link to 'gitmodules(5)' )
>>
> Marked it up using `. Help needed to link to 'gitmodules(5)', as I'm
> not sure how to provide alternative text to 'linkgit:'.

I did not mean  'gitmodules(5)' literally, I rather meant using linkgit:
as you know it. :)

>
>> I am undecided if this is really removing (2) unclearness, but the
>> (1) redundancy seems fine to me.
>>
> Sorry about that. The commit message should have been,
>
> ...
> 2. Removed unclear back reference

Oh that clears some confusion here. :)

> Note: Will follow up with a patch, soon.

Thanks.
Stefan

Re: Restoring detached HEADs after Git operations

2017-06-19 Thread Junio C Hamano

Stefan Beller  writes:

> On Mon, Jun 19, 2017 at 2:52 AM, Patrick Lehmann
>  wrote:
>> Hello Lars,
>>
>> for your questions:
>>> If there are multiple branches with the same hash then your script would 
>>> pick the first one. Can you imagine a situation where this would be a 
>>> problem?
>>
>> I can't think of a good solution to resolve it automatically. Maybe a script 
>> could print that there are multiple possibilities and it choose the first 
>> branch in the list.
>>
>>
>>> Plus, you are looking only at local branches. Wouldn't it make sense to 
>>> look at remote branches, too?
>>
>> This is also related to restoring tags. If we go this way, we should have 
>> this priority list:
>> - local branches
>> - remote branches
>
> For remote branches you would create a local branch of the same name
> (if such a branch would not exist, possibly setting it up to track that remote
> branch)?
>
>> - tags
>
> as said in the other email and similar to remote branches, we'd not want to 
> have
> HEAD pointing to them directly but somehow have a local branch.

Let's step back a bit.  We detach the HEAD for a good reason, no?
Why is it a good idea to move them back on to a branch picked among
multiple ones that all happen to be pointing at the same commit?

The user may build on a history of a submodule, and then may push
the result out to a particular branch at the other side; that is
when being on a named branch in the submodule becomes useful, but
even then I do not think randomly picking one branch and be on it
is a good thing to do.

I would understand the workflow would go more like so:

 - You do something at the superproject (e.g. create a new branch X
   from an existing commit and check it out), which results in
   submodules' HEADs getting detached at the commits bound to the
   superproject's tree.

 - Because you want to make changes to both submodules and the
   superproject in a consistent way, you'd want to commit changes to
   all of these repositories and the push the result out in an
   atomic way.

 - Hence you tell "Hey, Git, I want all the submodules that I
   modified to be on branch X" from the superproject.

   - This may succeed in a submodule where X is a new name, or the
 current tip of branch X is an ancestor of the detached HEAD.

   - This may fail in a submodule where there is branch X that does
 not want to move to the detached HEAD's state.  In this latter
 case, the user needs to deal with the situation (perhaps the
 old X is expendable; perhaps the HEAD's commit may need to be
 merged to old X; perhaps there are other cases).

though.

Re: Behavior of 'git fetch' for commit hashes

2017-06-19 Thread Jonathan Tan

On Mon, 19 Jun 2017 12:09:28 +
 wrote:

> For version 2.7.4
> =
> Git exits with exit code 1.
> 
> However, if I first do 'git fetch ', then 'git fetch  will
> also work
> 
>  * branch-> FETCH_HEAD

I suspect that what is happening is that 'git fetch ' also
downloads the commit referenced by , so the subsequent 'git fetch
' is a no-op (except setting FETCH_HEAD).

> For version 2.13.3
> ==
> Git exits with exit code 128 and message
> fatal: Couldn't find remote ref
> 
> However, the workaround for descbibed abot for git version 2.7.4 no
> longer works. The result is always
> fatal: Couldn't find remote ref

I'll take a look into this.

> Desired result
> ==
> Commit is in .git/FETCH_HEAD and can be checked out.
> 
> 
> I want to checkout a specific commit without creating any extra named
> remotes in the local git clone.
> 
> Finally,
> What is the expected behavior for 'git fetch' in this case?
> Is there some other way I can achieve my goals?

I'll take a look into the expected behavior, but if my assumptions are
correct, you should be able to just checkout the commit you want after
fetching the branch:

  git fetch  
  git checkout

Re: Restoring detached HEADs after Git operations

2017-06-19 Thread Stefan Beller

On Mon, Jun 19, 2017 at 10:34 AM, Patrick Lehmann
 wrote:
> Hello,
>
> I'm just an advanced Git user, not a Git developer. So I might find some time 
> to improve the suggested script, which I provided with the hints given on the 
> mailing list, but I have no time to do a complete feature release in your 
> patch based Git flow.

ok, thanks for letting us know. I may re-prioritize the "reattach
HEAD" patches that I referenced earlier.
I would have hoped that additionally to the shell lines you'd have
given a good use case/summary.

> I have no experience with other shells then Bash. So if you rely on a Bash 
> with less features, please port the syntax to such a shell system. (I 
> personally do not support legacy programs or out-date programs).
>
> --
> We are talking about circa 50 submodules in total with a maximum depth of 4. 
> The platforms are:
> - Mint OS with Git in Bash
> - Windows 7 with Git-Bash
> - Windows 10 with Git-Bash
> - Windows 10 with Posh-Git

Thanks,
Stefan

Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function

2017-06-19 Thread Lars Schneider


> On 19 Jun 2017, at 19:18, Torsten Bögershausen  wrote:
> 
> On 2017-06-18 13:47, Lars Schneider wrote:
>> 
>>> On 18 Jun 2017, at 09:20, Torsten Bögershausen  wrote:
>>> 
>>> 
>>> On 2017-06-01 10:22, Lars Schneider wrote:
 This is useful for the subsequent patch 'convert: add "status=delayed" to
 filter process protocol'.
>>> 
>>> May be
>>> Collecting all filter error handling is useful for the subsequent patch
>>> 'convert: add "status=delayed" to filter process protocol'.
>> 
>> I think I get your point. However, I feel "Collecting" is not the right word.
>> 
>> How about:
>> "Refactoring filter error handling is useful..."?
> 
> OK

OK, I'll change it in the next round.

 
 Signed-off-by: Lars Schneider 
 ---
 convert.c | 47 ++-
 1 file changed, 26 insertions(+), 21 deletions(-)
 
 diff --git a/convert.c b/convert.c
 index f1e168bc30..a5e09bb0e8 100644
 --- a/convert.c
 +++ b/convert.c
 @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct 
 subprocess_entry *subprocess)
return err;
 }
 
 +static void handle_filter_error(const struct strbuf *filter_status,
 +  struct cmd2process *entry,
 +  const unsigned int wanted_capability) {
 +  if (!strcmp(filter_status->buf, "error")) {
 +  /* The filter signaled a problem with the file. */
 +  } else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
 +  /*
 +   * The filter signaled a permanent problem. Don't try to filter
 +   * files with the same command for the lifetime of the current
 +   * Git process.
 +   */
 +   entry->supported_capabilities &= ~wanted_capability;
 +  } else {
 +  /*
 +   * Something went wrong with the protocol filter.
 +   * Force shutdown and restart if another blob requires 
 filtering.
 +   */
 +  error("external filter '%s' failed", entry->subprocess.cmd);
 +  subprocess_stop(&subprocess_map, &entry->subprocess);
 +  free(entry);
 +  }
 +}
 +
 static int apply_multi_file_filter(const char *path, const char *src, 
 size_t len,
   int fd, struct strbuf *dst, const char *cmd,
   const unsigned int wanted_capability)
 @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, 
 const char *src, size_t len
 done:
sigchain_pop(SIGPIPE);
 
 -  if (err) {
 -  if (!strcmp(filter_status.buf, "error")) {
 -  /* The filter signaled a problem with the file. */
>>>   Note1: Do we need a line with a single ";" here ?
>> 
>> The single ";" wouldn't hurt but I don't think it is helpful either.
>> I can't find anything in the coding guidelines about this...
> 
> OK about that. I was thinking to remove the {}, and the we -need- the ";"

True. However, I prefer the {} style here. Would that be OK with you?
Plus, this commit is not supposed to change code. I just want to move the
code to a different function.


>>>   Question: What should/will happen to the file ?
>>>   Will Git complain later ? Retry later ?
>> 
>> The file is not filtered and Git does not try, again. 
>> That might be OK for certain filters:
>> If "filter.foo.required = false" then this would be OK. 
>> If "filter.foo.required = true" then this would cause an error.
> 
> Does it make sense to add it as a comment ?
> I know, everything is documented otherwise, but when looking at the source
> code only, the context may be useful, it may be not.

I agree. I'll add a comment!

>> 
 -  } else if (!strcmp(filter_status.buf, "abort")) {
 -  /*
 -   * The filter signaled a permanent problem. Don't try 
 to filter
 -   * files with the same command for the lifetime of the 
 current
 -   * Git process.
 -   */
 -   entry->supported_capabilities &= ~wanted_capability;
>>>Hm, could this be clarified somewhat ?
>>>Mapping "abort" to "permanent problem" makes sense as
>>>such, but the only action that is done is to reset
>>>a capablity.
>> 
>> I am not sure what you mean with "reset capability". We disable the 
>> capability here.
>> That means Git will not use the capability for the active session.
> 
> Sorry, my wrong - reset is a bad word here.
> "Git will not use the capability for the active session" is perfect!

OK :)


>>> /*
>>>  * The filter signaled a missing capabilty. Don't try to filter
>>>  * files with the same command for the lifetime o

Re: [PATCH v5 08/10] rebase -i: skip unnecessary picks using the rebase--helper

2017-06-19 Thread Junio C Hamano

Jeff King  writes:

> On Mon, Jun 19, 2017 at 11:45:50AM +0200, Johannes Schindelin wrote:
>
>> The reason for this suggestion is that one of the revision machinery's
>> implementation details is an ugly little semi-secret: the pretty-printing
>> machinery uses a global state, and that is why we need the "pretty_given"
>> flag in the first place.
>
> I think that's mis-stating Junio's complaint. The point is not the
> pretty_given flag itself, which we know about and can work around. The
> point is that we don't know what other similar problems we have or will
> have due to future changes in the revision code.
>
> In other words, there are two APIs: the one where C code manipulates
> rev_info directly, and the one where revision.c responds to string
> arguments. From a maintenance perspective, it is easy for somebody make
> a change that works for the latter but not the former.

There are (1) the API for users of revision traversal machinery and
(2) the implementation detail of the API.  Of course, the code that
actually parses the plumbing and end-user supplied set of options
needs to manipulate rev_info directly, but we should avoid direct
manipulation when we can to future-proof the codebase.

My main complaint is that Johannes's "compiler can catch mistakes"
is true only for the most trivial kind of mistakes.

The current implementation detail for reacting to --format=""
given by the revision traversal machinery happens to be to set three
things in rev_info structure: set verbose_header and pretty_given to
1 and then call get_commit_format() on the format string.  Dscho is
correct to say that the compiler can catch a mistake that misspells,
say verbose_header as verbos_header.  The compiler would not catch
an equivalent mistake to misspell the option as --formta="",
so from that point of view, his argument may seem to make sense.

But the compiler would not help catching a copy-and-paste mistake to
do only two out of the three things needed (e.g. forgetting to set
pretty_given).  If the code relies on setup_revisions() to react to
options just the way "rev-list" plumbing does, such a mistake cannot
happen in the first place---there is no need to worry about "catching".

As you clarified correctly, the writer of the code that _uses_ the
machinery, like the one in sequencer_make_script(), cannot
anticipate how the internal implementation of reacting to the
features will change, and more importantly, it should not have to
know how it will change.  By using the setup_revisions() API that
uses the exactly the same parser as plumbing command "git rev-list"
does, the forward compatibility is guaranteed.  Copying and pasting
the current internal implementation detail will break that.

> I do agree that the lack of compile-time safety for obvious mistakes
> like "--pertty" is a downside, though. On the other hand, there are
> strong run-time checks there, so the tests would catch it.

True.

After setup_revisions() returns, if there are unrecognized options
(e.g. misspelt "--pertty"), that can be used as the indication of a
programming error, as the new code is not even parsing arbitrary set
of options given by the end-user.

Re: [PATCH/RFC] Cleanup Documentation

2017-06-19 Thread Kaartic Sivaraam

On Sun, 2017-06-18 at 22:50 -0700, Stefan Beller wrote:
> > > diff --git a/Documentation/git-submodule.txt b/Documentation/git-
> > > submodule.txt
> > > index 74bc6200d..9812b0655 100644
> > > --- a/Documentation/git-submodule.txt
> > > +++ b/Documentation/git-submodule.txt
> > > @@ -63,13 +63,7 @@ add [-b ] [-f|--force] [--name ]
> > > [--reference ] [--dep
> > >   to the changeset to be committed next to the current
> > >   project: the current project is termed the "superproject".
> > >  +
> > > -This requires at least one argument: . The optional
> > > -argument  is the relative location for the cloned
> > > submodule
> > > -to exist in the superproject. If  is not given, the
> > > -"humanish" part of the source repository is used ("repo" for
> > > -"/path/to/repo.git" and "foo" for "host.xz:foo/.git").
> > > -The  is also used as the submodule's logical name in its
> > > -configuration entries unless `--name` is used to specify a
> > > logical name.
> > > +This requires at least one argument: .
> > >  +
> 
> So we're losing the information how the submodule name is chosen.
I just moved it. I don't think we're losing anything related to how the
name is chosen. Please let me know if I misinterpreted your statement.

> This may be fine as I plan (long term) to make the name an arbitrary
> random
> string (IMHO that reduces confusion as there will be less 'nearly the
> same'
> things)
> 
> On the other hand the newly added line
>   'This requires at least one argument:  (actually moved, but) is sort of redundant. The notation in the
> argument line
> should make that clear, already?
> 
Makes clear sense. Removed it.

> This sounds good, it consolidates all information about []
> in one paragraph. While at it, maybe let's find another (better)
> substitute for "humanish" as that can be anything(?).
> 
> Maybe "the last part of the URL" (without any .git)
> 
How about "meaningful"? Put in place it reads like,

If  is not given, the meaningful part of the source repository
...

> Please markup the '.gitmodules' either via single quotes or `.
> (or even link to 'gitmodules(5)' )
> 
Marked it up using `. Help needed to link to 'gitmodules(5)', as I'm
not sure how to provide alternative text to 'linkgit:'.

> I am undecided if this is really removing (2) unclearness, but the
> (1) redundancy seems fine to me.
> 
Sorry about that. The commit message should have been,

...
2. Removed unclear back reference
...

by which I intend to denote the following removal,
> -In either case, the given URL is recorded into .gitmodules for
> -use by subsequent users cloning the superproject.

Note: Will follow up with a patch, soon.
-- 
Regards,
Kaartic Sivaraam

AW: Restoring detached HEADs after Git operations

2017-06-19 Thread Patrick Lehmann

Hello,

I'm just an advanced Git user, not a Git developer. So I might find some time 
to improve the suggested script, which I provided with the hints given on the 
mailing list, but I have no time to do a complete feature release in your patch 
based Git flow.

I'm currently involved in 8 other open source projects. One can't improve the 
world alone by supplying patches to any open source project one is using...

I have no experience with other shells then Bash. So if you rely on a Bash with 
less features, please port the syntax to such a shell system. (I personally do 
not support legacy programs or out-date programs).

--
We are talking about circa 50 submodules in total with a maximum depth of 4. 
The platforms are:
- Mint OS with Git in Bash
- Windows 7 with Git-Bash
- Windows 10 with Git-Bash
- Windows 10 with Posh-Git

Kind regards
Patrick

Von: Stefan Beller [sbel...@google.com]
Gesendet: Montag, 19. Juni 2017 18:37
Bis: Patrick Lehmann
Cc: Lars Schneider; Git Mailinglist
Betreff: Re: Restoring detached HEADs after Git operations

On Mon, Jun 19, 2017 at 2:52 AM, Patrick Lehmann
 wrote:
> Hello Lars,
>
> for your questions:
>> If there are multiple branches with the same hash then your script would 
>> pick the first one. Can you imagine a situation where this would be a 
>> problem?
>
> I can't think of a good solution to resolve it automatically. Maybe a script 
> could print that there are multiple possibilities and it choose the first 
> branch in the list.
>
>
>> Plus, you are looking only at local branches. Wouldn't it make sense to look 
>> at remote branches, too?
>
> This is also related to restoring tags. If we go this way, we should have 
> this priority list:
> - local branches
> - remote branches

For remote branches you would create a local branch of the same name
(if such a branch would not exist, possibly setting it up to track that remote
branch)?

> - tags

as said in the other email and similar to remote branches, we'd not want to have
HEAD pointing to them directly but somehow have a local branch.

>> Submodule processing is already quite slow if you have many of them. I 
>> wonder how much this approach would affect the performance.
>
> Yes. It takes a few seconds to iterate all the submodules. It could be 
> improved if the processing wouldn't be based on slow Bash scripts spawning 
> lot's of sub-shells to execute multiple Git commands.

How many submodules are we talking about? (Are you on Windows to make
shell even more fun?)

Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function

2017-06-19 Thread Torsten Bögershausen

On 2017-06-18 13:47, Lars Schneider wrote:
> 
>> On 18 Jun 2017, at 09:20, Torsten Bögershausen  wrote:
>>
>>
>> On 2017-06-01 10:22, Lars Schneider wrote:
>>> This is useful for the subsequent patch 'convert: add "status=delayed" to
>>> filter process protocol'.
>>
>> May be
>> Collecting all filter error handling is useful for the subsequent patch
>> 'convert: add "status=delayed" to filter process protocol'.
> 
> I think I get your point. However, I feel "Collecting" is not the right word.
> 
> How about:
> "Refactoring filter error handling is useful..."?

OK


> 
>>>
>>> Signed-off-by: Lars Schneider 
>>> ---
>>> convert.c | 47 ++-
>>> 1 file changed, 26 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/convert.c b/convert.c
>>> index f1e168bc30..a5e09bb0e8 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct 
>>> subprocess_entry *subprocess)
>>> return err;
>>> }
>>>
>>> +static void handle_filter_error(const struct strbuf *filter_status,
>>> +   struct cmd2process *entry,
>>> +   const unsigned int wanted_capability) {
>>> +   if (!strcmp(filter_status->buf, "error")) {
>>> +   /* The filter signaled a problem with the file. */
>>> +   } else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
>>> +   /*
>>> +* The filter signaled a permanent problem. Don't try to filter
>>> +* files with the same command for the lifetime of the current
>>> +* Git process.
>>> +*/
>>> +entry->supported_capabilities &= ~wanted_capability;
>>> +   } else {
>>> +   /*
>>> +* Something went wrong with the protocol filter.
>>> +* Force shutdown and restart if another blob requires 
>>> filtering.
>>> +*/
>>> +   error("external filter '%s' failed", entry->subprocess.cmd);
>>> +   subprocess_stop(&subprocess_map, &entry->subprocess);
>>> +   free(entry);
>>> +   }
>>> +}
>>> +
>>> static int apply_multi_file_filter(const char *path, const char *src, 
>>> size_t len,
>>>int fd, struct strbuf *dst, const char *cmd,
>>>const unsigned int wanted_capability)
>>> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, 
>>> const char *src, size_t len
>>> done:
>>> sigchain_pop(SIGPIPE);
>>>
>>> -   if (err) {
>>> -   if (!strcmp(filter_status.buf, "error")) {
>>> -   /* The filter signaled a problem with the file. */
>>Note1: Do we need a line with a single ";" here ?
> 
> The single ";" wouldn't hurt but I don't think it is helpful either.
> I can't find anything in the coding guidelines about this...

OK about that. I was thinking to remove the {}, and the we -need- the ";"

> 
> 
>>Question: What should/will happen to the file ?
>>Will Git complain later ? Retry later ?
> 
> The file is not filtered and Git does not try, again. 
> That might be OK for certain filters:
> If "filter.foo.required = false" then this would be OK. 
> If "filter.foo.required = true" then this would cause an error.

Does it make sense to add it as a comment ?
I know, everything is documented otherwise, but when looking at the source
 code only, the context may be useful, it may be not.

> 
> 
>>> -   } else if (!strcmp(filter_status.buf, "abort")) {
>>> -   /*
>>> -* The filter signaled a permanent problem. Don't try 
>>> to filter
>>> -* files with the same command for the lifetime of the 
>>> current
>>> -* Git process.
>>> -*/
>>> -entry->supported_capabilities &= ~wanted_capability;
>> Hm, could this be clarified somewhat ?
>> Mapping "abort" to "permanent problem" makes sense as
>> such, but the only action that is done is to reset
>> a capablity.
> 
> I am not sure what you mean with "reset capability". We disable the 
> capability here.
> That means Git will not use the capability for the active session.

Sorry, my wrong - reset is a bad word here.
"Git will not use the capability for the active session" is perfect!

> 
> 
>>  /*
>>   * The filter signaled a missing capabilty. Don't try to filter
>>   * files with the same command for the lifetime of the current
>>   * Git process.
>>   */
> 
> I like the original version better because the capability is not "missing".
> There is "a permanent problem" and the filter wants to signal Git to not use
> this capability for the current session anymore.

Git and the filter are in a negotiation phase to find out which side is able
to do what.So I don't thi

1 2 >

1 - 100 of 127 matches

Mail list logo