Re: [PATCH 2/3] t1301: move movebits() to test-lib-functions.sh

2017-06-22 Thread Junio C Hamano
Ramsay Jones  writes:

> On 22/06/17 20:52, Junio C Hamano wrote:
>> Christian Couder  writes:
>> 
>>> As the movebits() function can be useful outside t1301,
>>> let's move it into test-lib-functions.sh, and while at
>>> it let's rename it test_movebits().
>> 
>> Good thinking, especially on the renaming.
>
> Err, except for the commit message! :-D
>
> Both the commit message subject and the commit message body
> refer to _move_bits() rather than _mode_bits() etc.
> (So, three instances of s/move/mode/).

Wow, I shouldn't give praising reviews too easily ;-)

Good eyes, thanks.


Re: What's cooking in git.git (Jun 2017, #06; Thu, 22)

2017-06-22 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Thu, Jun 22 2017, Junio C. Hamano jotted:
>
>> * sd/branch-copy (2017-06-18) 3 commits
>>  - branch: add a --copy (-c) option to go with --move (-m)
>>  - branch: add test for -m renaming multiple config sections
>>  - config: create a function to format section headers
>>
>>  "git branch" learned "-c/-C" to create and switch to a new branch
>>  by copying an existing one.
>>
>>  Has a bit of interaction with two other topics, so perhaps needs to
>>  wait for them to stabilize a bit more.
>
> What topics are those? Didn't see any outright conflicts, but might have
> missed something. Anything Sahil & I can help with?

Sorry, I write these and then forget the details because I have to
tend to many other topics all the time X-<.  Perhaps attempting the
merge to 'pu' yourself will tell you more?

>> * ab/sha1dc (2017-06-07) 2 commits
>> ...
>>  Will keep in 'pu'.
>>  Impact to the various build and release infrastructure of using
>>  submodule is not yet fully known, but this lets us dip our toes.
> ...
> But it's been 1 month kicking around in pu now. What are we gaining from
> keeping it there even longer at this point?

Keeping as many things outside 'next' means I have less things I
have to worry about ;-)

I am sort of waiting for a success from Windows box at Travis.  It
hasn't passed for other reasons for a while, though.

>> * xz/send-email-batch-size (2017-05-23) 1 commit
>>  - send-email: --batch-size to work around some SMTP server limit
>>
>>  "git send-email" learned to overcome some SMTP server limitation
>>  that does not allow many pieces of e-mails to be sent over a single
>>  session.
>>
>>  Waiting for response.
>>  cf. 
>
> I think between <7993e188.d18d.15c3560bcaf.coremail.zxq_yx_...@163.com>
> & <20170523103050.1f7ab7e0@jvn> we have sufficient info about what bug
> this solves to try an alternate approach at some other time.

I thought your wish (which I found reasonable) was to record
whatever information that would help us in the future in the log
message?  I was waiting for that to happen.



Re: your mail

2017-06-22 Thread Junio C Hamano
Jeff King  writes:

>> +You don't need to be subscribed to the list to send mail to it, and
>> +others on-list will generally CC you when replying (although some
>> +forget this). It's adviced to subscribe to the list if you want to be
>> +sure you're not missing follow-up discussion, or if your interest in
>> +the project is wider than a one-off bug report, question or patch.
>> +
>>  The maintainer frequently sends the "What's cooking" reports that
>>  list the current status of various development topics to the mailing
>>  list.  The discussion following them give a good reference for
>
> You perhaps already read it, but you may want to steal wording or
> suggestions from the mailing list section at:
>
>   https://git-scm.com/community
>
> which is covering the same ideas (and vice versa, patches to that page
> are welcome if the README says something better).

OK, so... Ævar's version does not mention:

 - text/plain, which is a very good addition.

 - note about CC in a way as useful as the "community" page does,
   which may want to steal from the latter.

 - the archive, but it may not be needed in the context of this
   document.  "Read the archive to make sure you are not repeating
   somebody else said before speaking" is something we silently wish
   everybody to follow, but it is something we do not want to say
   out loud, especially to new people.

 - Windows, but I am not sure if it is necessary or even healthy.
   One thing I would rather not to see is the Windows mailing list
   becomes the first line triage place for any and all issues that
   were experienced by a user who happened to be using Windows, and
   majority of the issues turn out to be unspecific to Windows.  I'd
   suspect that all of us rather would want to see the referral go
   the other way around.

Otoh, "community" page does not encourage subscription as a way to
ensure you'll see follow-up discussion, which may be a good thing to
add.

A tangent I just found funny is this paragraph on the "community"
page:

The archive can be found on public-inbox. Click here to
subscribe.

Of course clicking does not take you to a subscription page for
public-inbox, even though the two sentences appear to be related.

Perhaps swap the order of the two, like so, with a bit richer
explanation taken from Ævar's version:

... disable HTML in your outgoing messages.

By subscribing (click here), you can make sure you're not
missing follow-up discussion and also learn from other
development in the community.  The list archive can be found
on public-inbox.



Re: [PATCH v3 0/4] Add regression tests for recent rebase -i fixes

2017-06-22 Thread Junio C Hamano
Junio C Hamano  writes:

>>  git-rebase.sh |   4 +-
>>  sequencer.c   |  11 ++--
>>  t/t3404-rebase-interactive.sh |   7 +++
>>  t/t3420-rebase-autostash.sh   | 136 
>> --
>>  4 files changed, 147 insertions(+), 11 deletions(-)
>
> I've merged this to 'next' but I probably shouldn't have before
> making sure that Travis tests passes 'pu' while this was still in
> there.  
>
> At least t3420 seems to fail under GETTEXT_POISON build.
>
>   https://travis-ci.org/git/git/jobs/245990993

This should be sufficient to make t3420 pass.  It seems that t3404
is also broken under GETTEXT_POISON build, but I won't have time to
look at it, at least tonight.

$ make GETTEXT_POISON=YesPlease
$ cd t && sh ./t3404-*.sh -i -v

to see how it breaks.

Thanks.

 t/t3420-rebase-autostash.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 6826c38cbd..e243700660 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -178,7 +178,7 @@ testrebase () {
test_when_finished git branch -D rebased-feature-branch &&
suffix=${type#\ --} && suffix=${suffix:-am} &&
create_expected_success_$suffix &&
-   test_cmp expected actual
+   test_i18ncmp expected actual
'
 
test_expect_success "rebase$type: dirty index, non-conflicting rebase" '
@@ -275,7 +275,7 @@ testrebase () {
test_when_finished git branch -D rebased-feature-branch &&
suffix=${type#\ --} && suffix=${suffix:-am} &&
create_expected_failure_$suffix &&
-   test_cmp expected actual
+   test_i18ncmp expected actual
'
 }
 


Re: [PATCH 08/26] diff.c: migrate emit_line_checked to use emit_diff_symbol

2017-06-22 Thread Junio C Hamano
Stefan Beller  writes:

> Actually that function already has some quick return:
>
> static int new_blank_line_at_eof(struct emit_callback *ecbdata, const
> char *line, int len)
> {
> if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
>  ecbdata->blank_at_eof_in_preimage &&
>  ecbdata->blank_at_eof_in_postimage &&
>  ecbdata->blank_at_eof_in_preimage <= ecbdata->lno_in_preimage &&
>  ecbdata->blank_at_eof_in_postimage <= ecbdata->lno_in_postimage))
>   return 0;
> return ws_blank_line(line, len, ecbdata->ws_rule);
> }

I wouldn't call the first if() statement "quick return".  It
probably weighs as much as ws_blank_line() itself, which is not that
expensive.  

Even though new_blank_line_at_eof() may have been coded efficiently
and not very expensive as a whole, the fact remains that it didn't
get called when we know it was unneeded, and it now will get called
before we figure out if it is necessary.

And that is why I said it is a bit unsatisfactory (as opposed to
"horrible" or with other more severe adjective).




Re: [PATCH] pathspec: die on empty strings as pathspec

2017-06-22 Thread Junio C Hamano
Junio C Hamano  writes:

> Emily Xie  writes:
>
>> I ran the tests and none of them failed. 
>
> This is not about a test you touched, but applied to or merged to
> any of the recent integration branches (like 'master' or 'maint')
>
> $ make
> $ cd t
> $ GIT_TEST_LONG=YesPlease sh ./t0027-*.sh
>
> fails at the very beginning.  I do not know if 0027 is the only one
> that triggers this failure, though.

t0027 should pass with this, I think.

I am not sure if we even want the dot there, but at least that is
what the original author of the test intended to do when s/he
decided to pass an empty string as the pathspec.

 t/t0027-auto-crlf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 93725895a4..e41c9b3bb2 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -322,7 +322,7 @@ test_expect_success 'setup master' '
echo >.gitattributes &&
git checkout -b master &&
git add .gitattributes &&
-   git commit -m "add .gitattributes" "" &&
+   git commit -m "add .gitattributes" . &&
printf "\$Id:  
\$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
printf "\$Id:  
\$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&


Re: [PATCH v3 0/4] Add regression tests for recent rebase -i fixes

2017-06-22 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> I've updated the second two tests to be portable using q_to_cr() as
> Johannes suggested and added his patch to fix the autostash messages
> going to stdout rather than stderr. The reflog message test is
> unchanged. Thanks to Johannes for his help and to Junio for picking up
> the bashism in the last iteration.
>
> Johannes Schindelin (1):
>   sequencer: print autostash messages to stderr
>
> Phillip Wood (3):
>   rebase -i: Add test for reflog message
>   rebase: Add regression tests for console output
>   rebase: Add more regression tests for console output
>
>  git-rebase.sh |   4 +-
>  sequencer.c   |  11 ++--
>  t/t3404-rebase-interactive.sh |   7 +++
>  t/t3420-rebase-autostash.sh   | 136 
> --
>  4 files changed, 147 insertions(+), 11 deletions(-)

I've merged this to 'next' but I probably shouldn't have before
making sure that Travis tests passes 'pu' while this was still in
there.  

At least t3420 seems to fail under GETTEXT_POISON build.

  https://travis-ci.org/git/git/jobs/245990993



Re: [PATCH] pathspec: die on empty strings as pathspec

2017-06-22 Thread Junio C Hamano
Emily Xie  writes:

> I ran the tests and none of them failed. 

This is not about a test you touched, but applied to or merged to
any of the recent integration branches (like 'master' or 'maint')

$ make
$ cd t
$ GIT_TEST_LONG=YesPlease sh ./t0027-*.sh

fails at the very beginning.  I do not know if 0027 is the only one
that triggers this failure, though.

THanks.



Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 06:25:46PM -0400, Jeff King wrote:

> So here's a patch on top of what I posted before that pushes the reflog
> check into the loop (it just decides whether to pull from the reflogs or
> from the commit queue at the top of the loop).
> 
> I was surprised to find, though, that simplify_commit() does not
> actually do the pathspec limiting! It's done by
> try_to_simplify_commit(), which is part of add_parents_to_list(). I
> hacked around it in the later part of the loop by calling
> try_to_simplify myself and checking the TREESAME flag. But I have a
> feeling I'm missing something about how the traversal is supposed to
> work.
> 
> This does behave sensibly with "--no-merges" and "--merges", as well as
> pathspec limiting.

And here's one more patch on top of those that's necessary to get the
tests to pass (I don't expect anybody to necessarily be applying this
slow string of patches; it's just to show the direction I'm looking in).

---
 builtin/log.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 998437b23..512538479 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -373,9 +373,9 @@ static int cmd_log_walk(struct rev_info *rev)
if (!rev->reflog_info) {
/* we allow cycles in reflog ancestry */
free_commit_buffer(commit);
+   free_commit_list(commit->parents);
+   commit->parents = NULL;
}
-   free_commit_list(commit->parents);
-   commit->parents = NULL;
if (saved_nrl < rev->diffopt.needed_rename_limit)
saved_nrl = rev->diffopt.needed_rename_limit;
if (rev->diffopt.degraded_cc_to_c)


Re:

2017-06-22 Thread Marie Angèle Ouattara
-- 
I need your cooperation in a transaction that will benefit you,
details will disclose to you once i receive your reply.

Regards
Mrs. Marie Angèle O.


Re: [PATCH] pathspec: die on empty strings as pathspec

2017-06-22 Thread Emily Xie
I ran the tests and none of them failed. Technically, the state of the
index would indeed be different with these new changes, but this
shouldn't be an issue. In the current version, there's one only item
added to the index that ends up getting used in subsequent tests. That
is, foo1, which is tested in: 'git add --chmod=[+-]x stages
correctly'.

With the current code, foo1 gets added to the index with a mode of
100644. When the 'git add --chmod=[+-]x stages correctly' test
executes, it removes foo1 from the working directory, creates a new
foo1, and runs 'git add --chmod=+x'. It then checks if foo1 is in the
index with a file mode of 100755.

Given how things are set up, the tests pass either way. But I think
it'd actually be nicer with the new changes in this patch, as foo1
would not already be in the index by the time  'git add --chmod=[+-]x
stages correctly' executes.

On Sat, Jun 10, 2017 at 1:54 AM, Junio C Hamano  wrote:
>
> Emily Xie  writes:
>
> > diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> > index f3a4b4a..40a0d2b 100755
> > --- a/t/t3700-add.sh
> > +++ b/t/t3700-add.sh
> > @@ -331,9 +331,8 @@ test_expect_success 'git add --dry-run --ignore-missing 
> > of non-existing file out
> >   test_i18ncmp expect.err actual.err
> >  '
> >
> > -test_expect_success 'git add empty string should invoke warning' '
> > - git add "" 2>output &&
> > - test_i18ngrep "warning: empty strings" output
> > +test_expect_success 'git add empty string should fail' '
> > + test_must_fail git add ""
> >  '
>
> Doesn't this affect the tests that follow this step?  We used to
> warn but went ahead and added all of them anyway, but now we fail
> without adding anything, which means that the state of the index
> before and after this patch would be different.
>
> >  test_expect_success 'git add --chmod=[+-]x stages correctly' '


Re: Ambiguity warning printed twice ?

2017-06-22 Thread Kaartic Sivaraam
On Thu, 2017-06-22 at 10:23 -0400, Jeff King wrote:
> It's not unreasonable for a complex command like git-status to need
> to
> resolve HEAD multiple times. You can see how we get to each case by
> running:
> 
>   gdb /path/to/git-status
> 
> and then doing:
> 
>   break warning
>   run
> 
> Each time it breaks, you can run "bt" to get a backtrace, and then
> "c"
> to continue".
> 
Thanks for the guidance about debugging. It was very helpful and was a
reminder that there's a useful tool called the debugger which I have
failed to consider!

> It looks like the first one is when cmd_status() resolves the HEAD to
> see if we are on an unborn branch, and the second is done by
> wt_status
> to diff the index against HEAD. It would probably be possible to feed
> the results of the first resolution to wt-status. But that would just
> help this case, and I wouldn't be surprised if there are other
> commands
> that do multiple resolutions (or even scripts which call multiple
> commands).
> 
> Since warning should be rare, we could keep track of which names
> we've
> warned about and suppress multiple warnings. That would help
> single-process cases like this, but scripts which call multiple Git
> commands would still show multiple warnings. I don't know if that's
> worth doing or not.
I guess it's NOT. If I understood things correctly, this issue occurs
only if the ref 'HEAD' is ambiguous. Trying out a very approximate
calculation shows the probability to be some where around "1 in the
trillions or quadrillions". Further, it drops down to 0 when common
sense is kicked-in (most people would know git uses HEAD internally).
Thus this isn't worth enough to deserve a  fix except if there are
other things that I'm missing that could spoil my calculation.

Anyways, hoping some "big" would happen, that avoids these kind of
issues (inherently). :)

-- 
Regards,
Kaartic Sivaraam 


[PATCHv2 01/25] diff.c: readability fix

2017-06-22 Thread Stefan Beller
We already have dereferenced 'p->two' into a local variable 'two'.
Use that.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct 
diff_options *o)
const char *other;
const char *attr_path;
 
-   name  = p->one->path;
-   other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+   name  = one->path;
+   other = (strcmp(name, two->path) ? two->path : NULL);
attr_path = name;
if (o->prefix_length)
strip_prefix(o->prefix_length, , );
-- 
2.12.2.575.gb14f27f917



[PATCHv2 18/25] diff.c: convert word diffing to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
The word diffing is not line oriented and would need some serious
effort to be transformed into a line oriented approach, so
just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.

Signed-off-by: Stefan Beller 
---
 diff.c | 79 ++
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/diff.c b/diff.c
index 2108c191a3..60630d2c3f 100644
--- a/diff.c
+++ b/diff.c
@@ -569,6 +569,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_SUMMARY_ABBREV,
DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
DIFF_SYMBOL_STATS_LINE,
+   DIFF_SYMBOL_WORD_DIFF,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
DIFF_SYMBOL_SUBMODULE_UNTRACKED,
@@ -761,6 +762,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_STATS_SUMMARY_ABBREV:
emit_line(o, "", "", " ...\n", strlen(" ...\n"));
break;
+   case DIFF_SYMBOL_WORD_DIFF:
+   fprintf(o->file, "%.*s", len, line);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1091,37 +1095,49 @@ struct diff_words_data {
struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
  struct diff_words_style_elem *st_el,
  const char *newline,
- size_t count, const char *buf,
- const char *line_prefix)
+ size_t count, const char *buf)
 {
int print = 0;
+   struct strbuf sb = STRBUF_INIT;
 
while (count) {
char *p = memchr(buf, '\n', count);
if (print)
-   fputs(line_prefix, fp);
+   strbuf_addstr(, diff_line_prefix(o));
+
if (p != buf) {
-   if (st_el->color && fputs(st_el->color, fp) < 0)
-   return -1;
-   if (fputs(st_el->prefix, fp) < 0 ||
-   fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-   fputs(st_el->suffix, fp) < 0)
-   return -1;
-   if (st_el->color && *st_el->color
-   && fputs(GIT_COLOR_RESET, fp) < 0)
-   return -1;
+   const char *reset = st_el->color && *st_el->color ?
+   GIT_COLOR_RESET : NULL;
+   if (st_el->color && *st_el->color)
+   strbuf_addstr(, st_el->color);
+   strbuf_addstr(, st_el->prefix);
+   strbuf_add(, buf, p ? p - buf : count);
+   strbuf_addstr(, st_el->suffix);
+   if (reset)
+   strbuf_addstr(, reset);
}
if (!p)
-   return 0;
-   if (fputs(newline, fp) < 0)
-   return -1;
+   goto out;
+
+   strbuf_addstr(, newline);
count -= p + 1 - buf;
buf = p + 1;
print = 1;
+   if (count) {
+   emit_diff_symbol(o, DIFF_SYMBOL_WORD_DIFF,
+sb.buf, sb.len, 0);
+   strbuf_reset();
+   }
}
+
+out:
+   if (sb.len)
+   emit_diff_symbol(o, DIFF_SYMBOL_WORD_DIFF,
+sb.buf, sb.len, 0);
+   strbuf_release();
return 0;
 }
 
@@ -1203,24 +1219,20 @@ static void fn_out_diff_words_aux(void *priv, char 
*line, unsigned long len)
fputs(line_prefix, diff_words->opt->file);
}
if (diff_words->current_plus != plus_begin) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
>ctx, style->newline,
plus_begin - diff_words->current_plus,
-   diff_words->current_plus, line_prefix);
-   if (*(plus_begin - 1) == '\n')
-   fputs(line_prefix, diff_words->opt->file);
+   diff_words->current_plus);
}
if (minus_begin != minus_end) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
>old, style->newline,
-   minus_end - minus_begin, minus_begin,
-   line_prefix);
+   minus_end - 

[PATCHv2 23/25] diff.c: color moved lines differently, plain mode

2017-06-22 Thread Stefan Beller
Add the 'plain' mode for move detection of code.
This omits the checking for adjacent blocks, so it is not as useful.
If you have a lot of the same blocks moved in the same patch, the 'Zebra'
would end up slow as it is O(n^2) (n is number of same blocks).
So this may be useful there and is generally easy to add

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c |  5 +
 diff.h |  1 +
 t/t4015-diff-whitespace.sh | 51 +-
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index e6f547804f..0eb744514f 100644
--- a/diff.c
+++ b/diff.c
@@ -246,6 +246,8 @@ static int parse_color_moved(const char *arg)
 {
if (!strcmp(arg, "no"))
return COLOR_MOVED_NO;
+   else if (!strcmp(arg, "plain"))
+   return COLOR_MOVED_PLAIN;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
else
@@ -831,6 +833,9 @@ static void mark_color_as_moved(struct diff_options *o,
 
l->flags |= DIFF_SYMBOL_MOVED_LINE;
 
+   if (o->color_moved == COLOR_MOVED_PLAIN)
+   continue;
+
/* Check any potential block runs, advance each or nullify */
for (i = 0; i < pmb_nr; i++) {
struct moved_entry *p = pmb[i];
diff --git a/diff.h b/diff.h
index 7726ad255c..1aae8738ca 100644
--- a/diff.h
+++ b/diff.h
@@ -190,6 +190,7 @@ struct diff_options {
struct emitted_diff_symbols *emitted_symbols;
enum {
COLOR_MOVED_NO = 0,
+   COLOR_MOVED_PLAIN = 1,
COLOR_MOVED_ZEBRA = 2,
} color_moved;
 };
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4a03766f1f..1ca16435d6 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -986,7 +986,7 @@ test_expect_success 'detect moved code, complete file' '
git mv test.c main.c &&
test_config color.diff.oldMoved "normal red" &&
test_config color.diff.newMoved "normal green" &&
-   git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+   git diff HEAD --color-moved=zebra --no-renames | test_decode_color 
>actual &&
cat >expected <<-\EOF &&
diff --git a/main.c b/main.c
new file mode 100644
@@ -1130,6 +1130,55 @@ test_expect_success 'detect malicious moved code, inside 
file' '
test_cmp expected actual
 '
 
+test_expect_success 'plain moved code, inside file' '
+   test_config color.diff.oldMoved "normal red" &&
+   test_config color.diff.newMoved "normal green" &&
+   test_config color.diff.oldMovedAlternative "blue" &&
+   test_config color.diff.newMovedAlternative "yellow" &&
+   # needs previous test as setup
+   git diff HEAD --no-renames --color-moved=plain| test_decode_color 
>actual &&
+   cat <<-\EOF >expected &&
+   diff --git a/main.c b/main.c
+   index 27a619c..7cf9336 100644
+   --- a/main.c
+   +++ b/main.c
+   @@ -5,13 +5,6 @@ printf("Hello ");
+printf("World\n");
+}
+
+   -int secure_foo(struct user *u)
+   -{
+   -if (!u->is_allowed_foo)
+   -return;
+   -foo(u);
+   -}
+   -
+int main()
+{
+foo();
+   diff --git a/test.c b/test.c
+   index 1dc1d85..2bedec9 100644
+   --- a/test.c
+   +++ b/test.c
+   @@ -4,6 +4,13 @@ int bar()
+printf("Hello World, but different\n");
+}
+
+   +int secure_foo(struct user *u)
+   +{
+   +foo(u);
+   +if (!u->is_allowed_foo)
+   +return;
+   +}
+   +
+int another_function()
+{
+bar();
+   EOF
+
+   test_cmp expected actual
+'
+
 test_expect_success 'no effect from --color-moved with --word-diff' '
cat <<-\EOF >text.txt &&
Lorem Ipsum is simply dummy text of the printing and typesetting 
industry.
-- 
2.12.2.575.gb14f27f917



[PATCHv2 20/25] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 71 ++
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 0ed86ba984..f9fb94b0d3 100644
--- a/diff.c
+++ b/diff.c
@@ -571,6 +571,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_LINE,
DIFF_SYMBOL_WORD_DIFF,
DIFF_SYMBOL_STAT_SEP,
+   DIFF_SYMBOL_SUMMARY,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
DIFF_SYMBOL_SUBMODULE_UNTRACKED,
@@ -647,6 +648,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_SUBMODULE_ERROR:
case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
case DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES:
+   case DIFF_SYMBOL_SUMMARY:
case DIFF_SYMBOL_STATS_LINE:
case DIFF_SYMBOL_BINARY_DIFF_BODY:
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
@@ -4720,67 +4722,76 @@ static void flush_one_pair(struct diff_filepair *p, 
struct diff_options *opt)
}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct 
diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char 
*newdelete, struct diff_filespec *fs)
 {
+   struct strbuf sb = STRBUF_INIT;
if (fs->mode)
-   fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+   strbuf_addf(, " %s mode %06o ", newdelete, fs->mode);
else
-   fprintf(file, " %s ", newdelete);
-   write_name_quoted(fs->path, file, '\n');
-}
+   strbuf_addf(, " %s ", newdelete);
 
+   quote_c_style(fs->path, , NULL, 0);
+   strbuf_addch(, '\n');
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   strbuf_release();
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int 
show_name,
-   const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+   int show_name)
 {
if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-   fprintf(file, "%s mode change %06o => %06o%c", line_prefix, 
p->one->mode,
-   p->two->mode, show_name ? ' ' : '\n');
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(, " mode change %06o => %06o",
+   p->one->mode, p->two->mode);
if (show_name) {
-   write_name_quoted(p->two->path, file, '\n');
+   strbuf_addch(, ' ');
+   quote_c_style(p->two->path, , NULL, 0);
}
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   strbuf_release();
}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct 
diff_filepair *p,
-   const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+   struct diff_filepair *p)
 {
+   struct strbuf sb = STRBUF_INIT;
char *names = pprint_rename(p->one->path, p->two->path);
-
-   fprintf(file, " %s %s (%d%%)\n", renamecopy, names, 
similarity_index(p));
+   strbuf_addf(, " %s %s (%d%%)\n",
+   renamecopy, names, similarity_index(p));
free(names);
-   show_mode_change(file, p, 0, line_prefix);
+   emit_diff_symbol(opt, DIFF_SYMBOL_SUMMARY,
+sb.buf, sb.len, 0);
+   show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-   FILE *file = opt->file;
-   const char *line_prefix = diff_line_prefix(opt);
-
switch(p->status) {
case DIFF_STATUS_DELETED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "delete", p->one);
+   show_file_mode_name(opt, "delete", p->one);
break;
case DIFF_STATUS_ADDED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "create", p->two);
+   show_file_mode_name(opt, "create", p->two);
break;
case DIFF_STATUS_COPIED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "copy", p, line_prefix);
+   show_rename_copy(opt, "copy", p);
break;
case DIFF_STATUS_RENAMED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "rename", p, line_prefix);
+   show_rename_copy(opt, "rename", p);
break;
default:
if (p->score) {
-   fprintf(file, "%s rewrite ", line_prefix);
-   write_name_quoted(p->two->path, file, ' ');
-   fprintf(file, "(%d%%)\n", similarity_index(p));
+   struct 

[PATCHv2 17/25] diff.c: convert show_stats to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_string machinery and
keep print_stat_summary with the same arguments around.

Signed-off-by: Stefan Beller 
---
 diff.c | 114 +
 diff.h |   4 +--
 2 files changed, 74 insertions(+), 44 deletions(-)

diff --git a/diff.c b/diff.c
index 7c92675f6f..2108c191a3 100644
--- a/diff.c
+++ b/diff.c
@@ -565,6 +565,10 @@ enum diff_symbol {
DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL,
DIFF_SYMBOL_BINARY_DIFF_BODY,
DIFF_SYMBOL_BINARY_DIFF_FOOTER,
+   DIFF_SYMBOL_STATS_SUMMARY_NO_FILES,
+   DIFF_SYMBOL_STATS_SUMMARY_ABBREV,
+   DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
+   DIFF_SYMBOL_STATS_LINE,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
DIFF_SYMBOL_SUBMODULE_UNTRACKED,
@@ -628,6 +632,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 {
static const char *nneof = " No newline at end of file\n";
const char *context, *reset, *set, *meta, *fraginfo;
+   struct strbuf sb = STRBUF_INIT;
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -639,6 +644,8 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_SUBMODULE_HEADER:
case DIFF_SYMBOL_SUBMODULE_ERROR:
case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
+   case DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES:
+   case DIFF_SYMBOL_STATS_LINE:
case DIFF_SYMBOL_BINARY_DIFF_BODY:
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
@@ -747,9 +754,17 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
fprintf(o->file, "%sSubmodule %s contains modified content\n",
diff_line_prefix(o), line);
break;
+   case DIFF_SYMBOL_STATS_SUMMARY_NO_FILES:
+   emit_line(o, "", "", " 0 files changed\n",
+ strlen(" 0 files changed\n"));
+   break;
+   case DIFF_SYMBOL_STATS_SUMMARY_ABBREV:
+   emit_line(o, "", "", " ...\n", strlen(" ...\n"));
+   break;
default:
die("BUG: unknown diff symbol");
}
+   strbuf_release();
 }
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line)
@@ -1705,20 +1720,14 @@ static int scale_linear(int it, int width, int 
max_change)
return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
- const char *prefix, const char *name, int len)
-{
-   fprintf(file, " %s%-*s |", prefix, len, name);
-}
-
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const 
char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt,
+  const char *set, const char *reset)
 {
if (cnt <= 0)
return;
-   fprintf(file, "%s", set);
-   while (cnt--)
-   putc(ch, file);
-   fprintf(file, "%s", reset);
+   strbuf_addstr(out, set);
+   strbuf_addchars(out, ch, cnt);
+   strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1742,14 +1751,16 @@ static void fill_print_name(struct diffstat_file *file)
file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+static void print_stat_summary_inserts_deletes(struct diff_options *options,
+   int files, int insertions, int deletions)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret;
 
if (!files) {
assert(insertions == 0 && deletions == 0);
-   return fprintf(fp, "%s\n", " 0 files changed");
+   emit_diff_symbol(options, DIFF_SYMBOL_STATS_SUMMARY_NO_FILES,
+NULL, 0, 0);
+   return;
}
 
strbuf_addf(,
@@ -1776,9 +1787,19 @@ int print_stat_summary(FILE *fp, int files, int 
insertions, int deletions)
deletions);
}
strbuf_addch(, '\n');
-   ret = fputs(sb.buf, fp);
+   emit_diff_symbol(options, DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
+sb.buf, sb.len, 0);
strbuf_release();
-   return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+   int insertions, int deletions)
+{
+   struct diff_options o;
+   memset(, 0, sizeof(o));
+   o.file = fp;
+
+   print_stat_summary_inserts_deletes(, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1788,13 +1809,13 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
int 

[PATCHv2 15/25] submodule.c: migrate diff output to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
As the submodule process is no longer attached to the same file pointer
'o->file' as the superprojects process, there is a different result in
color.c::check_auto_color. That is why we need to pass coloring explicitly,
such that the submodule coloring decision will be made by the child process
processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains
color, the other symbols are for embedding the submodule output into the
superprojects output.

Remove the colors from the function signatures, as all the coloring
decisions will be made either inside the child process or the final
emit_diff_symbol, but not in the functions driving the submodule diff.

Signed-off-by: Stefan Beller 
---
 diff.c  | 83 ++-
 diff.h  |  9 +++
 submodule.c | 85 +++--
 submodule.h | 13 +++---
 4 files changed, 122 insertions(+), 68 deletions(-)

diff --git a/diff.c b/diff.c
index 0314d57647..bc78a216ab 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,13 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_SUBMODULE_ADD,
+   DIFF_SYMBOL_SUBMODULE_DEL,
+   DIFF_SYMBOL_SUBMODULE_UNTRACKED,
+   DIFF_SYMBOL_SUBMODULE_MODIFIED,
+   DIFF_SYMBOL_SUBMODULE_HEADER,
+   DIFF_SYMBOL_SUBMODULE_ERROR,
+   DIFF_SYMBOL_SUBMODULE_PIPETHROUGH,
DIFF_SYMBOL_REWRITE_DIFF,
DIFF_SYMBOL_BINARY_FILES,
DIFF_SYMBOL_HEADER,
@@ -624,6 +631,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
emit_line_0(o, context, reset, '\\',
nneof, strlen(nneof));
break;
+   case DIFF_SYMBOL_SUBMODULE_HEADER:
+   case DIFF_SYMBOL_SUBMODULE_ERROR:
+   case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
break;
@@ -700,11 +710,68 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
reset = diff_get_color_opt(o, DIFF_RESET);
emit_line(o, fraginfo, reset, line, len);
break;
+   case DIFF_SYMBOL_SUBMODULE_ADD:
+   set = diff_get_color_opt(o, DIFF_FILE_NEW);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, set, reset, line, len);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_DEL:
+   set = diff_get_color_opt(o, DIFF_FILE_OLD);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, set, reset, line, len);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_UNTRACKED:
+   fprintf(o->file, "%sSubmodule %s contains untracked content\n",
+   diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_SUBMODULE_MODIFIED:
+   fprintf(o->file, "%sSubmodule %s contains modified content\n",
+   diff_line_prefix(o), line);
+   break;
default:
die("BUG: unknown diff symbol");
}
 }
 
+void diff_emit_submodule_del(struct diff_options *o, const char *line)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_DEL, line, strlen(line), 0);
+}
+
+void diff_emit_submodule_add(struct diff_options *o, const char *line)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_ADD, line, strlen(line), 0);
+}
+
+void diff_emit_submodule_untracked(struct diff_options *o, const char *path)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_UNTRACKED,
+path, strlen(path), 0);
+}
+
+void diff_emit_submodule_modified(struct diff_options *o, const char *path)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_MODIFIED,
+path, strlen(path), 0);
+}
+
+void diff_emit_submodule_header(struct diff_options *o, const char *header)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_HEADER,
+header, strlen(header), 0);
+}
+
+void diff_emit_submodule_error(struct diff_options *o, const char *err)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_ERROR, err, strlen(err), 0);
+}
+
+void diff_emit_submodule_pipethrough(struct diff_options *o,
+const char *line, int len)
+{
+   emit_diff_symbol(o, DIFF_SYMBOL_SUBMODULE_PIPETHROUGH, line, len, 0);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -2467,24 +2534,16 @@ static void builtin_diff(const char *name_a,
if (o->submodule_format == DIFF_SUBMODULE_LOG &&
(!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
-   const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
-   const char *add 

[PATCHv2 16/25] diff.c: convert emit_binary_diff_body to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 63 ++-
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index bc78a216ab..7c92675f6f 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,11 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_BINARY_DIFF_HEADER,
+   DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA,
+   DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL,
+   DIFF_SYMBOL_BINARY_DIFF_BODY,
+   DIFF_SYMBOL_BINARY_DIFF_FOOTER,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
DIFF_SYMBOL_SUBMODULE_UNTRACKED,
@@ -634,6 +639,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_SUBMODULE_HEADER:
case DIFF_SYMBOL_SUBMODULE_ERROR:
case DIFF_SYMBOL_SUBMODULE_PIPETHROUGH:
+   case DIFF_SYMBOL_BINARY_DIFF_BODY:
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
break;
@@ -705,6 +711,19 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_HEADER:
fprintf(o->file, "%s", line);
break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER:
+   fprintf(o->file, "%sGIT binary patch\n", diff_line_prefix(o));
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA:
+   fprintf(o->file, "%sdelta %s\n", diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL:
+   fprintf(o->file, "%sliteral %s\n", diff_line_prefix(o), line);
+   break;
+   case DIFF_SYMBOL_BINARY_DIFF_FOOTER:
+   fputs(diff_line_prefix(o), o->file);
+   fputc('\n', o->file);
+   break;
case DIFF_SYMBOL_REWRITE_DIFF:
fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
reset = diff_get_color_opt(o, DIFF_RESET);
@@ -2390,8 +2409,8 @@ static unsigned char *deflate_it(char *data,
return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
- const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+ mmfile_t *one, mmfile_t *two)
 {
void *cp;
void *delta;
@@ -2420,13 +2439,18 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
}
 
if (delta && delta_size < deflate_size) {
-   fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+   char *s = xstrfmt("%lu", orig_size);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_HEADER_DELTA,
+s, strlen(s), 0);
+   free(s);
free(deflated);
data = delta;
data_size = delta_size;
-   }
-   else {
-   fprintf(file, "%sliteral %lu\n", prefix, two->size);
+   } else {
+   char *s = xstrfmt("%lu", two->size);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_HEADER_LITERAL,
+s, strlen(s), 0);
+   free(s);
free(delta);
data = deflated;
data_size = deflate_size;
@@ -2435,8 +2459,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
/* emit data encoded in base85 */
cp = data;
while (data_size) {
+   int len;
int bytes = (52 < data_size) ? 52 : data_size;
-   char line[70];
+   char line[71];
data_size -= bytes;
if (bytes <= 26)
line[0] = bytes + 'A' - 1;
@@ -2444,20 +2469,24 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
line[0] = bytes - 26 + 'a' - 1;
encode_85(line + 1, cp, bytes);
cp = (char *) cp + bytes;
-   fprintf(file, "%s", prefix);
-   fputs(line, file);
-   fputc('\n', file);
+
+   len = strlen(line);
+   line[len++] = '\n';
+   line[len] = '\0';
+
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_BODY,
+line, len, 0);
}
-   fprintf(file, "%s\n", prefix);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_DIFF_FOOTER, NULL, 0, 0);
free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+mmfile_t *one, mmfile_t *two)
 {
-   fprintf(file, "%sGIT binary patch\n", prefix);
-   emit_binary_diff_body(file, one, two, prefix);
-   

[PATCHv2 24/25] diff.c: add dimming to moved line detection

2017-06-22 Thread Stefan Beller
Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.

Signed-off-by: Stefan Beller 
---
 color.h|   2 +
 diff.c | 139 -
 diff.h |   9 ++-
 t/t4015-diff-whitespace.sh | 124 
 4 files changed, 259 insertions(+), 15 deletions(-)

diff --git a/color.h b/color.h
index 90627650fc..fd2b688dfb 100644
--- a/color.h
+++ b/color.h
@@ -42,6 +42,8 @@ struct strbuf;
 #define GIT_COLOR_BG_BLUE  "\033[44m"
 #define GIT_COLOR_BG_MAGENTA   "\033[45m"
 #define GIT_COLOR_BG_CYAN  "\033[46m"
+#define GIT_COLOR_FAINT"\033[2m"
+#define GIT_COLOR_FAINT_ITALIC "\033[2;3m"
 
 /* A special value meaning "no color selected" */
 #define GIT_COLOR_NIL "NIL"
diff --git a/diff.c b/diff.c
index 0eb744514f..82ace48c38 100644
--- a/diff.c
+++ b/diff.c
@@ -57,10 +57,14 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_YELLOW,   /* COMMIT */
GIT_COLOR_BG_RED,   /* WHITESPACE */
GIT_COLOR_NORMAL,   /* FUNCINFO */
-   GIT_COLOR_MAGENTA,  /* OLD_MOVED */
-   GIT_COLOR_BLUE, /* OLD_MOVED ALTERNATIVE */
-   GIT_COLOR_CYAN, /* NEW_MOVED */
-   GIT_COLOR_YELLOW,   /* NEW_MOVED ALTERNATIVE */
+   GIT_COLOR_BOLD_MAGENTA, /* OLD_MOVED */
+   GIT_COLOR_BOLD_BLUE,/* OLD_MOVED ALTERNATIVE */
+   GIT_COLOR_FAINT,/* OLD_MOVED_DIM */
+   GIT_COLOR_FAINT_ITALIC, /* OLD_MOVED_ALTERNATIVE_DIM */
+   GIT_COLOR_BOLD_CYAN,/* NEW_MOVED */
+   GIT_COLOR_BOLD_YELLOW,  /* NEW_MOVED ALTERNATIVE */
+   GIT_COLOR_FAINT,/* NEW_MOVED_DIM */
+   GIT_COLOR_FAINT_ITALIC, /* NEW_MOVED_ALTERNATIVE_DIM */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -90,10 +94,18 @@ static int parse_diff_color_slot(const char *var)
return DIFF_FILE_OLD_MOVED;
if (!strcasecmp(var, "oldmovedalternative"))
return DIFF_FILE_OLD_MOVED_ALT;
+   if (!strcasecmp(var, "oldmoveddimmed"))
+   return DIFF_FILE_OLD_MOVED_DIM;
+   if (!strcasecmp(var, "oldmovedalternativedimmed"))
+   return DIFF_FILE_OLD_MOVED_ALT_DIM;
if (!strcasecmp(var, "newmoved"))
return DIFF_FILE_NEW_MOVED;
if (!strcasecmp(var, "newmovedalternative"))
return DIFF_FILE_NEW_MOVED_ALT;
+   if (!strcasecmp(var, "newmoveddimmed"))
+   return DIFF_FILE_NEW_MOVED_DIM;
+   if (!strcasecmp(var, "newmovedalternativedimmed"))
+   return DIFF_FILE_NEW_MOVED_ALT_DIM;
return -1;
 }
 
@@ -250,6 +262,8 @@ static int parse_color_moved(const char *arg)
return COLOR_MOVED_PLAIN;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
+   else if (!strcmp(arg, "dimmed_zebra"))
+   return COLOR_MOVED_ZEBRA_DIM;
else
return -1;
 }
@@ -637,6 +651,7 @@ enum diff_symbol {
 #define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
 #define DIFF_SYMBOL_MOVED_LINE (1<<17)
 #define DIFF_SYMBOL_MOVED_LINE_ALT (1<<18)
+#define DIFF_SYMBOL_MOVED_LINE_UNINTERESTING   (1<<19)
 #define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
 
 /*
@@ -890,6 +905,67 @@ static void mark_color_as_moved(struct diff_options *o,
free(pmb);
 }
 
+#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
+  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
+static void dim_moved_lines(struct diff_options *o)
+{
+   int n;
+   for (n = 0; n < o->emitted_symbols->nr; n++) {
+   struct emitted_diff_symbol *prev = (n != 0) ?
+   >emitted_symbols->buf[n - 1] : NULL;
+   struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
+   struct emitted_diff_symbol *next =
+   (n < o->emitted_symbols->nr - 1) ?
+   >emitted_symbols->buf[n + 1] : NULL;
+
+   /* Not a plus or minus line? */
+   if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS)
+   continue;
+
+   /* Not a moved line? */
+   if (!(l->flags & DIFF_SYMBOL_MOVED_LINE))
+   continue;
+
+   /*
+* If prev or next are not a plus or minus line,
+* pretend they don't exist
+*/
+   if (prev && prev->s != DIFF_SYMBOL_PLUS &&
+   prev->s != DIFF_SYMBOL_MINUS)
+   prev = NULL;
+   if (next && next->s != DIFF_SYMBOL_PLUS &&
+   next->s != DIFF_SYMBOL_MINUS)
+   next = NULL;
+
+   /* Inside a block? */
+  

[PATCHv2 12/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER

2017-06-22 Thread Stefan Beller
The header is constructed lazily including line breaks, so just emit
the raw string as is.

Signed-off-by: Stefan Beller 
---
 diff.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 761ee581ad..34314455b5 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_FILEPAIR_PLUS,
DIFF_SYMBOL_FILEPAIR_MINUS,
DIFF_SYMBOL_WORDS_PORCELAIN,
@@ -688,6 +689,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
line, reset,
strchr(line, ' ') ? "\t" : "");
break;
+   case DIFF_SYMBOL_HEADER:
+   fprintf(o->file, "%s", line);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1385,7 +1389,8 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
o->found_changes = 1;
 
if (ecbdata->header) {
-   fprintf(o->file, "%s", ecbdata->header->buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+ecbdata->header->buf, ecbdata->header->len, 0);
strbuf_reset(ecbdata->header);
ecbdata->header = NULL;
}
@@ -2519,7 +2524,8 @@ static void builtin_diff(const char *name_a,
if (complete_rewrite &&
(textconv_one || !diff_filespec_is_binary(one)) &&
(textconv_two || !diff_filespec_is_binary(two))) {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
strbuf_reset();
emit_rewrite_diff(name_a, name_b, one, two,
textconv_one, textconv_two, o);
@@ -2529,7 +2535,8 @@ static void builtin_diff(const char *name_a,
}
 
if (o->irreversible_delete && lbl[1][0] == '/') {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER, header.buf,
+header.len, 0);
strbuf_reset();
goto free_ab_and_return;
} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2540,10 +2547,13 @@ static void builtin_diff(const char *name_a,
!DIFF_OPT_TST(o, BINARY)) {
if (!oidcmp(>oid, >oid)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len,
+0);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
fprintf(o->file, "%sBinary files %s and %s differ\n",
line_prefix, lbl[0], lbl[1]);
goto free_ab_and_return;
@@ -2554,10 +2564,11 @@ static void builtin_diff(const char *name_a,
if (mf1.size == mf2.size &&
!memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER, header.buf, header.len, 
0);
strbuf_reset();
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file, , , line_prefix);
@@ -2575,7 +2586,8 @@ static void builtin_diff(const char *name_a,
const struct userdiff_funcname *pe;
 
if (must_show_header) {
-   fprintf(o->file, "%s", header.buf);
+   emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
+header.buf, header.len, 0);
strbuf_reset();
}
 
-- 
2.12.2.575.gb14f27f917



[PATCHv2 11/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}

2017-06-22 Thread Stefan Beller
We have to use fprintf instead of emit_line, because we want to emit the
tab after the color. This is important for ancient versions of gnu patch
AFAICT, although we probably do not want to feed colored output to the
patch utility, such that it would not matter if the trailing tab is
colored. Keep the corner case as-is though.

Signed-off-by: Stefan Beller 
---
 diff.c | 51 ++-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/diff.c b/diff.c
index f3d0918810..761ee581ad 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,8 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_FILEPAIR_PLUS,
+   DIFF_SYMBOL_FILEPAIR_MINUS,
DIFF_SYMBOL_WORDS_PORCELAIN,
DIFF_SYMBOL_WORDS,
DIFF_SYMBOL_CONTEXT,
@@ -610,7 +612,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set;
+   const char *context, *reset, *set, *meta;
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -672,6 +674,20 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
}
emit_line(o, context, reset, line, len);
break;
+   case DIFF_SYMBOL_FILEPAIR_PLUS:
+   meta = diff_get_color_opt(o, DIFF_METAINFO);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   fprintf(o->file, "%s%s+++ %s%s%s\n", diff_line_prefix(o), meta,
+   line, reset,
+   strchr(line, ' ') ? "\t" : "");
+   break;
+   case DIFF_SYMBOL_FILEPAIR_MINUS:
+   meta = diff_get_color_opt(o, DIFF_METAINFO);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   fprintf(o->file, "%s%s--- %s%s%s\n", diff_line_prefix(o), meta,
+   line, reset,
+   strchr(line, ' ') ? "\t" : "");
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -843,8 +859,6 @@ static void emit_rewrite_diff(const char *name_a,
  struct diff_options *o)
 {
int lc_a, lc_b;
-   const char *name_a_tab, *name_b_tab;
-   const char *metainfo = diff_get_color(o->use_color, DIFF_METAINFO);
const char *fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
const char *reset = diff_get_color(o->use_color, DIFF_RESET);
static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT;
@@ -864,8 +878,6 @@ static void emit_rewrite_diff(const char *name_a,
 
name_a += (*name_a == '/');
name_b += (*name_b == '/');
-   name_a_tab = strchr(name_a, ' ') ? "\t" : "";
-   name_b_tab = strchr(name_b, ' ') ? "\t" : "";
 
strbuf_reset(_name);
strbuf_reset(_name);
@@ -892,11 +904,13 @@ static void emit_rewrite_diff(const char *name_a,
 
lc_a = count_lines(data_one, size_one);
lc_b = count_lines(data_two, size_two);
-   fprintf(o->file,
-   "%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-   line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-   line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-   line_prefix, fraginfo);
+
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR_MINUS,
+a_name.buf, a_name.len, 0);
+   emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR_PLUS,
+b_name.buf, b_name.len, 0);
+
+   fprintf(o->file, "%s%s@@ -", line_prefix, fraginfo);
if (!o->irreversible_delete)
print_line_count(o->file, lc_a);
else
@@ -1365,10 +1379,8 @@ static void find_lno(const char *line, struct 
emit_callback *ecbdata)
 static void fn_out_consume(void *priv, char *line, unsigned long len)
 {
struct emit_callback *ecbdata = priv;
-   const char *meta = diff_get_color(ecbdata->color_diff, DIFF_METAINFO);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
-   const char *line_prefix = diff_line_prefix(o);
 
o->found_changes = 1;
 
@@ -1379,15 +1391,12 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 
if (ecbdata->label_path[0]) {
-   const char *name_a_tab, *name_b_tab;
-
-   name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
-   name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-   fprintf(o->file, "%s%s--- %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[0], reset, 
name_a_tab);
-   fprintf(o->file, "%s%s+++ %s%s%s\n",
-   

[PATCHv2 21/25] diff.c: buffer all output if asked to

2017-06-22 Thread Stefan Beller
Introduce a new option 'emitted_symbols' in the struct diff_options which
controls whether all output is buffered up until all output is available.
It is set internally in diff.c when necessary.

We'll have a new struct 'emitted_string' in diff.c which will be used to
buffer each line.  The emitted_string will duplicate the memory of the
line to buffer as that is easiest to reason about for now. In a future
patch we may want to decrease the memory usage by not duplicating all
output for buffering but rather we may want to store offsets into the
file or in case of hunk descriptions such as the similarity score, we
could just store the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_diff_symbol factors out the emission part and depending on the
diff_options->emitted_symbols the emission will be performed directly
when calling emit_diff_symbol or after the whole process is done, i.e.
by buffering we have add the possibility for a second pass over the
whole output before doing the actual output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_string for output.

Signed-off-by: Stefan Beller 
---
 diff.c | 109 +++--
 diff.h |   2 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index f9fb94b0d3..b344808c20 100644
--- a/diff.c
+++ b/diff.c
@@ -604,6 +604,47 @@ enum diff_symbol {
 #define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
 #define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal 
line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could also
+ * enhance the state for emitting prefabricated lines, e.g. the similarity
+ * score line or hunk/file headers would only need to store a number or path
+ * and then the output can be constructed later on depending on state.
+ */
+struct emitted_diff_symbol {
+   const char *line;
+   int len;
+   int flags;
+   enum diff_symbol s;
+};
+#define EMITTED_DIFF_SYMBOL_INIT {NULL}
+
+struct emitted_diff_symbols {
+   struct emitted_diff_symbol *buf;
+   int nr, alloc;
+};
+#define EMITTED_DIFF_SYMBOLS_INIT {NULL, 0, 0}
+
+static void append_emitted_diff_symbol(struct diff_options *o,
+  struct emitted_diff_symbol *e)
+{
+   struct emitted_diff_symbol *f;
+
+   ALLOC_GROW(o->emitted_symbols->buf,
+  o->emitted_symbols->nr + 1,
+  o->emitted_symbols->alloc);
+   f = >emitted_symbols->buf[o->emitted_symbols->nr++];
+
+   memcpy(f, e, sizeof(struct emitted_diff_symbol));
+   f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+
 static void emit_line_ws_markup(struct diff_options *o,
const char *set, const char *reset,
const char *line, int len, char sign,
@@ -630,12 +671,18 @@ static void emit_line_ws_markup(struct diff_options *o,
}
 }
 
-static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
-const char *line, int len, unsigned flags)
+static void emit_diff_symbol_from_struct(struct diff_options *o,
+struct emitted_diff_symbol *eds)
 {
static const char *nneof = " No newline at end of file\n";
const char *context, *reset, *set, *meta, *fraginfo;
struct strbuf sb = STRBUF_INIT;
+
+   enum diff_symbol s = eds->s;
+   const char *line = eds->line;
+   int len = eds->len;
+   unsigned flags = eds->flags;
+
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -777,6 +824,17 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
strbuf_release();
 }
 

[PATCHv2 25/25] diff: document the new --color-moved setting

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/config.txt   | 12 ++--
 Documentation/diff-options.txt | 27 +++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..29e0b9fa69 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
 
+diff.colorMoved::
+   If set moved lines in a diff are colored differently,
+   for details see '--color-moved' in linkgit:git-diff[1].
+
 color.diff.::
Use customized color for diff colorization.  `` specifies
which part of the patch to use the specified color, and is one
of `context` (context text - `plain` is a historical synonym),
`meta` (metainformation), `frag`
(hunk header), 'func' (function in hunk header), `old` (removed lines),
-   `new` (added lines), `commit` (commit headers), or `whitespace`
-   (highlighting whitespace errors).
+   `new` (added lines), `commit` (commit headers), `whitespace`
+   (highlighting whitespace errors), `oldMoved` (deleted lines),
+   `newMoved` (added lines), `oldMovedDimmed`, `oldMovedAlternative`,
+   `oldMovedAlternativeDimmed`, `newMovedDimmed`, `newMovedAlternative`
+   and `newMovedAlternativeDimmed` (See the ''
+   setting of '--color-moved' in linkgit:git-diff[1] for details).
 
 color.decorate.::
Use customized color for 'git log --decorate' output.  `` is one
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f48de..058c8014ed 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -231,6 +231,33 @@ ifdef::git-diff[]
 endif::git-diff[]
It is the same as `--color=never`.
 
+--color-moved[=]::
+   Moved lines of code are colored differently.
+ifdef::git-diff[]
+   It can be changed by the `diff.colorMoved` configuration setting.
+endif::git-diff[]
+   The  defaults to 'no' if the option is not given
+   and to 'dimmed_zebra' if the option with no mode is given.
+   The mode must be one of:
++
+--
+no::
+   Moved lines are not highlighted.
+plain::
+   Any line that is added in one location and was removed
+   in another location will be colored with 'color.diff.newMoved'.
+   Similarly 'color.diff.oldMoved' will be used for removed lines
+   that are added somewhere else in the diff.
+zebra::
+   Blocks of moved code are detected. The detected blocks are
+   painted using the 'color.diff.{old,new}Moved' alternating with
+   'color.diff.{old,new}MovedAlternative'.
+dimmed_zebra::
+   Similar to 'zebra', but additional dimming of uninteresting parts
+   of moved code is performed. The bordering lines of two adjacent
+   blocks are considered interesting, the rest is uninteresting.
+--
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
-- 
2.12.2.575.gb14f27f917



[PATCHv2 08/25] diff.c: migrate emit_line_checked to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
  Take a note in cahe.c as well, when this ws rules are extended we have
  to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
  of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)

The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.

Signed-off-by: Stefan Beller 
---
 cache.h |   2 ++
 diff.c  | 116 +---
 diff.h  |   6 ++--
 3 files changed, 80 insertions(+), 44 deletions(-)

diff --git a/cache.h b/cache.h
index e1f0e182ad..d2204bf6d1 100644
--- a/cache.h
+++ b/cache.h
@@ -2168,6 +2168,8 @@ void shift_tree_by(const struct object_id *, const struct 
object_id *, struct ob
 #define WS_TRAILING_SPACE  (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
 #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
 #define WS_TAB_WIDTH_MASK077
+/* All WS_* -- when extended, adapt diff.c emit_symbol */
+#define WS_RULE_MASK   0
 extern unsigned whitespace_rule_cfg;
 extern unsigned whitespace_rule(const char *);
 extern unsigned parse_whitespace_rule(const char *);
diff --git a/diff.c b/diff.c
index b78f698cad..e7583efca3 100644
--- a/diff.c
+++ b/diff.c
@@ -560,17 +560,54 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_CONTEXT,
+   DIFF_SYMBOL_PLUS,
+   DIFF_SYMBOL_MINUS,
DIFF_SYMBOL_NO_LF_EOF,
DIFF_SYMBOL_CONTEXT_FRAGINFO,
DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_SEPARATOR
 };
+/*
+ * Flags for content lines:
+ * 0..12 are whitespace rules
+ * 13-15 are WSEH_NEW | WSEH_OLD | WSEH_CONTEXT
+ * 16 is marking if the line is blank at EOF
+ */
+#define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
+#define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
WS_RULE_MASK)
+
+static void emit_line_ws_markup(struct diff_options *o,
+   const char *set, const char *reset,
+   const char *line, int len, char sign,
+   unsigned ws_rule, int blank_at_eof)
+{
+   const char *ws = NULL;
+
+   if (o->ws_error_highlight & ws_rule) {
+   ws = diff_get_color_opt(o, DIFF_WHITESPACE);
+   if (!*ws)
+   ws = NULL;
+   }
+
+   if (!ws)
+   emit_line_0(o, set, reset, sign, line, len);
+   else if (blank_at_eof)
+   /* Blank line at EOF - paint '+' as well */
+   emit_line_0(o, ws, reset, sign, line, len);
+   else {
+   /* Emit just the prefix, then the rest. */
+   emit_line_0(o, set, reset, sign, "", 0);
+   ws_check_emit(line, len, ws_rule,
+ o->file, set, reset, ws);
+   }
+}
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
-const char *line, int len)
+const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset;
+   const char *context, *reset, *set;
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -592,6 +629,25 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
diff_line_prefix(o),
o->line_termination);
break;
+   case DIFF_SYMBOL_CONTEXT:
+   set = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, ' ',
+   flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
+   break;
+   case DIFF_SYMBOL_PLUS:
+   set = diff_get_color_opt(o, DIFF_FILE_NEW);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, '+',
+   flags & DIFF_SYMBOL_CONTENT_WS_MASK,
+   flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
+   break;
+   case DIFF_SYMBOL_MINUS:
+   set = diff_get_color_opt(o, DIFF_FILE_OLD);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line_ws_markup(o, set, reset, line, len, '-',
+   flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -608,57 +664,31 @@ static int 

[PATCHv2 14/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/diff.c b/diff.c
index 78cf5ad691..0314d57647 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_REWRITE_DIFF,
DIFF_SYMBOL_BINARY_FILES,
DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_FILEPAIR_PLUS,
@@ -614,7 +615,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 const char *line, int len, unsigned flags)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set, *meta;
+   const char *context, *reset, *set, *meta, *fraginfo;
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -694,6 +695,11 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_HEADER:
fprintf(o->file, "%s", line);
break;
+   case DIFF_SYMBOL_REWRITE_DIFF:
+   fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, fraginfo, reset, line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -816,17 +822,17 @@ static void remove_tempfile(void)
}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
switch (count) {
case 0:
-   fprintf(file, "0,0");
+   strbuf_addstr(out, "0,0");
break;
case 1:
-   fprintf(file, "1");
+   strbuf_addstr(out, "1");
break;
default:
-   fprintf(file, "1,%d", count);
+   strbuf_addf(out, "1,%d", count);
break;
}
 }
@@ -865,14 +871,12 @@ static void emit_rewrite_diff(const char *name_a,
  struct diff_options *o)
 {
int lc_a, lc_b;
-   const char *fraginfo = diff_get_color(o->use_color, DIFF_FRAGINFO);
-   const char *reset = diff_get_color(o->use_color, DIFF_RESET);
static struct strbuf a_name = STRBUF_INIT, b_name = STRBUF_INIT;
const char *a_prefix, *b_prefix;
char *data_one, *data_two;
size_t size_one, size_two;
struct emit_callback ecbdata;
-   const char *line_prefix = diff_line_prefix(o);
+   struct strbuf out = STRBUF_INIT;
 
if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -916,14 +920,17 @@ static void emit_rewrite_diff(const char *name_a,
emit_diff_symbol(o, DIFF_SYMBOL_FILEPAIR_PLUS,
 b_name.buf, b_name.len, 0);
 
-   fprintf(o->file, "%s%s@@ -", line_prefix, fraginfo);
+   strbuf_addstr(, "@@ -");
if (!o->irreversible_delete)
-   print_line_count(o->file, lc_a);
+   add_line_count(, lc_a);
else
-   fprintf(o->file, "?,?");
-   fprintf(o->file, " +");
-   print_line_count(o->file, lc_b);
-   fprintf(o->file, " @@%s\n", reset);
+   strbuf_addstr(, "?,?");
+   strbuf_addstr(, " +");
+   add_line_count(, lc_b);
+   strbuf_addstr(, " @@\n");
+   emit_diff_symbol(o, DIFF_SYMBOL_REWRITE_DIFF, out.buf, out.len, 0);
+   strbuf_release();
+
if (lc_a && !o->irreversible_delete)
emit_rewrite_lines(, '-', data_one, size_one);
if (lc_b)
-- 
2.12.2.575.gb14f27f917



[PATCHv2 10/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE

2017-06-22 Thread Stefan Beller
The context marker use the exact same output pattern, so reuse it.

Signed-off-by: Stefan Beller 
---
 diff.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3d081edd12..f3d0918810 100644
--- a/diff.c
+++ b/diff.c
@@ -563,6 +563,7 @@ enum diff_symbol {
DIFF_SYMBOL_WORDS_PORCELAIN,
DIFF_SYMBOL_WORDS,
DIFF_SYMBOL_CONTEXT,
+   DIFF_SYMBOL_CONTEXT_INCOMPLETE,
DIFF_SYMBOL_PLUS,
DIFF_SYMBOL_MINUS,
DIFF_SYMBOL_NO_LF_EOF,
@@ -621,6 +622,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
break;
+   case DIFF_SYMBOL_CONTEXT_INCOMPLETE:
case DIFF_SYMBOL_CONTEXT_MARKER:
context = diff_get_color_opt(o, DIFF_CONTEXT);
reset = diff_get_color_opt(o, DIFF_RESET);
@@ -1448,8 +1450,8 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
default:
/* incomplete line at the end */
ecbdata->lno_in_preimage++;
-   emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
- reset, line, len);
+   emit_diff_symbol(o, DIFF_SYMBOL_CONTEXT_INCOMPLETE,
+line, len, 0);
break;
}
 }
-- 
2.12.2.575.gb14f27f917



[PATCHv2 19/25] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 60630d2c3f..0ed86ba984 100644
--- a/diff.c
+++ b/diff.c
@@ -570,6 +570,7 @@ enum diff_symbol {
DIFF_SYMBOL_STATS_SUMMARY_INSERTS_DELETES,
DIFF_SYMBOL_STATS_LINE,
DIFF_SYMBOL_WORD_DIFF,
+   DIFF_SYMBOL_STAT_SEP,
DIFF_SYMBOL_SUBMODULE_ADD,
DIFF_SYMBOL_SUBMODULE_DEL,
DIFF_SYMBOL_SUBMODULE_UNTRACKED,
@@ -765,6 +766,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
case DIFF_SYMBOL_WORD_DIFF:
fprintf(o->file, "%.*s", len, line);
break;
+   case DIFF_SYMBOL_STAT_SEP:
+   fputs(o->stat_sep, o->file);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -5082,10 +5086,10 @@ void diff_flush(struct diff_options *options)
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0, 0);
-   if (options->stat_sep) {
+   if (options->stat_sep)
/* attach patch instead of inline */
-   fputs(options->stat_sep, options->file);
-   }
+   emit_diff_symbol(options, DIFF_SYMBOL_STAT_SEP,
+NULL, 0, 0);
}
 
diff_flush_patch_all_file_pairs(options);
-- 
2.12.2.575.gb14f27f917



[PATCHv2 22/25] diff.c: color moved lines differently

2017-06-22 Thread Stefan Beller
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OM]  -if (!is_authorized_user())
[OM]  -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OM]  -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NM]  +sensitive_stuff(spanning,
[NM]  +multiple,
[NM]  +lines);
[NM]  +}

However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OMA] -if (!is_authorized_user())
[OMA] -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OMA] -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NMA] +sensitive_stuff(spanning,
[NMA] +multiple,
[NMA] +lines);
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.

This patch implements the first mode:
* basic alternating 'Zebra' mode
  This conveys all information needed to the user.  Defer customization to
  later patches.

First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

Helped-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
---
 diff.c | 303 ++---
 diff.h |  10 +-
 t/t4015-diff-whitespace.sh | 196 +
 3 files changed, 494 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index b344808c20..e6f547804f 100644
--- a/diff.c
+++ b/diff.c
@@ -15,6 +15,7 @@
 #include "userdiff.h"
 #include "submodule-config.h"
 #include "submodule.h"
+#include "hashmap.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "argv-array.h"
@@ -31,6 +32,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +57,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_YELLOW,   /* COMMIT */
GIT_COLOR_BG_RED,   /* WHITESPACE */
GIT_COLOR_NORMAL,   /* FUNCINFO */
+   GIT_COLOR_MAGENTA,  /* OLD_MOVED */
+   GIT_COLOR_BLUE, /* OLD_MOVED ALTERNATIVE */
+   GIT_COLOR_CYAN, /* NEW_MOVED */
+   GIT_COLOR_YELLOW,   /* NEW_MOVED ALTERNATIVE */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +86,14 @@ static int parse_diff_color_slot(const char *var)

[PATCHv2 04/25] diff.c: introduce emit_diff_symbol

2017-06-22 Thread Stefan Beller
In a later patch we want to buffer all output before emitting it as a
new feature ("markup moved lines") conceptually cannot be implemented
in a single pass over the output.

There are different approaches to buffer all output such as:
* Buffering on the char level, i.e. we'd have a char[] which would
  grow at approximately 80 characters a line. This would keep the
  output completely unstructured, but might be very easy to implement,
  such as redirecting all output to a temporary file and working off
  that. The later passes over the buffer are quite complicated though,
  because we have to parse back any output and then decide if it should
  be modified.

* Buffer on a line level. As the output is mostly line oriented already,
  this would make sense, but it still is a bit awkward as we'd have to
  make sense of it again by looking at the first characters of a line
  to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff
  output and we'd keep the symbols of this grammar around. This keeps
  the highest level of structure in the buffered data, such that the
  actual memory requirements are less than say the first option. Instead
  of buffering the characters of the line, we'll buffer what we intend
  to do plus additional information for the specifics. An output of

diff --git a/new.txt b/new.txt
index fa69b07..412428c 100644
Binary files a/new.txt and b/new.txt differ

  could be buffered as
 DIFF_SYMBOL_DIFF_START + new.txt
 DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
 DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering
by first moving any output to emit_diff_symbol, and then introducing the
buffering in this function.

Signed-off-by: Stefan Beller 
---
 diff.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..2257d44e2c 100644
--- a/diff.c
+++ b/diff.c
@@ -559,6 +559,24 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
emit_line_0(o, set, reset, line[0], line+1, len-1);
 }
 
+enum diff_symbol {
+   DIFF_SYMBOL_SEPARATOR
+};
+
+static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
+const char *line, int len)
+{
+   switch (s) {
+   case DIFF_SYMBOL_SEPARATOR:
+   fprintf(o->file, "%s%c",
+   diff_line_prefix(o),
+   o->line_termination);
+   break;
+   default:
+   die("BUG: unknown diff symbol");
+   }
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -4833,9 +4851,7 @@ void diff_flush(struct diff_options *options)
 
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
-   fprintf(options->file, "%s%c",
-   diff_line_prefix(options),
-   options->line_termination);
+   emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0);
if (options->stat_sep) {
/* attach patch instead of inline */
fputs(options->stat_sep, options->file);
-- 
2.12.2.575.gb14f27f917



[PATCHv2 06/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index c550f75195..16818fa571 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_CONTEXT_FRAGINFO,
DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_SEPARATOR
 };
@@ -569,6 +570,9 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
 {
const char *context, *reset;
switch (s) {
+   case DIFF_SYMBOL_CONTEXT_FRAGINFO:
+   emit_line(o, "", "", line, len);
+   break;
case DIFF_SYMBOL_CONTEXT_MARKER:
context = diff_get_color_opt(o, DIFF_CONTEXT);
reset = diff_get_color_opt(o, DIFF_RESET);
@@ -704,8 +708,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 
strbuf_add(, line + len, org_len - len);
strbuf_complete_line();
-
-   emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+   emit_diff_symbol(ecbdata->opt,
+DIFF_SYMBOL_CONTEXT_FRAGINFO, msgbuf.buf, msgbuf.len);
strbuf_release();
 }
 
-- 
2.12.2.575.gb14f27f917



[PATCHv2 02/25] diff.c: move line ending check into emit_hunk_header

2017-06-22 Thread Stefan Beller
The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead make sure emit_hunk_header() to always send a
completed line to emit_line().

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
}
 
strbuf_add(, line + len, org_len - len);
+   strbuf_complete_line();
+
emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
strbuf_release();
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
len = sane_truncate_line(ecbdata, line, len);
find_lno(line, ecbdata);
emit_hunk_header(ecbdata, line, len);
-   if (line[len-1] != '\n')
-   putc('\n', o->file);
return;
}
 
-- 
2.12.2.575.gb14f27f917



[PATCHv2 09/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/diff.c b/diff.c
index e7583efca3..3d081edd12 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,8 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_WORDS_PORCELAIN,
+   DIFF_SYMBOL_WORDS,
DIFF_SYMBOL_CONTEXT,
DIFF_SYMBOL_PLUS,
DIFF_SYMBOL_MINUS,
@@ -648,6 +650,26 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
emit_line_ws_markup(o, set, reset, line, len, '-',
flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
break;
+   case DIFF_SYMBOL_WORDS_PORCELAIN:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, context, reset, line, len);
+   fputs("~\n", o->file);
+   break;
+   case DIFF_SYMBOL_WORDS:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   /*
+* Skip the prefix character, if any.  With
+* diff_suppress_blank_empty, there may be
+* none.
+*/
+   if (line[0] != '\n') {
+   line++;
+   len--;
+   }
+   emit_line(o, context, reset, line, len);
+   break;
default:
die("BUG: unknown diff symbol");
}
@@ -1342,7 +1364,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
 {
struct emit_callback *ecbdata = priv;
const char *meta = diff_get_color(ecbdata->color_diff, DIFF_METAINFO);
-   const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
const char *line_prefix = diff_line_prefix(o);
@@ -1384,6 +1405,9 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 
if (ecbdata->diff_words) {
+   enum diff_symbol s =
+   ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN ?
+   DIFF_SYMBOL_WORDS_PORCELAIN : DIFF_SYMBOL_WORDS;
if (line[0] == '-') {
diff_words_append(line, len,
  >diff_words->minus);
@@ -1403,21 +1427,7 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
return;
}
diff_words_flush(ecbdata);
-   if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-   emit_line(o, context, reset, line, len);
-   fputs("~\n", o->file);
-   } else {
-   /*
-* Skip the prefix character, if any.  With
-* diff_suppress_blank_empty, there may be
-* none.
-*/
-   if (line[0] != '\n') {
- line++;
- len--;
-   }
-   emit_line(o, context, reset, line, len);
-   }
+   emit_diff_symbol(o, s, line, len, 0);
return;
}
 
-- 
2.12.2.575.gb14f27f917



[PATCHv2 13/25] diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES

2017-06-22 Thread Stefan Beller
we could save a little bit of memory when buffering in a later mode
by just passing the inner part ("%s and %s", file1, file 2), but
those a just a few bytes, so instead let's reuse the implementation from
DIFF_SYMBOL_HEADER and keep the whole line around.

Signed-off-by: Stefan Beller 
---
 diff.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 34314455b5..78cf5ad691 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_BINARY_FILES,
DIFF_SYMBOL_HEADER,
DIFF_SYMBOL_FILEPAIR_PLUS,
DIFF_SYMBOL_FILEPAIR_MINUS,
@@ -689,6 +690,7 @@ static void emit_diff_symbol(struct diff_options *o, enum 
diff_symbol s,
line, reset,
strchr(line, ' ') ? "\t" : "");
break;
+   case DIFF_SYMBOL_BINARY_FILES:
case DIFF_SYMBOL_HEADER:
fprintf(o->file, "%s", line);
break;
@@ -2542,6 +2544,7 @@ static void builtin_diff(const char *name_a,
} else if (!DIFF_OPT_TST(o, TEXT) &&
( (!textconv_one && diff_filespec_is_binary(one)) ||
  (!textconv_two && diff_filespec_is_binary(two)) )) {
+   struct strbuf sb = STRBUF_INIT;
if (!one->data && !two->data &&
S_ISREG(one->mode) && S_ISREG(two->mode) &&
!DIFF_OPT_TST(o, BINARY)) {
@@ -2554,8 +2557,11 @@ static void builtin_diff(const char *name_a,
}
emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
 header.buf, header.len, 0);
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   strbuf_addf(, "%sBinary files %s and %s differ\n",
+   diff_line_prefix(o), lbl[0], lbl[1]);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_FILES,
+sb.buf, sb.len, 0);
+   strbuf_release();
goto free_ab_and_return;
}
if (fill_mmfile(, one) < 0 || fill_mmfile(, two) < 0)
@@ -2572,9 +2578,13 @@ static void builtin_diff(const char *name_a,
strbuf_reset();
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file, , , line_prefix);
-   else
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   else {
+   strbuf_addf(, "%sBinary files %s and %s differ\n",
+   diff_line_prefix(o), lbl[0], lbl[1]);
+   emit_diff_symbol(o, DIFF_SYMBOL_BINARY_FILES,
+sb.buf, sb.len, 0);
+   strbuf_release();
+   }
o->found_changes = 1;
} else {
/* Crazy xdl interfaces.. */
-- 
2.12.2.575.gb14f27f917



[PATCHv2 00/25] reroll of sb/diff-color-moved

2017-06-22 Thread Stefan Beller
v2:
* addressed all issues raised
* last patch dropped (WIP/RFC: diff.c: have a "machine parseable" move coloring)
* interdiff below

v1:
This is a complete rewrite of the series. Highlights:
* instead of buffering partial lines, we'll pretend all diff output
  follows a well defined grammar, and we emit symbols thereof.
  (The difference is mostly mental, though by this trick we reduce
  the memory footprint for storing one of these symbols from 7 variables
  (3 pointers, 3 ints, one state (also int) down to 4 variables
  (one pointer, 2 ints, one state).
* The algorithm for color painting was detangled:
  -> different functions for block detection and dimming
  -> The last patch (not to be applied) is an RFC that shows
 how we would approach non-colored, but machine parseable highlighting
 of moved lines.

Thanks,
Stefan

Stefan Beller (25):
  diff.c: readability fix
  diff.c: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: introduce emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
  diff.c: migrate emit_line_checked to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
  submodule.c: migrate diff output to use emit_diff_symbol
  diff.c: convert emit_binary_diff_body to use emit_diff_symbol
  diff.c: convert show_stats to use emit_diff_symbol
  diff.c: convert word diffing to use emit_diff_symbol
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
  diff.c: buffer all output if asked to
  diff.c: color moved lines differently
  diff.c: color moved lines differently, plain mode
  diff.c: add dimming to moved line detection
  diff: document the new --color-moved setting

 Documentation/config.txt   |   12 +-
 Documentation/diff-options.txt |   27 +
 cache.h|2 +
 color.h|2 +
 diff.c | 1270 
 diff.h |   37 +-
 submodule.c|   85 ++-
 submodule.h|   13 +-
 t/t4015-diff-whitespace.sh |  369 
 9 files changed, 1501 insertions(+), 316 deletions(-)

diff to what is currently queued:
diff --git a/cache.h b/cache.h
index 4d63c44f07..d2204bf6d1 100644
--- a/cache.h
+++ b/cache.h
@@ -2168,6 +2168,7 @@ void shift_tree_by(const struct object_id *, const struct 
object_id *, struct ob
 #define WS_TRAILING_SPACE  (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
 #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
 #define WS_TAB_WIDTH_MASK077
+/* All WS_* -- when extended, adapt diff.c emit_symbol */
 #define WS_RULE_MASK   0
 extern unsigned whitespace_rule_cfg;
 extern unsigned whitespace_rule(const char *);
diff --git a/color.h b/color.h
index 0e091b0cf5..fd2b688dfb 100644
--- a/color.h
+++ b/color.h
@@ -42,8 +42,8 @@ struct strbuf;
 #define GIT_COLOR_BG_BLUE  "\033[44m"
 #define GIT_COLOR_BG_MAGENTA   "\033[45m"
 #define GIT_COLOR_BG_CYAN  "\033[46m"
-#define GIT_COLOR_DI   "\033[2m"
-#define GIT_COLOR_DI_IT"\033[2;3m"
+#define GIT_COLOR_FAINT"\033[2m"
+#define GIT_COLOR_FAINT_ITALIC "\033[2;3m"
 
 /* A special value meaning "no color selected" */
 #define GIT_COLOR_NIL "NIL"
diff --git a/diff.c b/diff.c
index 7756f7610c..82ace48c38 100644
--- a/diff.c
+++ b/diff.c
@@ -59,12 +59,12 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_NORMAL,   /* FUNCINFO */
GIT_COLOR_BOLD_MAGENTA, /* OLD_MOVED */
GIT_COLOR_BOLD_BLUE,/* OLD_MOVED ALTERNATIVE */
-   GIT_COLOR_DI,   /* OLD_MOVED_DIM */
-   GIT_COLOR_DI_IT,/* OLD_MOVED_ALTERNATIVE_DIM */
+   GIT_COLOR_FAINT,/* OLD_MOVED_DIM */
+   GIT_COLOR_FAINT_ITALIC, /* OLD_MOVED_ALTERNATIVE_DIM */
GIT_COLOR_BOLD_CYAN,/* NEW_MOVED */
GIT_COLOR_BOLD_YELLOW,  /* NEW_MOVED ALTERNATIVE */
-   GIT_COLOR_DI,   /* NEW_MOVED_DIM */
-   GIT_COLOR_DI_IT,/* NEW_MOVED_ALTERNATIVE_DIM */
+   GIT_COLOR_FAINT,/* NEW_MOVED_DIM */
+   GIT_COLOR_FAINT_ITALIC, /* NEW_MOVED_ALTERNATIVE_DIM */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -607,32 +607,11 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
-   DIFF_SYMBOL_SEPARATOR,
-   DIFF_SYMBOL_CONTEXT_MARKER,
-   

[PATCHv2 07/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 16818fa571..b78f698cad 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,7 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_NO_LF_EOF,
DIFF_SYMBOL_CONTEXT_FRAGINFO,
DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_SEPARATOR
@@ -568,8 +569,16 @@ enum diff_symbol {
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 const char *line, int len)
 {
+   static const char *nneof = " No newline at end of file\n";
const char *context, *reset;
switch (s) {
+   case DIFF_SYMBOL_NO_LF_EOF:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   putc('\n', o->file);
+   emit_line_0(o, context, reset, '\\',
+   nneof, strlen(nneof));
+   break;
case DIFF_SYMBOL_CONTEXT_FRAGINFO:
emit_line(o, "", "", line, len);
break;
@@ -750,7 +759,6 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
   int prefix, const char *data, int size)
 {
const char *endp = NULL;
-   static const char *nneof = " No newline at end of file\n";
const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
 
while (0 < size) {
@@ -768,13 +776,8 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
size -= len;
data += len;
}
-   if (!endp) {
-   const char *context = diff_get_color(ecb->color_diff,
-DIFF_CONTEXT);
-   putc('\n', ecb->opt->file);
-   emit_line_0(ecb->opt, context, reset, '\\',
-   nneof, strlen(nneof));
-   }
+   if (!endp)
+   emit_diff_symbol(ecb->opt, DIFF_SYMBOL_NO_LF_EOF, NULL, 0);
 }
 
 static void emit_rewrite_diff(const char *name_a,
-- 
2.12.2.575.gb14f27f917



[PATCHv2 05/25] diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER

2017-06-22 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 diff.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 2257d44e2c..c550f75195 100644
--- a/diff.c
+++ b/diff.c
@@ -560,13 +560,20 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
 }
 
 enum diff_symbol {
+   DIFF_SYMBOL_CONTEXT_MARKER,
DIFF_SYMBOL_SEPARATOR
 };
 
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 const char *line, int len)
 {
+   const char *context, *reset;
switch (s) {
+   case DIFF_SYMBOL_CONTEXT_MARKER:
+   context = diff_get_color_opt(o, DIFF_CONTEXT);
+   reset = diff_get_color_opt(o, DIFF_RESET);
+   emit_line(o, context, reset, line, len);
+   break;
case DIFF_SYMBOL_SEPARATOR:
fprintf(o->file, "%s%c",
diff_line_prefix(o),
@@ -661,7 +668,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
if (len < 10 ||
memcmp(line, atat, 2) ||
!(ep = memmem(line + 2, len - 2, atat, 2))) {
-   emit_line(ecbdata->opt, context, reset, line, len);
+   emit_diff_symbol(ecbdata->opt,
+DIFF_SYMBOL_CONTEXT_MARKER, line, len);
return;
}
ep += 2; /* skip over @@ */
-- 
2.12.2.575.gb14f27f917



[PATCHv2 03/25] diff.c: factor out diff_flush_patch_all_file_pairs

2017-06-22 Thread Stefan Beller
In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int 
needed, int degraded_cc)
warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+   int i;
+   struct diff_queue_struct *q = _queued_diff;
+   for (i = 0; i < q->nr; i++) {
+   struct diff_filepair *p = q->queue[i];
+   if (check_pair_status(p))
+   diff_flush_patch(p, o);
+   }
+}
+
 void diff_flush(struct diff_options *options)
 {
struct diff_queue_struct *q = _queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
}
}
 
-   for (i = 0; i < q->nr; i++) {
-   struct diff_filepair *p = q->queue[i];
-   if (check_pair_status(p))
-   diff_flush_patch(p, options);
-   }
+   diff_flush_patch_all_file_pairs(options);
}
 
if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.12.2.575.gb14f27f917



Re: [PATCH 11/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR

2017-06-22 Thread Stefan Beller
On Wed, Jun 21, 2017 at 1:09 PM, Junio C Hamano  wrote:
> Jonathan Tan  writes:
>
>> On Mon, 19 Jun 2017 19:48:01 -0700
>> Stefan Beller  wrote:
>>
>>> @@ -676,6 +677,14 @@ static void emit_diff_symbol(struct diff_options *o, 
>>> enum diff_symbol s,
>>>  }
>>>  emit_line(o, context, reset, line, len);
>>>  break;
>>> +case DIFF_SYMBOL_FILEPAIR:
>>> +meta = diff_get_color_opt(o, DIFF_METAINFO);
>>> +reset = diff_get_color_opt(o, DIFF_RESET);
>>> +fprintf(o->file, "%s%s%s%s%s%s\n", diff_line_prefix(o), meta,
>>> +flags ? "+++ " : "--- ",
>>
>> I think that a constant should be defined for this flag, just like for
>> content lines. If not it is not immediately obvious that a flag should
>> be set for the +++ lines.
>
> True.
>
>>> +line, reset,
>>> +strchr(line, ' ') ? "\t" : "");
>>> +break;
>
> Also this is somewhat misnamed in that a call to it only deals with
> one half of a filepair, iow, the caller must call it twice to show a
> single filepair.

Split into two and renamed to DIFF_SYMBOL_FILEPAIR_PLUS
and DIFF_SYMBOL_FILEPAIR_MINUS.

Thanks,
Stefan


Re: [PATCH 08/26] diff.c: migrate emit_line_checked to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 4:30 PM, Stefan Beller  wrote:
> On Wed, Jun 21, 2017 at 1:05 PM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>>  static void emit_add_line(const char *reset,
>>> struct emit_callback *ecbdata,
>>> const char *line, int len)
>>>  {
>>> - emit_line_checked(reset, ecbdata, line, len,
>>> -   DIFF_FILE_NEW, WSEH_NEW, '+');
>>> + unsigned flags = WSEH_NEW | ecbdata->ws_rule;
>>> + if (new_blank_line_at_eof(ecbdata, line, len))
>>> + flags |= DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF;
>>> +
>>> + emit_diff_symbol(ecbdata->opt, DIFF_SYMBOL_PLUS, line, len, flags);
>>>  }
>>
>> This is a bit unsatisfactory that the original code didn't even have
>> to make a call to new_blank_line_at_eof() at all when we know we are
>> not checking for / coloring whitespace errors, but the function is
>> called unconditionally in the new code.
>
> We could check if we do colored output here, I think'll just do that for now,
> but on the other hand this becomes a maintenance nightmare when we
> rely on these flags in the future e.g. for "machine parse-able coloring"
> and would markup according to the flags strictly. I guess we can fix it then.

Actually that function already has some quick return:

static int new_blank_line_at_eof(struct emit_callback *ecbdata, const
char *line, int len)
{
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
 ecbdata->blank_at_eof_in_preimage &&
 ecbdata->blank_at_eof_in_postimage &&
 ecbdata->blank_at_eof_in_preimage <= ecbdata->lno_in_preimage &&
 ecbdata->blank_at_eof_in_postimage <= ecbdata->lno_in_postimage))
  return 0;
return ws_blank_line(line, len, ecbdata->ws_rule);
}

so maybe we could 'inline' it, as there is no reasonable faster check
than what we
have in there.

I'll keep it 'a bit unsatisfactory' here and if it actually is a
performance issue, we
can fix it.


Re: [PATCH 08/26] diff.c: migrate emit_line_checked to use emit_diff_symbol

2017-06-22 Thread Stefan Beller
On Wed, Jun 21, 2017 at 1:05 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>  static void emit_add_line(const char *reset,
>> struct emit_callback *ecbdata,
>> const char *line, int len)
>>  {
>> - emit_line_checked(reset, ecbdata, line, len,
>> -   DIFF_FILE_NEW, WSEH_NEW, '+');
>> + unsigned flags = WSEH_NEW | ecbdata->ws_rule;
>> + if (new_blank_line_at_eof(ecbdata, line, len))
>> + flags |= DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF;
>> +
>> + emit_diff_symbol(ecbdata->opt, DIFF_SYMBOL_PLUS, line, len, flags);
>>  }
>
> This is a bit unsatisfactory that the original code didn't even have
> to make a call to new_blank_line_at_eof() at all when we know we are
> not checking for / coloring whitespace errors, but the function is
> called unconditionally in the new code.

We could check if we do colored output here, I think'll just do that for now,
but on the other hand this becomes a maintenance nightmare when we
rely on these flags in the future e.g. for "machine parse-able coloring"
and would markup according to the flags strictly. I guess we can fix it then.

>
>> diff --git a/diff.h b/diff.h
>> index 5be1ee77a7..8483ca0991 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -148,9 +148,9 @@ struct diff_options {
>>   int abbrev;
>>   int ita_invisible_in_index;
>>  /* white-space error highlighting */
>> -#define WSEH_NEW 1
>> -#define WSEH_CONTEXT 2
>> -#define WSEH_OLD 4
>> +#define WSEH_NEW (1<<12)
>> +#define WSEH_CONTEXT (1<<13)
>> +#define WSEH_OLD (1<<14)
>
> Hopefully this is a safe conversion, as everything should come from
> parse_ws_error_highlight() that uses these constants.

I reviewed the conversion again and I think it is safe, still.

We could do the up and downshifting for the flags to keep these
as-is, but I am not sure if that is a good idea, the up and downshifting
is not just run time complexity but also a readability issue.

>
> But this is brittle; there must be some hint to future readers of
> this code why these bits begin at 12th (it is linked to 0 we saw
> earlier); otherwise when they change something that needs to widen
> that 0, they will forget to shift these up, which would screw
> up everything that is carefully laid out here.

I added a note there. and a BUG check at the beginning of the output.
Smart compilers will drop that and for dumb compilers we'll have an
extra check per commit, which is not too bad


Re: your mail

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 11:58:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Which, in the context of what this follows (how to submit a bug,
> questions etc.) isn't a good use of time for the person reading the
> instructions.
> 
> Maybe something more like:
> 
> diff --git a/README.md b/README.md
> index f17af66a97..dc175757fa 100644
> --- a/README.md
> +++ b/README.md
> @@ -36,6 +36,12 @@ the body to majord...@vger.kernel.org. The mailing list 
> archives are
>  available at ,
>   and other archival sites.
> 
> +You don't need to be subscribed to the list to send mail to it, and
> +others on-list will generally CC you when replying (although some
> +forget this). It's adviced to subscribe to the list if you want to be
> +sure you're not missing follow-up discussion, or if your interest in
> +the project is wider than a one-off bug report, question or patch.
> +
>  The maintainer frequently sends the "What's cooking" reports that
>  list the current status of various development topics to the mailing
>  list.  The discussion following them give a good reference for

You perhaps already read it, but you may want to steal wording or
suggestions from the mailing list section at:

  https://git-scm.com/community

which is covering the same ideas (and vice versa, patches to that page
are welcome if the README says something better).

-Peff


Re: [PATCH 0/5] Move unquote_path() from git-add--interactive.perl to Git.pm

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 11:26:17AM +0100, Phillip Wood wrote:

> From: Phillip Wood 
> 
> I'm using this in some scripts and it would be more convenient to have
> it available from Git.pm rather than copying and pasting it each time
> I need it. I think it should be useful to other people using Git.pm as
> well. It is not uncommon to get a quoted path back from a command that
> needs to be passed on the commandline of another command. While one
> can use -z in many cases, that leaves the problem of having to quote
> the path when printing it in error messages etc.

Grepping around the calls to unquote_path in add--interactive, I
definitely think many of them ought to be using "-z". But I don't think
that's a reason not to make unquote_path() more widely available. It
_is_ generally useful.

The changes look sane to me. My biggest question is how add--interactive
handles the exceptions thrown by the refactored function on error. Since
these paths are coming from Git, it should be something never comes up,
right? So failing hard is probably the best thing to do.

-Peff


Re: [BUG] add_again() off-by-one error in custom format

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 08:19:40PM +0200, René Scharfe wrote:

> > I'd be OK with keeping it if we could reduce the number of magic
> > numbers. E.g,. rather than 32 elsewhere use:
> > 
> >(sizeof(*loose_objects_subdir_bitmap) * CHAR_BIT)
> 
> We have a bitsizeof macro for that.
> 
> > and similarly rather than 8 here use
> > 
> >256 / sizeof(*loose_objects_subdir_bitmap) / CHAR_BIT
> 
> If we're pretending not to know the number of bits in a byte then we
> need to round up, and we have DIV_ROUND_UP for that. :)

Thanks, I meant to mention the rounding thing but forgot. I didn't know
about either of those macros, though.

> There is another example of a bitmap in shallow_c (just search for
> "32").  It would benefit from the macros mentioned above.  That
> might make it easier to switch to the native word size (unsigned int)
> instead of using uint32_t everywhere.
> 
> But perhaps this one is actually a candidate for using EWAH, depending
> on the number of refs the code is supposed to handle.

I thought at first you meant real EWAH bitmaps. But those aren't random
access (so you have to read them into an uncompressed bitmap in memory,
which is why ewah/bitmap.c exists). But if you just mean reusing those
bitmaps, then yeah, possibly.

It looks like it's using a commit slab, though. If you know the number
of bits you need up front, then the commit slab can do it without any
waste. I didn't dig enough to see if that's what it's doing or not.

-Peff


Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 08:19:48PM +0200, René Scharfe wrote:

> Read each loose object subdirectory at most once when looking for unique
> abbreviated hashes.  This speeds up commands like "git log --pretty=%h"
> considerably, which previously caused one readdir(3) call for each
> candidate, even for subdirectories that were visited before.

Is it worth adding a perf test that's heavy on abbreviations?

> The new cache is kept until the program ends and never invalidated.  The
> same is already true for pack indexes.  The inherent racy nature of
> finding unique short hashes makes it still fit for this purpose -- a
> conflicting new object may be added at any time.  Tasks with higher
> consistency requirements should not use it, though.
> 
> The cached object names are stored in an oid_array, which is quite
> compact.  The bitmap for remembering which subdir was already read is
> stored as a char array, with one char per directory -- that's not quite
> as compact, but really simple and incurs only an overhead equivalent to
> 11 hashes after all.

Sounds reasonable. The code itself looks good, with one minor nit below.

> @@ -1593,6 +1594,16 @@ extern struct alternate_object_database {
>   struct strbuf scratch;
>   size_t base_len;
>  
> + /*
> +  * Used to store the results of readdir(3) calls when searching
> +  * for unique abbreviated hashes.  This cache is never
> +  * invalidated, thus it's racy and not necessarily accurate.
> +  * That's fine for its purpose; don't use it for tasks requiring
> +  * greater accuracy!
> +  */
> + char loose_objects_subdir_seen[256];
> + struct oid_array loose_objects_cache;
> +

Thanks for adding this comment.

> @@ -1811,6 +1822,12 @@ typedef int each_loose_cruft_fn(const char *basename,
>  typedef int each_loose_subdir_fn(int nr,
>const char *path,
>void *data);
> +int for_each_file_in_obj_subdir(int subdir_nr,
> + struct strbuf *path,
> + each_loose_object_fn obj_cb,
> + each_loose_cruft_fn cruft_cb,
> + each_loose_subdir_fn subdir_cb,
> + void *data);

Now that this is becoming public, I think we need to document what
should be in "path" here. It is the complete path, including the 2-hex
subdir.

That's redundant with "subdir_nr". Should we push that logic down into
the function, and basically do:

  if (subdir_nr < 0 || subdir_nr > 256)
BUG("object subdir number out of range");
  orig_len = path->len;
  strbuf_addf(path, "/%02x", subdir_nr);
  baselen = path->len;

  ...

  strbuf_setlen(path, orig_len);

That's one less thing for the caller to worry about, and it's easy to
explain the argument as "the path to the object directory root".

> + if (!alt->loose_objects_subdir_seen[subdir_nr]) {
> + struct strbuf *buf = alt_scratch_buf(alt);
> + strbuf_addf(buf, "%02x/", subdir_nr);
> + for_each_file_in_obj_subdir(subdir_nr, buf,
> + append_loose_object,
> + NULL, NULL,
> + >loose_objects_cache);

I think the trailing slash here is superfluous. If you take my
suggestion above, that line goes away. But then the front slash it adds
becomes superfluous. It should probably just do:

  strbuf_complete(path, '/');
  strbuf_addf(path, "%02x", subdir_nr);

-Peff


Re: [PATCH 2/3] t1301: move movebits() to test-lib-functions.sh

2017-06-22 Thread Ramsay Jones


On 22/06/17 20:52, Junio C Hamano wrote:
> Christian Couder  writes:
> 
>> As the movebits() function can be useful outside t1301,
>> let's move it into test-lib-functions.sh, and while at
>> it let's rename it test_movebits().
> 
> Good thinking, especially on the renaming.

Err, except for the commit message! :-D

Both the commit message subject and the commit message body
refer to _move_bits() rather than _mode_bits() etc.
(So, three instances of s/move/mode/).

ATB,
Ramsay Jones




Re: What's cooking in git.git (Jun 2017, #06; Thu, 22)

2017-06-22 Thread Ævar Arnfjörð Bjarmason

On Thu, Jun 22 2017, Junio C. Hamano jotted:

> * sd/branch-copy (2017-06-18) 3 commits
>  - branch: add a --copy (-c) option to go with --move (-m)
>  - branch: add test for -m renaming multiple config sections
>  - config: create a function to format section headers
>
>  "git branch" learned "-c/-C" to create and switch to a new branch
>  by copying an existing one.
>
>  Has a bit of interaction with two other topics, so perhaps needs to
>  wait for them to stabilize a bit more.

What topics are those? Didn't see any outright conflicts, but might have
missed something. Anything Sahil & I can help with?

> * ab/sha1dc (2017-06-07) 2 commits
>  - sha1collisiondetection: automatically enable when submodule is populated
>  - sha1dc: optionally use sha1collisiondetection as a submodule
>
>  The "collission-detecting" implementation of SHA-1 hash we borrowed
>  from is replaced by directly binding the upstream project as our
>  submodule.
>
>  Will keep in 'pu'.
>  Impact to the various build and release infrastructure of using
>  submodule is not yet fully known, but this lets us dip our toes.

I'm in no rush to get this in. I just submitted it initially as a
"submodules make sense here, and in git.git, yay!"

But it's been 1 month kicking around in pu now. What are we gaining from
keeping it there even longer at this point?

> * bp/fsmonitor (2017-06-12) 6 commits
>  - fsmonitor: add a sample query-fsmonitor hook script for Watchman
>  - fsmonitor: add documentation for the fsmonitor extension.
>  - fsmonitor: add test cases for fsmonitor extension
>  - fsmonitor: teach git to optionally utilize a file system monitor to speed 
> up detecting new or changed files.
>  - dir: make lookup_untracked() available outside of dir.c
>  - bswap: add 64 bit endianness helper get_be64
>
>  We learned to talk to watchman to speed up "git status".
>
>  Waiting for discussion to settle.

Been meaning to do my part to follow-up on this one, sorry about the
delay.

> * xz/send-email-batch-size (2017-05-23) 1 commit
>  - send-email: --batch-size to work around some SMTP server limit
>
>  "git send-email" learned to overcome some SMTP server limitation
>  that does not allow many pieces of e-mails to be sent over a single
>  session.
>
>  Waiting for response.
>  cf. 

I think between <7993e188.d18d.15c3560bcaf.coremail.zxq_yx_...@163.com>
& <20170523103050.1f7ab7e0@jvn> we have sufficient info about what bug
this solves to try an alternate approach at some other time.

I think it makes sense to merge this down.


What's cooking in git.git (Jun 2017, #06; Thu, 22)

2017-06-22 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ah/filter-branch-setup (2017-06-12) 2 commits
  (merged to 'next' on 2017-06-19 at f3440f2c1a)
 + filter-branch: add [--] to usage
 + filter-branch: add `--setup` step

 "filter-branch" learned a pseudo filter "--setup" that can be used
 to define a common function/variable that can be used by other
 filters.


* km/test-mailinfo-b-failure (2017-06-12) 1 commit
  (merged to 'next' on 2017-06-19 at badc2c2337)
 + t5100: add some more mailinfo tests

 New tests.


* ls/github (2017-06-13) 1 commit
  (merged to 'next' on 2017-06-19 at 4d2024615f)
 + Configure Git contribution guidelines for github.com

 Help contributors that visit us at GitHub.


* mh/fast-import-raise-default-depth (2017-06-12) 1 commit
  (merged to 'next' on 2017-06-19 at 7093c07b8e)
 + fast-import: increase the default pack depth to 50

 "fast-import" uses a default pack chain depth that is consistent
 with other parts of the system.


* nd/fopen-errors (2017-06-15) 1 commit
  (merged to 'next' on 2017-06-15 at 86bcb7c082)
 + configure.ac: loosen FREAD_READS_DIRECTORIES test program

 Hotfix for a topic that is already in 'master'.


* pc/dir-count-slashes (2017-06-12) 1 commit
  (merged to 'next' on 2017-06-19 at 57351a2771)
 + dir: create function count_slashes()

 Three instances of the same helper function have been consolidated
 to one.


* ps/stash-push-pathspec-fix (2017-06-13) 1 commit
  (merged to 'next' on 2017-06-19 at 866c9035e0)
 + git-stash: fix pushing stash with pathspec from subdir

 "git stash push " did not work from a subdirectory at all.
 Bugfix for a topic in v2.13


* rs/strbuf-addftime-zZ (2017-06-15) 3 commits
  (merged to 'next' on 2017-06-19 at 77480669f0)
 + date: use localtime() for "-local" time formats
 + t0006: check --date=format zone offsets
 + strbuf: let strbuf_addftime handle %z and %Z itself

 As there is no portable way to pass timezone information to
 strftime, some output format from "git log" and friends are
 impossible to produce.  Teach our own strbuf_addftime to replace %z
 and %Z with caller-supplied values to help working around this.


* sb/t4005-modernize (2017-06-10) 1 commit
  (merged to 'next' on 2017-06-19 at beedeb757b)
 + t4005: modernize style and drop hard coded sha1

 Test clean-up.


* sd/t3200-branch-m-test (2017-06-13) 1 commit
  (merged to 'next' on 2017-06-19 at 0fd712c46e)
 + t3200: add test for single parameter passed to -m option

 New test.


* sg/revision-parser-skip-prefix (2017-06-12) 5 commits
  (merged to 'next' on 2017-06-19 at 0a90bec767)
 + revision.c: use skip_prefix() in handle_revision_pseudo_opt()
 + revision.c: use skip_prefix() in handle_revision_opt()
 + revision.c: stricter parsing of '--early-output'
 + revision.c: stricter parsing of '--no-{min,max}-parents'
 + revision.h: turn rev_info.early_output back into an unsigned int

 Code clean-up.

--
[New Topics]

* mh/packed-ref-store-prep-extra (2017-06-18) 1 commit
 - prefix_ref_iterator_advance(): relax the check of trim length
 (this branch uses mh/packed-ref-store-prep; is tangled with 
mh/packed-ref-store.)

 Split out of mh/packed-ref-store-prep; will drop.


* ab/die-errors-in-threaded (2017-06-21) 1 commit
 - die(): stop hiding errors due to overzealous recursion guard

 Traditionally, the default die() routine had a code to prevent it
 from getting called multiple times, which interacted badly when a
 threaded program used it (one downside is that the real error may
 be hidden and instead the only error message given to the user may
 end up being "die recursion detected", which is not very useful).

 Will merge to 'next'.


* ab/pcre-v2 (2017-06-21) 1 commit
  (merged to 'next' on 2017-06-21 at fb6320213c)
 + grep: fix erroneously copy/pasted variable in check/assert pattern

 Hotfix for a topic already in 'master'.

 Will merge to 'master'.


* bw/repo-object (2017-06-22) 21 commits
 - ls-files: use repository object
 - repository: enable initialization of submodules
 - submodule: convert is_submodule_initialized to work on a repository
 - submodule: add repo_read_gitmodules
 - submodule-config: store the_submodule_cache in the_repository
 - repository: add index_state to struct repo
 - config: read config from a repository object
 - path: add repo_worktree_path and strbuf_repo_worktree_path
 - path: add repo_git_path and strbuf_repo_git_path
 - path: worktree_git_path() should not use file relocation
 - path: convert do_git_path to take a 'struct 

Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 05:52:35PM -0400, Jeff King wrote:

> Which really makes me feel like this patch is going in the right
> direction, as it makes all of this behave conceptually like:
> 
>   while read old new etc ...
>   do
> git show $new
>   done <.git/logs/$ref
> 
> which is simple to explain and is what I'd expect (and is certainly the
> direction we went with the "diff uses real parents" commit).
> 
> We just need to hit the simplify_commit() code path here.

So here's a patch on top of what I posted before that pushes the reflog
check into the loop (it just decides whether to pull from the reflogs or
from the commit queue at the top of the loop).

I was surprised to find, though, that simplify_commit() does not
actually do the pathspec limiting! It's done by
try_to_simplify_commit(), which is part of add_parents_to_list(). I
hacked around it in the later part of the loop by calling
try_to_simplify myself and checking the TREESAME flag. But I have a
feeling I'm missing something about how the traversal is supposed to
work.

This does behave sensibly with "--no-merges" and "--merges", as well as
pathspec limiting.

diff --git a/revision.c b/revision.c
index 675247cd9..203468ddf 100644
--- a/revision.c
+++ b/revision.c
@@ -3112,19 +3112,19 @@ static void track_linear(struct rev_info *revs, struct 
commit *commit)
 
 static struct commit *get_revision_1(struct rev_info *revs)
 {
-   if (revs->reflog_info) {
-   struct commit *commit = next_reflog_entry(revs->reflog_info);
-   if (commit) {
-   commit->object.flags &= ~(ADDED | SEEN | SHOWN);
-   return commit;
-   }
-   }
+   while (1) {
+   struct commit *commit;
 
-   if (!revs->commits)
-   return NULL;
+   if (revs->reflog_info)
+   commit = next_reflog_entry(revs->reflog_info);
+   else
+   commit = pop_commit(>commits);
 
-   do {
-   struct commit *commit = pop_commit(>commits);
+   if (!commit)
+   return NULL;
+
+   if (revs->reflog_info)
+   commit->object.flags &= ~(ADDED | SEEN | SHOWN);
 
/*
 * If we haven't done the list limiting, we need to look at
@@ -3135,7 +3135,8 @@ static struct commit *get_revision_1(struct rev_info 
*revs)
if (revs->max_age != -1 &&
(commit->date < revs->max_age))
continue;
-   if (add_parents_to_list(revs, commit, >commits, 
NULL) < 0) {
+   if (!revs->reflog_info &&
+   add_parents_to_list(revs, commit, >commits, 
NULL) < 0) {
if (!revs->ignore_missing_links)
die("Failed to traverse parents of 
commit %s",

oid_to_hex(>object.oid));
@@ -3151,10 +3152,14 @@ static struct commit *get_revision_1(struct rev_info 
*revs)
default:
if (revs->track_linear)
track_linear(revs, commit);
+   if (revs->reflog_info) {
+   try_to_simplify_commit(revs, commit);
+   if (commit->object.flags & TREESAME)
+   continue;
+   }
return commit;
}
-   } while (revs->commits);
-   return NULL;
+   }
 }
 
 /*


Re: [PATCHv5 1/2] clone: respect additional configured fetch refspecs during initial fetch

2017-06-22 Thread Junio C Hamano
Jeff King  writes:

> I'd still prefer this to have:
>
>   if (!remote->fetch && remote->fetch_refspec_nr)
>   BUG("attempt to add refspec to uninitialized list");
>
> at the top, as otherwise this case writes garbage into remote->fetch[0].
>
> I see you have another series dealing with the lazy parsing, but I
> haven't looked at it yet (hopefully this danger would just go away after
> that).
>
> Other than that, the patch looks fine to me.

SZEDER?  As long as the end result together with two series are
safe, I do not have a strong preference, but given that the other
one is a lot more invasive change [*1*], I think it is nicer to have
this two-patch series already safe without the other one.

What's your take on Peff's point?


[Footnote]

*1* Especially the other branch does not merge cleanly into 'pu' and
I haven't managed to include it in my tree yet.


Re: your mail

2017-06-22 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Maybe something more like:

Much better.

> diff --git a/README.md b/README.md
> index f17af66a97..dc175757fa 100644
> --- a/README.md
> +++ b/README.md
> @@ -36,6 +36,12 @@ the body to majord...@vger.kernel.org. The mailing list 
> archives are
>  available at ,
>   and other archival sites.
>
> +You don't need to be subscribed to the list to send mail to it, and
> +others on-list will generally CC you when replying (although some
> +forget this). It's adviced to subscribe to the list if you want to be
> +sure you're not missing follow-up discussion, or if your interest in
> +the project is wider than a one-off bug report, question or patch.
> +
>  The maintainer frequently sends the "What's cooking" reports that
>  list the current status of various development topics to the mailing
>  list.  The discussion following them give a good reference for


Re: [PATCH 2/3] pack-objects: WIP add max-blob-size filtering

2017-06-22 Thread Junio C Hamano
Jonathan Tan  writes:

> On Thu, 22 Jun 2017 20:36:14 +
> Jeff Hostetler  wrote:
>
>> +static signed long max_blob_size = -1;
>
> FYI Junio suggested "blob-max-bytes" when he looked at my patch [1].
>
> [1] https://public-inbox.org/git/xmqqmv9ryoym@gitster.mtv.corp.google.com/

To give credit to where credit is due, I learned this from akpm.

When you are tempted to say "size", "length", "weight", etc., if you
think of a way to phrase it with a more concrete unit, you'd end up
with a much less ambiguous name.  People can imagine max_blob_size
is counted in kilobytes, or megabytes, or whatever.  There is no
room in max_blob_bytes to be interpreted in multiple ways.


Re: [PATCH 1/3] list-objects: add filter_blob to traverse_commit_list

2017-06-22 Thread Jonathan Tan
On Thu, 22 Jun 2017 14:45:26 -0700
Jonathan Tan  wrote:

> On Thu, 22 Jun 2017 20:36:13 +
> Jeff Hostetler  wrote:
> 
> > From: Jeff Hostetler 
> > 
> > In preparation for partial/sparse clone/fetch where the
> > server is allowed to omit large/all blobs from the packfile,
> > teach traverse_commit_list() to take a blob filter-proc that
> > controls when blobs are shown and marked as SEEN.
> > 
> > Normally, traverse_commit_list() always marks visited blobs
> > as SEEN, so that the show_object() callback will never see
> > duplicates.  Since a single blob OID may be referenced by
> > multiple pathnames, we may not be able to decide if a blob
> > should be excluded until later pathnames have been traversed.
> > With the filter-proc, the automatic deduping is disabled.
> 
> Comparing this approach (this patch and the next one) against mine [1],
> I see that this has the advantage of (in pack-objects) avoiding the
> invocation of add_preferred_base_object() on blobs that are filtered
> out, avoiding polluting the "to_pack" data structure with information
> that we do not need.
> 
> But it does add an extra place where blobs are filtered out (besides
> add_object_entry()), which makes it harder for the reader to keep track
> of what's going on. I took a brief look to see if filtering could be
> eliminated from add_object_entry(), but that function is called from
> many places, so I couldn't tell.
> 
> Anyway, I think both approaches will work (this patch's and mine [1]).
> 
> [1] 
> https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathanta...@google.com/

Also it should be mentioned somewhere that this does not cover the
bitmap case - a similar discussion should be included in one of the
patches, like I did in [1].

And looking back at my original cover letter [2], I wrote:

> This is similar to [1] except that this
[...]
> is slightly more comprehensive in
> that the read_object_list_from_stdin() codepath is also covered in
> addition to the get_object_list() codepath. (Although, to be clear,
> upload-pack always passes "--revs" and thus only needs the
> get_object_list() codepath).

(The [1] in the quote above refers to one of Jeff Hostetler's patches,
[QUOTE 1].)

The same issue applies to this patch (since
read_object_list_from_stdin() invokes add_object_entry() directly
without going through the traversal mechanism) and probably warrants at
least some description in one of the commit messages.

[1] 
https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathanta...@google.com/
[2] https://public-inbox.org/git/cover.1496361873.git.jonathanta...@google.com/

[QUOTE 1] 
https://public-inbox.org/git/1488994685-37403-3-git-send-email-jeffh...@microsoft.com/


Re: [PATCH v4 15/20] repository: add index_state to struct repo

2017-06-22 Thread Junio C Hamano
Brandon Williams  writes:

> That makes sense.  While at it, would it make sense to ensure that the
> 'struct index_state *' which is stored in 'the_repository.index' be
> '_index'?

I was imagining (read: speculating one possible future, without
thinking things through to judge if it makes sense at all) that
conceptually most of the current API that works on things in the
current repository to be implicitly working on the_repository
singleton instance in the "repository object" world.

When that happens, the_index can become a CPP macro that translates
to "the_repository.index", I would think.



Re: [PATCH] name-rev: Fix tag lookup on repository with mixed types of tags

2017-06-22 Thread Junio C Hamano
Orgad Shaneh  writes:

> Commit 7550424804 (name-rev: include taggerdate in considering the best
> name) introduced a bug in name-rev.
>
> If a repository has both annotated and non-annotated tags, annotated
> tag will always win, even if it was created decades after the commit.

Thanks.  It is a problem that light-weight tags are unduly
penalized, and we attempted to address it a few months ago in a
slightly different way.  The necessary changes are already in
'master' but not yet in any released branch.

Here is an entry in the Release Notes for the next release that will
come out of the current 'master'.

 * "git describe --contains" penalized light-weight tags so much that
   they were almost never considered.  Instead, give them about the
   same chance to be considered as an annotated tag that is the same
   age as the underlying commit would.
   (merge ef1e74065c jc/name-rev-lw-tag later to maint).



Re: your mail

2017-06-22 Thread Ævar Arnfjörð Bjarmason

On Thu, Jun 22 2017, Junio C. Hamano jotted:

> Simon Ruderich  writes:
>
>> On Thu, Jun 22, 2017 at 01:55:27PM +, Patrick Lehmann wrote:
>>> The description on https://github.com/git/git doesn't reflect that policy.
>>>
>>> a)
>>> It explains that discussions take place in the mentioned mailing list.
>>> b)
>>> It describes how to subscribe.
>>
>> However it doesn't say that you have to subscribe to send, only
>> how to subscribe.
>
> For that matter, we also say "everyone is welcome to post", which
> makes it clear that no subscription is required.
>
> But I view these merely being technically correct.  And making it
> absolutely obvious does not cost too much.
>
>>> With knowledge of other mailing lists (mostly managed by mailman),
>>> subscription is required for participation.
>>
>> That depends on the mailing list, some require subscription to
>> prevent spams but not all do.
>
> Yes.  But not many people realize that the world they know is the
> only world.  We are used to an open list and are shocked when we
> encouter a closed one; let's not forget that shock.
>
> How about doing it like this?
>
>  README.md | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/README.md b/README.md
> index f17af66a97..bbaf54bffb 100644
> --- a/README.md
> +++ b/README.md
> @@ -31,6 +31,10 @@ The user discussion and development of Git take place on 
> the Git
>  mailing list -- everyone is welcome to post bug reports, feature
>  requests, comments and patches to git@vger.kernel.org (read
>  [Documentation/SubmittingPatches][] for instructions on patch submission).
> +
> +You can send messages without subscribing to the list, but it is
> +recommended to read what other people are saying on the list before
> +you speak.
>  To subscribe to the list, send an email with just "subscribe git" in
>  the body to majord...@vger.kernel.org. The mailing list archives are
>  available at ,

It's unclear what that means. I *think* it means "consider taking a look
around the list before you post", but then it's probably better advice
to tell people to skim the archives first to get an idea of the traffic.

E.g. if I page through the first 2 pages of public-inbox.org I get
messages going back to the 19th, but if I were to subscribe to the list
I'd need to wait 4 days to get the same mail.

Which, in the context of what this follows (how to submit a bug,
questions etc.) isn't a good use of time for the person reading the
instructions.

Maybe something more like:

diff --git a/README.md b/README.md
index f17af66a97..dc175757fa 100644
--- a/README.md
+++ b/README.md
@@ -36,6 +36,12 @@ the body to majord...@vger.kernel.org. The mailing list 
archives are
 available at ,
  and other archival sites.

+You don't need to be subscribed to the list to send mail to it, and
+others on-list will generally CC you when replying (although some
+forget this). It's adviced to subscribe to the list if you want to be
+sure you're not missing follow-up discussion, or if your interest in
+the project is wider than a one-off bug report, question or patch.
+
 The maintainer frequently sends the "What's cooking" reports that
 list the current status of various development topics to the mailing
 list.  The discussion following them give a good reference for


Re: [PATCH 2/3] pack-objects: WIP add max-blob-size filtering

2017-06-22 Thread Jonathan Tan
On Thu, 22 Jun 2017 20:36:14 +
Jeff Hostetler  wrote:

> +static signed long max_blob_size = -1;

FYI Junio suggested "blob-max-bytes" when he looked at my patch [1].

[1] https://public-inbox.org/git/xmqqmv9ryoym@gitster.mtv.corp.google.com/

[snip]

> +/*
> + * Filter blobs by pathname or size.
> + * Return 1 to mark the blob SEEN so that it will not be reported again.
> + * Return 0 to allow it to be presented again.
> + */
> +static int filter_blob(
> + struct object *obj,
> + const char *pathname,
> + const char *entryname,
> + void *data)
> +{
> + assert(obj->type == OBJ_BLOB);
> + assert((obj->flags & SEEN) == 0);
> + assert((obj->flags & OBJECT_ADDED) == 0);
> + assert(max_blob_size >= 0);
> +
> + /*
> +  * Always include blobs for special files of the form ".git*".
> +  */
> + if ((strncmp(entryname, ".git", 4) == 0) && entryname[4]) {
> + if (obj->flags & BLOB_OMITTED) {
> + /*
> +  * TODO
> +  * TODO Remove this blob from the omitted blob list.
> +  * TODO
> +  */
> + obj->flags &= ~BLOB_OMITTED;
> + }
> + show_object(obj, pathname, data);
> + return 1;
> + }
> +
> + /*
> +  * We already know the blob is too big because it was previously
> +  * omitted.  We still don't want it yet.  DO NOT mark it SEEN
> +  * in case it is associated with a ".git*" path in another tree
> +  * or commit.
> +  */
> + if (obj->flags & BLOB_OMITTED)
> + return 0;
> +
> + /*
> +  * We only want blobs that are LESS THAN the maximum.
> +  * This allows zero to mean NO BLOBS.

That is not what maximum means...

This is the reason why I originally called it "limit", but after some
reflection, I decided that it is no big deal to always send zero-sized
blobs. The client must be able to handle the presence of blobs anyway
(because it will receive the ".git" ones), and excluding all blobs
regardless of size does not remove the necessity of obtaining their
sizes, since we need those sizes to put in the omitted blob list.

> +  */
> + if (max_blob_size > 0) {
> + unsigned long s;
> + enum object_type t = sha1_object_info(obj->oid.hash, );
> + assert(t == OBJ_BLOB);
> + if (s < max_blob_size) {
> + show_object(obj, pathname, data);
> + return 1;
> + }
> + }
> +
> + /*
> +  * TODO
> +  * TODO (Provisionally) add this blob to the omitted blob list.
> +  * TODO

As for the omitted blob list itself, you can see from my patch [2] that I
used a hashmap, but the decorate.h functionality might work too (I
haven't looked into that in detail though).

[2] 
https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathanta...@google.com/


Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 01:32:44PM -0700, Junio C Hamano wrote:

> I do not think command line parser does not allow "log -g
> maint..master" so all the "limited" processing the remainder of
> get_revision_1() does shouldn't matter.

Yeah, I don't think negative endpoints work at all, and "foo...bar"
seems to also break (though with a confusing message). It seems clear to
me that multiple positive endpoints don't work well either, if they have
overlapping commits.

> I however think pathspec will affect simplify_commit() and suspect
> that "git log -g -20 HEAD path" will behave differently.  Perhaps
> the difference is "it used to use path in an unexplainable way, now
> it ignores", in which case this is an improvement.

The current behavior there does seem like nonsense, because it's based
on the fake parents. For instance, if I set up a simple two-branch case:

  commit() {
echo "$1" >"$1" && git add "$1" && git commit -m "$1"
  }

  git init repo
  cd repo
  commit base
  commit master
  git checkout -b side HEAD^
  commit side
  git merge --no-edit master
  commit combined

Then I get:

  $ git log -g --oneline --name-status -- master
  f06c3cd HEAD@{1}: merge master: Merge made by the 'recursive' strategy.
  5bf12c4 HEAD@{3}: checkout: moving from master to side
  dfa408b HEAD@{4}: commit: master
  A   master

Even though only one of those commits touched master. But with my patch,
it's also somewhat confusing. We ignore the pathspec when picking which
commits to show, but still apply it for diffing. So:

  03cf1ad HEAD@{0}: commit: combined
  f06c3cd HEAD@{1}: merge master: Merge made by the 'recursive' strategy.
  277042b HEAD@{2}: commit: side
  5bf12c4 HEAD@{3}: checkout: moving from master to side
  dfa408b HEAD@{4}: commit: master
  A   master
  5bf12c4 HEAD@{5}: commit (initial): base

I think we'd want to just omit any entries that are TREESAME to their
parents. We don't actually care about true parent rewriting (since we're
not walking the parents), but if it happened as a side effect that would
probably be OK.

It looks like simplify_commit() is also where we apply bits like
--no-merges, which doesn't work with my patch. It _also_ behaves
nonsensically in the current code, because of course the fake reflog
parents are never merges.

Which really makes me feel like this patch is going in the right
direction, as it makes all of this behave conceptually like:

  while read old new etc ...
  do
git show $new
  done <.git/logs/$ref

which is simple to explain and is what I'd expect (and is certainly the
direction we went with the "diff uses real parents" commit).

We just need to hit the simplify_commit() code path here.

-Peff


Re: [PATCH 1/3] list-objects: add filter_blob to traverse_commit_list

2017-06-22 Thread Jonathan Tan
On Thu, 22 Jun 2017 20:36:13 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> In preparation for partial/sparse clone/fetch where the
> server is allowed to omit large/all blobs from the packfile,
> teach traverse_commit_list() to take a blob filter-proc that
> controls when blobs are shown and marked as SEEN.
> 
> Normally, traverse_commit_list() always marks visited blobs
> as SEEN, so that the show_object() callback will never see
> duplicates.  Since a single blob OID may be referenced by
> multiple pathnames, we may not be able to decide if a blob
> should be excluded until later pathnames have been traversed.
> With the filter-proc, the automatic deduping is disabled.

Comparing this approach (this patch and the next one) against mine [1],
I see that this has the advantage of (in pack-objects) avoiding the
invocation of add_preferred_base_object() on blobs that are filtered
out, avoiding polluting the "to_pack" data structure with information
that we do not need.

But it does add an extra place where blobs are filtered out (besides
add_object_entry()), which makes it harder for the reader to keep track
of what's going on. I took a brief look to see if filtering could be
eliminated from add_object_entry(), but that function is called from
many places, so I couldn't tell.

Anyway, I think both approaches will work (this patch's and mine [1]).

[1] 
https://public-inbox.org/git/6f7934621717141ce3bb6bc05cf1d59c7900ccc5.1496432147.git.jonathanta...@google.com/

[snip]

>  static void process_blob(struct rev_info *revs,
>struct blob *blob,
>show_object_fn show,
> +  filter_blob_fn filter_blob,
>struct strbuf *path,
>const char *name,
>void *cb_data)
> @@ -24,11 +25,28 @@ static void process_blob(struct rev_info *revs,
>   die("bad blob object");
>   if (obj->flags & (UNINTERESTING | SEEN))
>   return;
> - obj->flags |= SEEN;
>  
>   pathlen = path->len;
>   strbuf_addstr(path, name);
> - show(obj, path->buf, cb_data);
> + if (!filter_blob) {
> + /*
> +  * Normal processing is to imediately dedup blobs
> +  * during commit traversal, regardless of how many
> +  * times it appears in a single or multiple commits,
> +  * so we always set SEEN.
> +  */
> + obj->flags |= SEEN;
> + show(obj, path->buf, cb_data);
> + } else {
> + /*
> +  * Use the filter-proc to decide whether to show
> +  * the blob.  We only set SEEN if requested.  For
> +  * example, this could be used to omit a specific
> +  * blob until it appears with a ".git*" entryname.
> +  */
> + if (filter_blob(obj, path->buf, >buf[pathlen], cb_data))
> + obj->flags |= SEEN;
> + }
>   strbuf_setlen(path, pathlen);
>  }

After looking at this code again, I wonder if we should explicitly
document that SEEN will be set on the object before show(), and that
show() is allowed to unset SEEN if it wants traversal to process that
object again. That seems to accomplish what we want to accomplish
without this expansion of the API.

(This does mean that the next patch will need to call strrchr() itself,
since show() does not provide the basename of the file name.)

Having said that, if we keep with the current approach, I think there
should be a show() call after the invocation to filter_blob(). That is
much less surprising to me, and also allows us to remove some
show_object() invocations in the next patch.

> +void traverse_commit_list_filtered(
> + struct rev_info *revs,
> + show_commit_fn show_commit,
> + show_object_fn show_object,
> + filter_blob_fn filter_blob,
> + void *data)
> +{
>   int i;
>   struct commit *commit;
>   struct strbuf base;

Git style is to indent to the location after the first "(", I believe.
Likewise in the header file below.

>  typedef void (*show_commit_fn)(struct commit *, void *);
>  typedef void (*show_object_fn)(struct object *, const char *, void *);
> +typedef int  (*filter_blob_fn)(struct object *, const char *, const char *, 
> void *);

Can this have parameter names added (especially both the const char *
ones, otherwise indistinguishable) and, if necessary, documented?


[PATCH 0/3] wildmatch refactoring

2017-06-22 Thread Ævar Arnfjörð Bjarmason
The first patch here should be applied, but 2 & 3 trail along as RFCs
to show where this is going.

The RFC patches work, but I'm sure there'll be critiques of the
interface / other suggestions, so they're being sent as RFC so Junio
doesn't need to worry about picking them up / tracking them.

This is prep work for a local series of mine which compiles the
wildmatch pattern into JIT-ed PCRE v2 patterns. That may or may not
end up going anywhere, but as 2/3 notes just making these changes
makes it easier to optimize wildmatch() further down the road.

Ævar Arnfjörð Bjarmason (3):
  wildmatch: remove unused wildopts parameter
  wildmatch: add interface for precompiling wildmatch() patterns
  wildmatch: make use of the interface for precompiling wildmatch()
patterns

 apply.c   |  2 +-
 builtin/describe.c|  4 ++--
 builtin/ls-remote.c   |  2 +-
 builtin/name-rev.c|  6 +-
 builtin/reflog.c  |  2 +-
 builtin/replace.c |  7 ---
 builtin/show-branch.c |  2 +-
 config.c  |  8 ++--
 diffcore-order.c  |  2 +-
 dir.c | 20 +++-
 ref-filter.c  |  4 ++--
 refs.c|  7 ---
 revision.c|  2 +-
 t/helper/test-wildmatch.c |  6 +++---
 wildmatch.c   | 23 +--
 wildmatch.h   | 13 +
 16 files changed, 77 insertions(+), 33 deletions(-)

-- 
2.13.1.611.g7e3b11ae1



[PATCH 1/3] wildmatch: remove unused wildopts parameter

2017-06-22 Thread Ævar Arnfjörð Bjarmason
Remove the unused wildopts placeholder struct from being passed to all
wildmatch() invocations, or rather remove all the boilerplate NULL
parameters.

This parameter was added back in commit 9b3497cab9 ("wildmatch: rename
constants and update prototype", 2013-01-01) as a placeholder for
future use. Over 4 years later nothing has made use of it, let's just
remove it. It can be added in the future if we find some reason to
start using such a parameter.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 apply.c   | 2 +-
 builtin/describe.c| 4 ++--
 builtin/ls-remote.c   | 2 +-
 builtin/name-rev.c| 2 +-
 builtin/reflog.c  | 2 +-
 builtin/replace.c | 2 +-
 builtin/show-branch.c | 2 +-
 config.c  | 2 +-
 diffcore-order.c  | 2 +-
 dir.c | 8 +++-
 ref-filter.c  | 4 ++--
 refs.c| 2 +-
 revision.c| 2 +-
 t/helper/test-wildmatch.c | 6 +++---
 wildmatch.c   | 3 +--
 wildmatch.h   | 6 +-
 16 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/apply.c b/apply.c
index 854faa6779..cc3278978a 100644
--- a/apply.c
+++ b/apply.c
@@ -2088,7 +2088,7 @@ static int use_patch(struct apply_state *state, struct 
patch *p)
/* See if it matches any of exclude/include rule */
for (i = 0; i < state->limit_by_name.nr; i++) {
struct string_list_item *it = >limit_by_name.items[i];
-   if (!wildmatch(it->string, pathname, 0, NULL))
+   if (!wildmatch(it->string, pathname, 0))
return (it->util != NULL);
}
 
diff --git a/builtin/describe.c b/builtin/describe.c
index 893c8789f4..2fe05f5324 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -142,7 +142,7 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
return 0;
 
for_each_string_list_item(item, _patterns) {
-   if (!wildmatch(item->string, path + 10, 0, NULL))
+   if (!wildmatch(item->string, path + 10, 0))
return 0;
}
}
@@ -158,7 +158,7 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
return 0;
 
for_each_string_list_item(item, ) {
-   if (!wildmatch(item->string, path + 10, 0, NULL))
+   if (!wildmatch(item->string, path + 10, 0))
break;
 
/* If we get here, no pattern matched. */
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index b2d7d5ce68..c4be98ab9e 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -24,7 +24,7 @@ static int tail_match(const char **pattern, const char *path)
 
pathbuf = xstrfmt("/%s", path);
while ((p = *(pattern++)) != NULL) {
-   if (!wildmatch(p, pathbuf, 0, NULL)) {
+   if (!wildmatch(p, pathbuf, 0)) {
free(pathbuf);
return 1;
}
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 7fc7e66e85..d21a5002a7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -129,7 +129,7 @@ static int subpath_matches(const char *path, const char 
*filter)
const char *subpath = path;
 
while (subpath) {
-   if (!wildmatch(filter, subpath, 0, NULL))
+   if (!wildmatch(filter, subpath, 0))
return subpath - path;
subpath = strchr(subpath, '/');
if (subpath)
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 920c16dac0..96c82b4f5e 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -485,7 +485,7 @@ static void set_reflog_expiry_param(struct 
cmd_reflog_expire_cb *cb, int slot, c
return; /* both given explicitly -- nothing to tweak */
 
for (ent = reflog_expire_cfg; ent; ent = ent->next) {
-   if (!wildmatch(ent->pattern, ref, 0, NULL)) {
+   if (!wildmatch(ent->pattern, ref, 0)) {
if (!(slot & EXPIRE_TOTAL))
cb->expire_total = ent->expire_total;
if (!(slot & EXPIRE_UNREACH))
diff --git a/builtin/replace.c b/builtin/replace.c
index c921bc976f..89262df5a1 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -40,7 +40,7 @@ static int show_reference(const char *refname, const struct 
object_id *oid,
 {
struct show_data *data = cb_data;
 
-   if (!wildmatch(data->pattern, refname, 0, NULL)) {
+   if (!wildmatch(data->pattern, refname, 0)) {
if (data->format == REPLACE_FORMAT_SHORT)
printf("%s\n", refname);
else if (data->format == REPLACE_FORMAT_MEDIUM)
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 

[RFC/PATCH 2/3] wildmatch: add interface for precompiling wildmatch() patterns

2017-06-22 Thread Ævar Arnfjörð Bjarmason
Add the scaffolding necessary for precompiling wildmatch()
patterns.

There is currently no point in doing this with the wildmatch()
function we have now, since it can't make any use of precompiling the
pattern.

But adding this interface and making use of it will make it easy to
refactor the wildmatch() function to e.g. parse the pattern into
opcode as the BSD glob() implementation does, or to drop an alternate
wildmatch() backend in which trades parsing slowness for faster
matching.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 wildmatch.c | 20 
 wildmatch.h |  9 +
 2 files changed, 29 insertions(+)

diff --git a/wildmatch.c b/wildmatch.c
index d074c1be10..ba6a92a393 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -276,3 +276,23 @@ int wildmatch(const char *pattern, const char *text, 
unsigned int flags)
 {
return dowild((const uchar*)pattern, (const uchar*)text, flags);
 }
+
+struct wildmatch_compiled *wildmatch_compile(const char *pattern, unsigned int 
flags)
+{
+   struct wildmatch_compiled *code = xmalloc(sizeof(struct 
wildmatch_compiled));
+   code->pattern = xstrdup(pattern);
+   code->flags = flags;
+
+   return code;
+}
+
+int wildmatch_match(struct wildmatch_compiled *code, const char *text)
+{
+   return wildmatch(code->pattern, text, code->flags);
+}
+
+void wildmatch_free(struct wildmatch_compiled *code)
+{
+   free((void *)code->pattern);
+   free(code);
+}
diff --git a/wildmatch.h b/wildmatch.h
index b8c826aa68..6156d46a33 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -10,5 +10,14 @@
 #define WM_ABORT_ALL -1
 #define WM_ABORT_TO_STARSTAR -2
 
+struct wildmatch_compiled {
+   const char *pattern;
+   unsigned int flags;
+};
+
 int wildmatch(const char *pattern, const char *text, unsigned int flags);
+struct wildmatch_compiled *wildmatch_compile(const char *pattern, unsigned int 
flags);
+int wildmatch_match(struct wildmatch_compiled *code, const char *text);
+void wildmatch_free(struct wildmatch_compiled *code);
+
 #endif
-- 
2.13.1.611.g7e3b11ae1



[RFC/PATCH 3/3] wildmatch: make use of the interface for precompiling wildmatch() patterns

2017-06-22 Thread Ævar Arnfjörð Bjarmason
Make use of the batch wildmatch() interface. As noted in the comment
here the main hot codepath is not being touched, but some other
invocations where we repeatedly match the same glob against multiple
strings have been migrated.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/name-rev.c |  6 +-
 builtin/replace.c  |  7 ---
 config.c   |  8 ++--
 dir.c  | 12 
 refs.c |  7 ---
 5 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index d21a5002a7..a8f8010e1e 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -127,14 +127,18 @@ static void name_rev(struct commit *commit,
 static int subpath_matches(const char *path, const char *filter)
 {
const char *subpath = path;
+   struct wildmatch_compiled *code = wildmatch_compile(filter, 0);
 
while (subpath) {
-   if (!wildmatch(filter, subpath, 0))
+   if (!wildmatch_match(code, subpath)) {
+   wildmatch_free(code);
return subpath - path;
+   }
subpath = strchr(subpath, '/');
if (subpath)
subpath++;
}
+   wildmatch_free(code);
return -1;
 }
 
diff --git a/builtin/replace.c b/builtin/replace.c
index 89262df5a1..70e7af72ce 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -31,7 +31,7 @@ enum replace_format {
 };
 
 struct show_data {
-   const char *pattern;
+   struct wildmatch_compiled *code;
enum replace_format format;
 };
 
@@ -40,7 +40,7 @@ static int show_reference(const char *refname, const struct 
object_id *oid,
 {
struct show_data *data = cb_data;
 
-   if (!wildmatch(data->pattern, refname, 0)) {
+   if (!wildmatch_match(data->code, refname)) {
if (data->format == REPLACE_FORMAT_SHORT)
printf("%s\n", refname);
else if (data->format == REPLACE_FORMAT_MEDIUM)
@@ -69,7 +69,7 @@ static int list_replace_refs(const char *pattern, const char 
*format)
 
if (pattern == NULL)
pattern = "*";
-   data.pattern = pattern;
+   data.code = wildmatch_compile(pattern, 0);
 
if (format == NULL || *format == '\0' || !strcmp(format, "short"))
data.format = REPLACE_FORMAT_SHORT;
@@ -83,6 +83,7 @@ static int list_replace_refs(const char *pattern, const char 
*format)
format);
 
for_each_replace_ref(show_reference, (void *));
+   wildmatch_free(data.code);
 
return 0;
 }
diff --git a/config.c b/config.c
index 260caf27b8..2ec45e9790 100644
--- a/config.c
+++ b/config.c
@@ -215,6 +215,7 @@ static int include_by_gitdir(const struct config_options 
*opts,
int ret = 0, prefix;
const char *git_dir;
int already_tried_absolute = 0;
+   struct wildmatch_compiled *code = NULL;
 
if (opts->git_dir)
git_dir = opts->git_dir;
@@ -244,8 +245,10 @@ static int include_by_gitdir(const struct config_options 
*opts,
goto done;
}
 
-   ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
-icase ? WM_CASEFOLD : 0);
+   if (!code)
+   code = wildmatch_compile(pattern.buf + prefix,
+icase ? WM_CASEFOLD : 0);
+   ret = !wildmatch_match(code, text.buf + prefix);
 
if (!ret && !already_tried_absolute) {
/*
@@ -264,6 +267,7 @@ static int include_by_gitdir(const struct config_options 
*opts,
 done:
strbuf_release();
strbuf_release();
+   wildmatch_free(code);
return ret;
 }
 
diff --git a/dir.c b/dir.c
index 544056b37a..f4b5b93956 100644
--- a/dir.c
+++ b/dir.c
@@ -79,6 +79,18 @@ int git_fnmatch(const struct pathspec_item *item,
ps_strcmp(item, pattern,
  string + string_len - pattern_len);
}
+
+   /*
+* TODO: This is the main hot path, but untangling this whole
+* munging of the prefix is a PITA. We take e.g. the pattern
+* "t/**.sh" and then conclude that there's a directory "t",
+* and then match its entries (recursively) against "**.sh".
+*
+* We should try to just always match the full glob against
+* the full pattern. See my "BUG: wildmatches ... don't match
+* properly due to internal optimizations" on the mailing list
+* 

Re: your mail

2017-06-22 Thread Junio C Hamano
Simon Ruderich  writes:

> On Thu, Jun 22, 2017 at 01:55:27PM +, Patrick Lehmann wrote:
>> The description on https://github.com/git/git doesn't reflect that policy.
>>
>> a)
>> It explains that discussions take place in the mentioned mailing list.
>> b)
>> It describes how to subscribe.
>
> However it doesn't say that you have to subscribe to send, only
> how to subscribe.

For that matter, we also say "everyone is welcome to post", which
makes it clear that no subscription is required.

But I view these merely being technically correct.  And making it
absolutely obvious does not cost too much.

>> With knowledge of other mailing lists (mostly managed by mailman),
>> subscription is required for participation.
>
> That depends on the mailing list, some require subscription to
> prevent spams but not all do.

Yes.  But not many people realize that the world they know is the
only world.  We are used to an open list and are shocked when we
encouter a closed one; let's not forget that shock.

How about doing it like this?

 README.md | 4 
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index f17af66a97..bbaf54bffb 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,10 @@ The user discussion and development of Git take place on the 
Git
 mailing list -- everyone is welcome to post bug reports, feature
 requests, comments and patches to git@vger.kernel.org (read
 [Documentation/SubmittingPatches][] for instructions on patch submission).
+
+You can send messages without subscribing to the list, but it is 
+recommended to read what other people are saying on the list before
+you speak.
 To subscribe to the list, send an email with just "subscribe git" in
 the body to majord...@vger.kernel.org. The mailing list archives are
 available at ,


Re: [PATCHv2] submodules: overhaul documentation

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 2:03 PM, Brandon Williams  wrote:
> On 06/22, Stefan Beller wrote:
>> On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano  wrote:
>> > Brandon Williams  writes:
>> >
>> >> On 06/20, Stefan Beller wrote:
>> >> ...
>> >>> +The configuration of submodules
>> >>> +---
>> >>> +
>> >>> +Submodule operations can be configured using the following mechanisms
>> >>> +(from highest to lowest precedence):
>> >>> +
>> >>> + * the command line for those commands that support taking submodule 
>> >>> specs.
>> >>> +
>> >>> + * the configuration file `$GIT_DIR/config` in the superproject.
>> >>> +
>> >>> + * the `.gitmodules` file inside the superproject. A project usually
>> >>> +   includes this file to suggest defaults for the upstream collection
>> >>> +   of repositories.
>> >>
>> >> I dislike this last point.  Realistically we don't want this right?  So
>> >> perhaps we shouldn't include it?
>> >
>> > I am not sure if I follow.  Without .gitmodules, how would you, as a
>> > downstream developer, bootstrap the whole thing?
>> >
>>
>> I think Brandon eludes to our long term vision of having a separate
>> magic ref containing these informations instead of carrying it in tree.
>>
>> As urls change over time, it is better to keep the urls out of the
>> actual history, but still versioned so maybe we'll want to have
>> a ref/submodule-config/master ref that contains all the bootstrapping
>> information. The .gitmodules file would degenerate to a pure
>> name<->path mapping.
>
> I was more eluding to having fetch.recurse and the other similar bits
> stored in the gitmodules file.

Well yes, but these configurations would also go onto the new magic ref
once implemented.

And to answer your question:
Yes we (you and me) dislike it, but it is the best we can do given
the current implementation. Once the implementation changes,
we may want to adapt this man page.


>
> --
> Brandon Williams


Re: [PATCH 3/3] t1700: make sure split-index respects core.sharedrepository

2017-06-22 Thread Christian Couder
On Thu, Jun 22, 2017 at 10:25 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> We use "git config core.sharedrepository 0666" at the beginning of
>> this test, so it will only apply to the shared index files that are
>> created after that.
>>
>> Do you suggest that we test before setting core.sharedrepository that
>> the existing shared index files all have the default permissions?
>
> I think it would be sensible to see at least two values appear.
> Otherwise we cannot tell if the right value is coming by accident
> (because it was the default) or by design (because the configuration
> is correctly read).

Ok, I think I will use something like this then:

while read -r mode modebits filename; do
test_expect_success POSIXPERM "split index respects
core.sharedrepository $mode" '
git config core.sharedrepository "$mode" &&
: >"$filename" &&
git update-index --add "$filename" &&
echo "$modebits" >expect &&
test_modebits .git/index >actual &&
test_cmp expect actual &&
newest_shared_index=$(ls -t .git/sharedindex.* | head -1) &&
test_modebits "$newest_shared_index" >actual &&
test_cmp expect actual
'
done <<\EOF
0666 -rw-rw-rw- seventeen
0642 -rw-r---w- eightteen
EOF


Re: [PATCHv2] submodules: overhaul documentation

2017-06-22 Thread Brandon Williams
On 06/22, Stefan Beller wrote:
> On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano  wrote:
> > Brandon Williams  writes:
> >
> >> On 06/20, Stefan Beller wrote:
> >> ...
> >>> +The configuration of submodules
> >>> +---
> >>> +
> >>> +Submodule operations can be configured using the following mechanisms
> >>> +(from highest to lowest precedence):
> >>> +
> >>> + * the command line for those commands that support taking submodule 
> >>> specs.
> >>> +
> >>> + * the configuration file `$GIT_DIR/config` in the superproject.
> >>> +
> >>> + * the `.gitmodules` file inside the superproject. A project usually
> >>> +   includes this file to suggest defaults for the upstream collection
> >>> +   of repositories.
> >>
> >> I dislike this last point.  Realistically we don't want this right?  So
> >> perhaps we shouldn't include it?
> >
> > I am not sure if I follow.  Without .gitmodules, how would you, as a
> > downstream developer, bootstrap the whole thing?
> >
> 
> I think Brandon eludes to our long term vision of having a separate
> magic ref containing these informations instead of carrying it in tree.
> 
> As urls change over time, it is better to keep the urls out of the
> actual history, but still versioned so maybe we'll want to have
> a ref/submodule-config/master ref that contains all the bootstrapping
> information. The .gitmodules file would degenerate to a pure
> name<->path mapping.

I was more eluding to having fetch.recurse and the other similar bits
stored in the gitmodules file.

-- 
Brandon Williams


[PATCHv3] submodules: overhaul documentation

2017-06-22 Thread Stefan Beller
This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) how the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---

Rerolling sb/submodule-doc once again.

* Adding examples to the config section
* no extra word invented (the "tracking directory" is no more)

I rebased this on top of Kaartics submodule cleanup patch, but there were
no conflicts, such that we can keep it in parallel branches.

 Documentation/Makefile  |   1 +
 Documentation/git-rm.txt|   4 +-
 Documentation/git-submodule.txt |  44 ++--
 Documentation/gitsubmodules.txt | 221 
 4 files changed, 234 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a 
submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 6e07bade39..e67b58bddc 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 ---
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[7].
 
 COMMANDS
 
@@ -142,15 +112,17 @@ deinit [-f|--force] (--all|[--] ...)::
tree. Further calls to `git submodule update`, `git submodule foreach`
and `git submodule sync` will skip any unregistered submodules until
they are initialized again, so use this command if you don't want to
-   have a local checkout of the submodule in your working tree anymore. If
-   you really want to remove a submodule from the repository and commit
-   that use linkgit:git-rm[1] instead.
+   have a local checkout of the submodule in your working tree anymore.
 +
 When the command is 

Re: your mail

2017-06-22 Thread Simon Ruderich
On Thu, Jun 22, 2017 at 01:55:27PM +, Patrick Lehmann wrote:
> The description on https://github.com/git/git doesn't reflect that policy.
>
> a)
> It explains that discussions take place in the mentioned mailing list.
> b)
> It describes how to subscribe.

However it doesn't say that you have to subscribe to send, only
how to subscribe.

> With knowledge of other mailing lists (mostly managed by mailman),
> subscription is required for participation.

That depends on the mailing list, some require subscription to
prevent spams but not all do.

Somebody might want to update the documentation, but personally
I see no reason to change anything. If you want to send a patch
to improve it, that would be great of course.

Regards
Simon

PS: Please don't top-post on this mailing list.
-- 
+ Privatsphäre ist notwendig
+ Ich verwende GnuPG http://gnupg.org
+ Öffentlicher Schlüssel: 0x92FEFDB7E44C32F9


Re: [PATCH] name-rev: Fix tag lookup on repository with mixed types of tags

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 12:52 PM,   wrote:
> From: Orgad Shaneh 
>
> Commit 7550424804 (name-rev: include taggerdate in considering the best
> name) introduced a bug in name-rev.
>
> If a repository has both annotated and non-annotated tags, annotated
> tag will always win, even if it was created decades after the commit.
>
> Consider a repository that always used non-annotated tags, and at some
> point started using annotated tags - name-rev --tags will return the
> first annotated tags for all the old commits (in our repository it is
> followed by ~5067 for one commit, or by ~120^2~21^2~88^2~87 for
> another...). This is obviously not what the user expects.
>
> The taggerdate should only be matched if *both tags* have it.

Please sign off the patch. See https://developercertificate.org/
or Documentation/SubmittingPatches

> ---
>  builtin/name-rev.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 92a5d8a5d2..8f77023482 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -46,11 +46,13 @@ static void name_rev(struct commit *commit,
> commit->util = name;
> goto copy_data;
> } else if (name->taggerdate > taggerdate ||
> -   (name->taggerdate == taggerdate &&
> +   ((taggerdate == ULONG_MAX || name->taggerdate == 
> taggerdate) &&
>  name->distance > distance)) {
>  copy_data:
> name->tip_name = tip_name;
> -   name->taggerdate = taggerdate;
> +   if (taggerdate != ULONG_MAX) {
> +   name->taggerdate = taggerdate;
> +   }

style: you may omit the braces for a single statement
after if.

> name->generation = generation;
> name->distance = distance;
> } else
> --
> 2.13.1.windows.1.1.ga36e14b3aa
>


[PATCH 1/3] list-objects: add filter_blob to traverse_commit_list

2017-06-22 Thread Jeff Hostetler
From: Jeff Hostetler 

In preparation for partial/sparse clone/fetch where the
server is allowed to omit large/all blobs from the packfile,
teach traverse_commit_list() to take a blob filter-proc that
controls when blobs are shown and marked as SEEN.

Normally, traverse_commit_list() always marks visited blobs
as SEEN, so that the show_object() callback will never see
duplicates.  Since a single blob OID may be referenced by
multiple pathnames, we may not be able to decide if a blob
should be excluded until later pathnames have been traversed.
With the filter-proc, the automatic deduping is disabled.

Signed-off-by: Jeff Hostetler 
---
 list-objects.c | 39 +++
 list-objects.h |  8 
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..c9ca81c 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -11,6 +11,7 @@
 static void process_blob(struct rev_info *revs,
 struct blob *blob,
 show_object_fn show,
+filter_blob_fn filter_blob,
 struct strbuf *path,
 const char *name,
 void *cb_data)
@@ -24,11 +25,28 @@ static void process_blob(struct rev_info *revs,
die("bad blob object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   obj->flags |= SEEN;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   show(obj, path->buf, cb_data);
+   if (!filter_blob) {
+   /*
+* Normal processing is to imediately dedup blobs
+* during commit traversal, regardless of how many
+* times it appears in a single or multiple commits,
+* so we always set SEEN.
+*/
+   obj->flags |= SEEN;
+   show(obj, path->buf, cb_data);
+   } else {
+   /*
+* Use the filter-proc to decide whether to show
+* the blob.  We only set SEEN if requested.  For
+* example, this could be used to omit a specific
+* blob until it appears with a ".git*" entryname.
+*/
+   if (filter_blob(obj, path->buf, >buf[pathlen], cb_data))
+   obj->flags |= SEEN;
+   }
strbuf_setlen(path, pathlen);
 }
 
@@ -67,6 +85,7 @@ static void process_gitlink(struct rev_info *revs,
 static void process_tree(struct rev_info *revs,
 struct tree *tree,
 show_object_fn show,
+filter_blob_fn filter_blob,
 struct strbuf *base,
 const char *name,
 void *cb_data)
@@ -111,7 +130,7 @@ static void process_tree(struct rev_info *revs,
if (S_ISDIR(entry.mode))
process_tree(revs,
 lookup_tree(entry.oid->hash),
-show, base, entry.path,
+show, filter_blob, base, entry.path,
 cb_data);
else if (S_ISGITLINK(entry.mode))
process_gitlink(revs, entry.oid->hash,
@@ -120,7 +139,7 @@ static void process_tree(struct rev_info *revs,
else
process_blob(revs,
 lookup_blob(entry.oid->hash),
-show, base, entry.path,
+show, filter_blob, base, entry.path,
 cb_data);
}
strbuf_setlen(base, baselen);
@@ -188,6 +207,16 @@ void traverse_commit_list(struct rev_info *revs,
  show_object_fn show_object,
  void *data)
 {
+   traverse_commit_list_filtered(revs, show_commit, show_object, NULL, 
data);
+}
+
+void traverse_commit_list_filtered(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   filter_blob_fn filter_blob,
+   void *data)
+{
int i;
struct commit *commit;
struct strbuf base;
@@ -218,11 +247,13 @@ void traverse_commit_list(struct rev_info *revs,
path = "";
if (obj->type == OBJ_TREE) {
process_tree(revs, (struct tree *)obj, show_object,
+filter_blob,
 , path, data);
continue;
}
if (obj->type == OBJ_BLOB) {
process_blob(revs, (struct blob *)obj, show_object,
+filter_blob,
 , path, data);
continue;

[PATCH 2/3] pack-objects: WIP add max-blob-size filtering

2017-06-22 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach pack-objects command to accept --max-blob-size= argument
and use a traverse_commit_list filter-proc to omit unwanted blobs
from the resulting packfile.

This filter-proc always includes special files matching ".git*"
(such as ".gitignore") and blobs smaller than .   is a
magnitude value and accepts [kmg] suffixes.  A value of zero
can be used to omit all blobs (except for special files).

There are 2 placeholder TODOs in this code to talk about building
an omitted-blob list for the client.

Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 76 +-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 50e01aa..cdcd4d6 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -77,6 +77,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static signed long max_blob_size = -1;
+
 /*
  * stats
  */
@@ -2519,6 +2521,7 @@ static void read_object_list_from_stdin(void)
 }
 
 #define OBJECT_ADDED (1u<<20)
+#define BLOB_OMITTED (1u<<21)
 
 static void show_commit(struct commit *commit, void *data)
 {
@@ -2536,6 +2539,70 @@ static void show_object(struct object *obj, const char 
*name, void *data)
obj->flags |= OBJECT_ADDED;
 }
 
+/*
+ * Filter blobs by pathname or size.
+ * Return 1 to mark the blob SEEN so that it will not be reported again.
+ * Return 0 to allow it to be presented again.
+ */
+static int filter_blob(
+   struct object *obj,
+   const char *pathname,
+   const char *entryname,
+   void *data)
+{
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+   assert((obj->flags & OBJECT_ADDED) == 0);
+   assert(max_blob_size >= 0);
+
+   /*
+* Always include blobs for special files of the form ".git*".
+*/
+   if ((strncmp(entryname, ".git", 4) == 0) && entryname[4]) {
+   if (obj->flags & BLOB_OMITTED) {
+   /*
+* TODO
+* TODO Remove this blob from the omitted blob list.
+* TODO
+*/
+   obj->flags &= ~BLOB_OMITTED;
+   }
+   show_object(obj, pathname, data);
+   return 1;
+   }
+
+   /*
+* We already know the blob is too big because it was previously
+* omitted.  We still don't want it yet.  DO NOT mark it SEEN
+* in case it is associated with a ".git*" path in another tree
+* or commit.
+*/
+   if (obj->flags & BLOB_OMITTED)
+   return 0;
+
+   /*
+* We only want blobs that are LESS THAN the maximum.
+* This allows zero to mean NO BLOBS.
+*/
+   if (max_blob_size > 0) {
+   unsigned long s;
+   enum object_type t = sha1_object_info(obj->oid.hash, );
+   assert(t == OBJ_BLOB);
+   if (s < max_blob_size) {
+   show_object(obj, pathname, data);
+   return 1;
+   }
+   }
+
+   /*
+* TODO
+* TODO (Provisionally) add this blob to the omitted blob list.
+* TODO
+*/
+   obj->flags |= BLOB_OMITTED;
+   return 0;
+}
+
 static void show_edge(struct commit *commit)
 {
add_preferred_base(commit->object.oid.hash);
@@ -2800,7 +2867,12 @@ static void get_object_list(int ac, const char **av)
if (prepare_revision_walk())
die("revision walk setup failed");
mark_edges_uninteresting(, show_edge);
-   traverse_commit_list(, show_commit, show_object, NULL);
+
+   if (max_blob_size == -1)
+   traverse_commit_list(, show_commit, show_object, NULL);
+   else
+   traverse_commit_list_filtered(, show_commit, show_object,
+   filter_blob, NULL);
 
if (unpack_unreachable_expiration) {
revs.ignore_missing_links = 1;
@@ -2936,6 +3008,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
 N_("use a bitmap index if available to speed up 
counting objects")),
OPT_BOOL(0, "write-bitmap-index", _bitmap_index,
 N_("write a bitmap index together with the pack 
index")),
+   OPT_MAGNITUDE(0, "max-blob-size", (unsigned long 
*)_blob_size,
+ N_("omit large blobs from packfile")),
OPT_END(),
};
 
-- 
2.9.3



[PATCH 0/3] WIP list-objects and pack-objects for partial clone

2017-06-22 Thread Jeff Hostetler
From: Jeff Hostetler 

This WIP is a follow up to earlier patches to teach pack-objects
to omit large blobs from packfiles.  This doesn't attempt to solve
the whole end-to-end problem of partial/sparse clone/fetch or that
of the client operating with missing blobs.  This WIP is for now
limited to building the packfile with omitted blobs and hopefully
can mesh nicely with Jonathan Tan's work in [3].

It supports filtering by size while always including blobs associated
with ".git*" paths, something we both proposed in [1] and [3].

The approach here differs from [1] and [3] in that it extends
traverse_commit_list() to allow custom blob filtering using a new
callback provided by pack-objects.  This should make it easier to
do other filters laters.  Part of this based upon Peff's suggestion
about rev-list in [2].  I have not updated the rev-list command,
but rather the routines in list-objects.c that it calls.  Jonathan's
ideas in [3] to build and send the omitted blobs list means that I
think we need pack-objects.c manage the filter-proc used here.

I considered, but omitted from this version, ideas to allow the
filter-proc to know of the process_tree() boundaries which might
let pack-objects filter by sub-tree (think sparse-checkout) as
suggested in [4] and various replies.

[1] 
https://public-inbox.org/git/1488994685-37403-3-git-send-email-jeffh...@microsoft.com/
[2] 
https://public-inbox.org/git/20170309073117.g3br5btsfwntc...@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1496361873.git.jonathanta...@google.com/
[4] 
https://public-inbox.org/git/20170602232508.ga21...@aiede.mtv.corp.google.com/


Jeff Hostetler (3):
  list-objects: add filter_blob to traverse_commit_list
  pack-objects: WIP add max-blob-size filtering
  pack-objects: add t5317 to test max-blob-size

 builtin/pack-objects.c | 76 +-
 list-objects.c | 39 +++--
 list-objects.h |  8 
 t/t5317-pack-objects-blob-filtering.sh | 68 ++
 4 files changed, 186 insertions(+), 5 deletions(-)
 create mode 100644 t/t5317-pack-objects-blob-filtering.sh

-- 
2.9.3



[PATCH 3/3] pack-objects: add t5317 to test max-blob-size

2017-06-22 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 t/t5317-pack-objects-blob-filtering.sh | 68 ++
 1 file changed, 68 insertions(+)
 create mode 100644 t/t5317-pack-objects-blob-filtering.sh

diff --git a/t/t5317-pack-objects-blob-filtering.sh 
b/t/t5317-pack-objects-blob-filtering.sh
new file mode 100644
index 000..58124ab
--- /dev/null
+++ b/t/t5317-pack-objects-blob-filtering.sh
@@ -0,0 +1,68 @@
+#!/bin/sh
+
+test_description='pack-objects blob filtering'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   printf "%10s"   X >x10.txt   &&
+   printf "%100s"  X >x100.txt  &&
+   printf "%1000s" X >x1000.txt &&
+   git add *.txt &&
+   git commit -m txt
+'
+
+test_expect_success 'all blobs' '
+   test_when_finished "rm -f *.pack *.idx" &&
+   git pack-objects --revs --thin --stdout >z.pack <<-EOF &&
+   master
+
+   EOF
+   git index-pack z.pack &&
+   test 3 = $(git verify-pack -v z.pack | grep blob | wc -l)
+'
+
+test_expect_success 'no blobs (max equals 0)' '
+   test_when_finished "rm -f *.pack *.idx" &&
+   git pack-objects --revs --thin --stdout --max-blob-size=0 >z.pack 
<<-EOF &&
+   master
+
+   EOF
+   git index-pack z.pack &&
+   test 0 = $(git verify-pack -v z.pack | grep blob | wc -l)
+'
+
+test_expect_success 'small 20 blobs' '
+   test_when_finished "rm -f *.pack *.idx" &&
+   git pack-objects --revs --thin --stdout --max-blob-size=20 >z.pack 
<<-EOF &&
+   master
+
+   EOF
+   git index-pack z.pack &&
+   test 1 = $(git verify-pack -v z.pack | grep blob | wc -l)
+'
+
+test_expect_success 'small 200 blobs' '
+   test_when_finished "rm -f *.pack *.idx" &&
+   git pack-objects --revs --thin --stdout --max-blob-size=200 >z.pack 
<<-EOF &&
+   master
+
+   EOF
+   git index-pack z.pack &&
+   test 2 = $(git verify-pack -v z.pack | grep blob | wc -l)
+'
+
+test_expect_success 'special files always present' '
+   test_when_finished "rm -f *.pack *.idx" &&
+   cp x1000.txt .gitignore &&
+   git add .gitignore &&
+   git commit -m "add ignores" &&
+   git pack-objects --revs --stdout --max-blob-size=0 >z.pack <<-EOF &&
+   master
+
+   EOF
+   git index-pack z.pack &&
+   test 1 = $(git verify-pack -v z.pack | grep blob | wc -l)
+'
+
+test_done
-- 
2.9.3



Re: [PATCH v4 15/20] repository: add index_state to struct repo

2017-06-22 Thread Brandon Williams
On 06/22, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> >  struct repository {
> > /* Environment */
> > @@ -49,6 +50,12 @@ struct repository {
> >  */
> > struct config_set *config;
> >  
> > +   /*
> > +* Repository's in-memory index.
> > +* 'repo_read_index()' can be used to populate 'index'.
> > +*/
> > +   struct index_state *index;
> > +
> > /* Configurations */
> > /*
> >  * Bit used during initialization to indicate if repository state (like
> > @@ -71,4 +78,6 @@ extern void repo_set_worktree(struct repository *repo, 
> > const char *path);
> >  extern int repo_init(struct repository *repo, const char *gitdir, const 
> > char *worktree);
> >  extern void repo_clear(struct repository *repo);
> >  
> > +extern int repo_read_index(struct repository *repo);
> > +
> >  #endif /* REPOSITORY_H */
> 
> While you are working on a simple read-only operation like
> "ls-files", you can get away with just a singleton (the equivalent
> to "the_index" in the repository object world) in-core index
> instance, and having a way to tell the system to read things into
> the default thing without having to name what that default thing is
> is a very useful thing to have.  It is a good thing that this only
> needs "struct repository *" and no "struct index_state *" parameter.
> 
> But you will need a way to read, update, write it out as a tree,
> etc. to a non-default in-core index, once you start doing more
> complex things.  The function signature of the lowest level
> primitive helper for them will be:
> 
> extern int (*)(struct repository *, struct index_state *);
> 
> And you would want to reserve "index" suffix for such a function,
> following the "if you use the default in-core thing, you do not have
> to pass it as a parameter---just call _cache() convenience macro;
> but you can explicitly pass it to the underlying _index() function"
> convention we established long time ago.
> 
> So I'd suggest renaming the above one that uses the default in-core
> index instance to
> 
>   extern int repo_read_cache(struct repository *);
> 
> or you will regret later.
> 

That makes sense.  While at it, would it make sense to ensure that the
'struct index_state *' which is stored in 'the_repository.index' be
'_index'?

-- 
Brandon Williams


Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Junio C Hamano
Jeff King  writes:

> So I'd be tempted to just ditch the whole thing and teach
> get_revision_1() to just walk through the list of logs, rather than this
> weird "add a pending commit and then try to figure out which reflog it
> referred to". For instance, right now:
>
>   git log -g HEAD $(git symbolic-ref HEAD)
>
> only shows _one_ reflog. The patch below is the direction I'm thinking.
> It fails two tests, but haven't dug yet.
>
> ---
>  reflog-walk.c | 112 +--
>  reflog-walk.h |   4 +-
>  revision.c|  24 
>  3 files changed, 43 insertions(+), 97 deletions(-)

Yeah, I agree with the "we now show diffs with true parents"
reasoning, and I like the above code reduction, obviously ;-)

> @@ -3114,18 +3112,20 @@ static void track_linear(struct rev_info *revs, 
> struct commit *commit)
>  
>  static struct commit *get_revision_1(struct rev_info *revs)
>  {
> + if (revs->reflog_info) {
> + struct commit *commit = next_reflog_entry(revs->reflog_info);
> + if (commit) {
> + commit->object.flags &= ~(ADDED | SEEN | SHOWN);
> + return commit;
> + }
> + }
> +
>   if (!revs->commits)
>   return NULL;
>  
>   do {
>   struct commit *commit = pop_commit(>commits);
>  
> - if (revs->reflog_info) {
> - save_parents(revs, commit);
> - fake_reflog_parent(revs->reflog_info, commit);
> - commit->object.flags &= ~(ADDED | SEEN | SHOWN);
> - }
> -
>   /*
>* If we haven't done the list limiting, we need to look at
>* the parents here. We also need to do the date-based limiting

This part of the patch I can 100% agree with ;-)  

I do not think command line parser does not allow "log -g
maint..master" so all the "limited" processing the remainder of
get_revision_1() does shouldn't matter.

I however think pathspec will affect simplify_commit() and suspect
that "git log -g -20 HEAD path" will behave differently.  Perhaps
the difference is "it used to use path in an unexplainable way, now
it ignores", in which case this is an improvement.

Thanks.







Re: [PATCHv2] submodules: overhaul documentation

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> On 06/20, Stefan Beller wrote:
>> ...
>>> +The configuration of submodules
>>> +---
>>> +
>>> +Submodule operations can be configured using the following mechanisms
>>> +(from highest to lowest precedence):
>>> +
>>> + * the command line for those commands that support taking submodule specs.
>>> +
>>> + * the configuration file `$GIT_DIR/config` in the superproject.
>>> +
>>> + * the `.gitmodules` file inside the superproject. A project usually
>>> +   includes this file to suggest defaults for the upstream collection
>>> +   of repositories.
>>
>> I dislike this last point.  Realistically we don't want this right?  So
>> perhaps we shouldn't include it?
>
> I am not sure if I follow.  Without .gitmodules, how would you, as a
> downstream developer, bootstrap the whole thing?
>

I think Brandon eludes to our long term vision of having a separate
magic ref containing these informations instead of carrying it in tree.

As urls change over time, it is better to keep the urls out of the
actual history, but still versioned so maybe we'll want to have
a ref/submodule-config/master ref that contains all the bootstrapping
information. The .gitmodules file would degenerate to a pure
name<->path mapping.


Re: [PATCH 3/3] t1700: make sure split-index respects core.sharedrepository

2017-06-22 Thread Junio C Hamano
Christian Couder  writes:

> We use "git config core.sharedrepository 0666" at the beginning of
> this test, so it will only apply to the shared index files that are
> created after that.
>
> Do you suggest that we test before setting core.sharedrepository that
> the existing shared index files all have the default permissions?

I think it would be sensible to see at least two values appear.
Otherwise we cannot tell if the right value is coming by accident
(because it was the default) or by design (because the configuration
is correctly read).



Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Jeff King
On Thu, Jun 22, 2017 at 12:03:31PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I don't think this is quite right, though. We've decremented "recno"
> > after assigning the old pointer to "reflog". So in the existing code,
> > "reflog" in that second conditional pointing to the _next_ entry (or
> > previous, really, since we are going in reverse order).
> >
> > So I think you'd need to look at commit->reflog again (after checking
> > that we didn't go past the start of the array).
> 
> Perhaps.  I did the illustration that way simply because I was not
> sure if the current "the entry was NULL from something new, so skip
> and look at the previous entry's new" was correct to begin with.

I'm not sure it makes sense for an entry to have its "after" state as
its own parent. On the other hand, I'm not sure it makes sense to
consider this fake parentage in the first place.

I think in the old days we used truly rewrite the parents, and traversal
like "git log -g -p" would show the diff between reflog entries. But
that changed in 838f9a156 (log: use true parents for diff when walking
reflogs, 2013-08-03). Given that we disable reflog-walking with things
like "--graph" that show the relationship, are the fake parents actually
accomplishing anything anymore? The parents we show via "log -g
--parents" seem like nonsense, and I'm not sure how we would do any
history simplification based no the munged values.

So I'd be tempted to just ditch the whole thing and teach
get_revision_1() to just walk through the list of logs, rather than this
weird "add a pending commit and then try to figure out which reflog it
referred to". For instance, right now:

  git log -g HEAD $(git symbolic-ref HEAD)

only shows _one_ reflog. The patch below is the direction I'm thinking.
It fails two tests, but haven't dug yet.

---
 reflog-walk.c | 112 +--
 reflog-walk.h |   4 +-
 revision.c|  24 
 3 files changed, 43 insertions(+), 97 deletions(-)

diff --git a/reflog-walk.c b/reflog-walk.c
index ed99437ad..77170a3cb 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -78,45 +78,6 @@ static int get_reflog_recno_by_time(struct complete_reflogs 
*array,
return -1;
 }
 
-struct commit_info_lifo {
-   struct commit_info {
-   struct commit *commit;
-   void *util;
-   } *items;
-   int nr, alloc;
-};
-
-static struct commit_info *get_commit_info(struct commit *commit,
-   struct commit_info_lifo *lifo, int pop)
-{
-   int i;
-   for (i = 0; i < lifo->nr; i++)
-   if (lifo->items[i].commit == commit) {
-   struct commit_info *result = >items[i];
-   if (pop) {
-   if (i + 1 < lifo->nr)
-   memmove(lifo->items + i,
-   lifo->items + i + 1,
-   (lifo->nr - i) *
-   sizeof(struct commit_info));
-   lifo->nr--;
-   }
-   return result;
-   }
-   return NULL;
-}
-
-static void add_commit_info(struct commit *commit, void *util,
-   struct commit_info_lifo *lifo)
-{
-   struct commit_info *info;
-   ALLOC_GROW(lifo->items, lifo->nr + 1, lifo->alloc);
-   info = lifo->items + lifo->nr;
-   info->commit = commit;
-   info->util = util;
-   lifo->nr++;
-}
-
 struct commit_reflog {
int recno;
enum selector_type {
@@ -128,7 +89,8 @@ struct commit_reflog {
 };
 
 struct reflog_walk_info {
-   struct commit_info_lifo reflogs;
+   struct commit_reflog **logs;
+   size_t nr, alloc, cur;
struct string_list complete_reflogs;
struct commit_reflog *last_commit_reflog;
 };
@@ -226,50 +188,10 @@ int add_reflog_for_walk(struct reflog_walk_info *info,
commit_reflog->selector = selector;
commit_reflog->reflogs = reflogs;
 
-   add_commit_info(commit, commit_reflog, >reflogs);
-   return 0;
-}
-
-void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
-{
-   struct commit_info *commit_info =
-   get_commit_info(commit, >reflogs, 0);
-   struct commit_reflog *commit_reflog;
-   struct object *logobj;
-   struct reflog_info *reflog;
-
-   info->last_commit_reflog = NULL;
-   if (!commit_info)
-   return;
+   ALLOC_GROW(info->logs, info->nr + 1, info->alloc);
+   info->logs[info->nr++] = commit_reflog;
 
-   commit_reflog = commit_info->util;
-   if (commit_reflog->recno < 0) {
-   commit->parents = NULL;
-   return;
-   }
-   info->last_commit_reflog = commit_reflog;
-
-   do {
-   reflog = _reflog->reflogs->items[commit_reflog->recno];
-   commit_reflog->recno--;
-  

Re: [PATCHv2] submodules: overhaul documentation

2017-06-22 Thread Junio C Hamano
Brandon Williams  writes:

> On 06/20, Stefan Beller wrote:
> ...
>> +The configuration of submodules
>> +---
>> +
>> +Submodule operations can be configured using the following mechanisms
>> +(from highest to lowest precedence):
>> +
>> + * the command line for those commands that support taking submodule specs.
>> +
>> + * the configuration file `$GIT_DIR/config` in the superproject.
>> +
>> + * the `.gitmodules` file inside the superproject. A project usually
>> +   includes this file to suggest defaults for the upstream collection
>> +   of repositories.
>
> I dislike this last point.  Realistically we don't want this right?  So
> perhaps we shouldn't include it?

I am not sure if I follow.  Without .gitmodules, how would you, as a
downstream developer, bootstrap the whole thing?



Re: [PATCH 3/3] t1700: make sure split-index respects core.sharedrepository

2017-06-22 Thread Christian Couder
On Thu, Jun 22, 2017 at 9:53 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> Add a test to check that both the split-index file and the
>> shared-index file are created using the right permissions
>> when core.sharedrepository is set.
>>
>> Signed-off-by: Christian Couder 
>> ---
>>  t/t1700-split-index.sh | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index af3ec0da5a..a52b92e82b 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -370,4 +370,16 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
>> set to "never" and "now"
>>   test $(ls .git/sharedindex.* | wc -l) -le 2
>>  '
>>
>> +test_expect_success POSIXPERM 'split index respects core.sharedrepository' '
>> + git config core.sharedrepository 0666 &&
>> + : >seventeen &&
>> + git update-index --add seventeen &&
>> + echo "-rw-rw-rw-" >expect &&
>> + test_modebits .git/index >actual &&
>> + test_cmp expect actual &&
>> + newest_shared_index=$(ls -t .git/sharedindex.* | head -1) &&
>
> Hmph.  Don't you want to make sure all of them, not just the latest
> one, have the expected mode bits?

We use "git config core.sharedrepository 0666" at the beginning of
this test, so it will only apply to the shared index files that are
created after that.

Do you suggest that we test before setting core.sharedrepository that
the existing shared index files all have the default permissions?


Re: [PATCH v4 15/20] repository: add index_state to struct repo

2017-06-22 Thread Junio C Hamano
Brandon Williams  writes:

>  struct repository {
>   /* Environment */
> @@ -49,6 +50,12 @@ struct repository {
>*/
>   struct config_set *config;
>  
> + /*
> +  * Repository's in-memory index.
> +  * 'repo_read_index()' can be used to populate 'index'.
> +  */
> + struct index_state *index;
> +
>   /* Configurations */
>   /*
>* Bit used during initialization to indicate if repository state (like
> @@ -71,4 +78,6 @@ extern void repo_set_worktree(struct repository *repo, 
> const char *path);
>  extern int repo_init(struct repository *repo, const char *gitdir, const char 
> *worktree);
>  extern void repo_clear(struct repository *repo);
>  
> +extern int repo_read_index(struct repository *repo);
> +
>  #endif /* REPOSITORY_H */

While you are working on a simple read-only operation like
"ls-files", you can get away with just a singleton (the equivalent
to "the_index" in the repository object world) in-core index
instance, and having a way to tell the system to read things into
the default thing without having to name what that default thing is
is a very useful thing to have.  It is a good thing that this only
needs "struct repository *" and no "struct index_state *" parameter.

But you will need a way to read, update, write it out as a tree,
etc. to a non-default in-core index, once you start doing more
complex things.  The function signature of the lowest level
primitive helper for them will be:

extern int (*)(struct repository *, struct index_state *);

And you would want to reserve "index" suffix for such a function,
following the "if you use the default in-core thing, you do not have
to pass it as a parameter---just call _cache() convenience macro;
but you can explicitly pass it to the underlying _index() function"
convention we established long time ago.

So I'd suggest renaming the above one that uses the default in-core
index instance to

extern int repo_read_cache(struct repository *);

or you will regret later.



Re: [PATCH 3/3] t1700: make sure split-index respects core.sharedrepository

2017-06-22 Thread Junio C Hamano
Christian Couder  writes:

> Add a test to check that both the split-index file and the
> shared-index file are created using the right permissions
> when core.sharedrepository is set.
>
> Signed-off-by: Christian Couder 
> ---
>  t/t1700-split-index.sh | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index af3ec0da5a..a52b92e82b 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -370,4 +370,16 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
> set to "never" and "now"
>   test $(ls .git/sharedindex.* | wc -l) -le 2
>  '
>  
> +test_expect_success POSIXPERM 'split index respects core.sharedrepository' '
> + git config core.sharedrepository 0666 &&
> + : >seventeen &&
> + git update-index --add seventeen &&
> + echo "-rw-rw-rw-" >expect &&
> + test_modebits .git/index >actual &&
> + test_cmp expect actual &&
> + newest_shared_index=$(ls -t .git/sharedindex.* | head -1) &&

Hmph.  Don't you want to make sure all of them, not just the latest
one, have the expected mode bits?

> + test_modebits "$newest_shared_index" >actual &&
> + test_cmp expect actual
> +'
> +
>  test_done


[PATCH] name-rev: Fix tag lookup on repository with mixed types of tags

2017-06-22 Thread orgads
From: Orgad Shaneh 

Commit 7550424804 (name-rev: include taggerdate in considering the best
name) introduced a bug in name-rev.

If a repository has both annotated and non-annotated tags, annotated
tag will always win, even if it was created decades after the commit.

Consider a repository that always used non-annotated tags, and at some
point started using annotated tags - name-rev --tags will return the
first annotated tags for all the old commits (in our repository it is
followed by ~5067 for one commit, or by ~120^2~21^2~88^2~87 for
another...). This is obviously not what the user expects.

The taggerdate should only be matched if *both tags* have it.
---
 builtin/name-rev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 92a5d8a5d2..8f77023482 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -46,11 +46,13 @@ static void name_rev(struct commit *commit,
commit->util = name;
goto copy_data;
} else if (name->taggerdate > taggerdate ||
-   (name->taggerdate == taggerdate &&
+   ((taggerdate == ULONG_MAX || name->taggerdate == 
taggerdate) &&
 name->distance > distance)) {
 copy_data:
name->tip_name = tip_name;
-   name->taggerdate = taggerdate;
+   if (taggerdate != ULONG_MAX) {
+   name->taggerdate = taggerdate;
+   }
name->generation = generation;
name->distance = distance;
} else
-- 
2.13.1.windows.1.1.ga36e14b3aa



Re: [PATCH 2/3] t1301: move movebits() to test-lib-functions.sh

2017-06-22 Thread Junio C Hamano
Christian Couder  writes:

> As the movebits() function can be useful outside t1301,
> let's move it into test-lib-functions.sh, and while at
> it let's rename it test_movebits().

Good thinking, especially on the renaming.


Re: [PATCH 1/3] read-cache: use shared perms when writing shared index

2017-06-22 Thread Junio C Hamano
Christian Couder  writes:

> Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10)
> write_shared_index() has been using mks_tempfile() to create the
> temporary file that will become the shared index.
>
> But even before that, it looks like the functions used to create this
> file didn't call adjust_shared_perm(), which means that the shared
> index file has always been created with 600 permissions regardless
> of the shared permission settings.
>
> This means that on repositories created with `git init --shared=all`
> and using the split index feature one gets an error like:
>
> fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file 
> open failed: Permission denied
>
> when another user performs any operation that reads the shared index.

Assuming that a "shared" repository setting should allow uses by
different users, the above analysis makes sense to me.

But the conclusion does not.

> Let's fix that by using create_tempfile() instead of mks_tempfile()
> to create the shared index file.
>
> ...
> - fd = mks_tempfile(_sharedindex, 
> git_path("sharedindex_XX"));
> + fd = create_tempfile(_sharedindex, 
> git_path("sharedindex_XX"));

So we used to create a temporary file that made sure its name is
unique but now we create sharedindex_XX with 6 X's literally 
at the end?

Doesn't mks_tempfile() family include a variant where you can give
custom mode?  Better yet, perhaps you can call adjust_shared_perm()
on the path _after_ seeing that mks_tempfile() succeeds (you can ask
get_tempfile_path() which path to adjust, I presume)?

>   if (fd < 0) {
>   hashclr(si->base_sha1);
>   return do_write_locked_index(istate, lock, flags);


Re: [PATCH v4 00/20] repository object

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 11:43 AM, Brandon Williams  wrote:
> As before you can find this series at:
> https://github.com/bmwill/git/tree/repository-object
>
> Changes in v4:
>
> * Patch 11 is slightly different and turns off all path relocation when a
>   worktree is provided instead of just for the index file (Thanks for the help
>   Jonathan Nieder).
> * 'repo_init()' has a tighter API and now requires that the provided gitdir is
>   a path to the gitdir instead of either a path to the gitdir or path to the
>   worktree (which has a .git file or directory) (Thanks Jonathan Tan).
> * Minor comment and commit message chagnes
>
> Note: Like v3 this series is dependent on on 'bw/config-h' and
>   'bw/ls-files-sans-the-index'

I read the whole series and I consider it
Reviewed-by: Stefan Beller 

Thanks,
Stefan


[PATCH] name-rev: Fix tag lookup on repository with mixed types of tags

2017-06-22 Thread Orgad Shaneh
Commit 7550424804 (name-rev: include taggerdate in considering the best
name) introduced a bug in name-rev.

If a repository has both annotated and non-annotated tags, annotated
tag will always win, even if it was created decades after the commit.

Consider a repository that always used non-annotated tags, and at some
point started using annotated tags - name-rev --tags will return the
first annotated tags for all the old commits (in our repository it is
followed by ~5067 for one commit, or by ~120^2~21^2~88^2~87 for
another...). This is obviously not what the user expects.

The taggerdate should only be matched if *both tags* have it.
---
 builtin/name-rev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 92a5d8a5d2..8f77023482 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -46,11 +46,13 @@ static void name_rev(struct commit *commit,
  commit->util = name;
  goto copy_data;
  } else if (name->taggerdate > taggerdate ||
- (name->taggerdate == taggerdate &&
+ ((taggerdate == ULONG_MAX || name->taggerdate == taggerdate) &&
  name->distance > distance)) {
 copy_data:
  name->tip_name = tip_name;
- name->taggerdate = taggerdate;
+ if (taggerdate != ULONG_MAX) {
+ name->taggerdate = taggerdate;
+ }
  name->generation = generation;
  name->distance = distance;
  } else
-- 
2.13.1.windows.1.1.ga36e14b3aa


Re: [PATCH 0/5] Move unquote_path() from git-add--interactive.perl to Git.pm

2017-06-22 Thread Junio C Hamano
Thanks.  I had some comments, mostly on the structure of the series,
but overall it was a pleasant read that takes the code in the right
direction.



Re: [PATCH 3/5] Git::unquote_path() throw an exception on bad path

2017-06-22 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> This is what the other routines in Git.pm do if there's an error.
>
> Signed-off-by: Phillip Wood 
> ---
>  perl/Git.pm | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 
> 889bf88cfcd34136e24e166fb3b72cced6debf9d..51cb3446088dd12e8fd93d47b95e29fab22a8466
>  100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -1493,8 +1493,8 @@ when not using -z
>   $_ = $2;
>   last;
>   }
> - # This is malformed -- just return it as-is for 
> now.
> - return $_[0];
> + # This is malformed
> + throw Error::Simple("Invalid quoted path 
> $_[0]");
>   }

Output from "git grep Error::Simple" tells me that mostly we do not
upcase the first word of the error message.  Perhaps this should
follow the convention.

It is a different matter that use of Error::Simple has been
discouraged for quite some time (e.g. *1*) and we haven't been got
around doing that.  When the clean-up happens, this code will have
to be migrated to a different mechanism together with 30+ existing
uses of it; in the scope of this series, throwing of Error::Simple 
is consistent with the remainder of Git.pm and is a good thing.

[References]

*1* 
https://public-inbox.org/git/1330809281-25774-1-git-send-email-jna...@gmail.com/


Re: [PATCH 1/5] Git.pm Add unquote_path()

2017-06-22 Thread Junio C Hamano
Phillip Wood  writes:

> Subject: Re: [PATCH 1/5] Git.pm Add unquote_path()

Subject: [PATCH 1/5] Git.pm: add unquote_path()

But I think it is more customary to remove the implementation in the
other file and adjust the existing callers to call this new one in
this same commit.  And then in follow-up commits, improve the
implementation in Git.pm.

When that happens, the subject would become

Subject: [PATCH 1/?] add-i: move unquote_path() to Git.pm

and the first sentence in the body would be tweaked ("move
unquote_path() from ... to Git.pm so that ...").

> From: Phillip Wood 
>
> Add unquote_path() from git-add--interactive so it can be used by
> other scripts. Note this is a straight copy, it does not handle
> '\a'. That will be fixed in the next commit

s/next commit/&./

Good to notice and document the lack of '\a' which does get used by
quote.c::sq_lookup[] when showing chr(7).

> Signed-off-by: Phillip Wood 
> ---
>  perl/Git.pm | 53 -
>  1 file changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 
> bfce1f795dfa5fea05f4f96637a1ae2333038735..8afde87fc8162271ba178e0fff3e921f070ac621
>  100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -61,7 +61,8 @@ require Exporter;
>  remote_refs prompt
>  get_tz_offset get_record
>  credential credential_read credential_write
> -temp_acquire temp_is_locked temp_release temp_reset 
> temp_path);
> +temp_acquire temp_is_locked temp_release temp_reset temp_path
> +unquote_path);
>  
>  
>  =head1 DESCRIPTION
> @@ -1451,6 +1452,56 @@ sub prefix_lines {
>   return $string;
>  }
>  
> +=item unquote_path ( PATH )
> +
> +Unquote a quoted path containing c-escapes as returned by ls-files etc.
> +when not using -z
> +
> +=cut
> +
> +{
> + my %unquote_map = (

The original calls this cquote_map, partly because there may be
other kinds of quoting possible.  I do not see a reason not to
follow suit here.

> + "b" => chr(8),
> + "t" => chr(9),
> + "n" => chr(10),
> + "v" => chr(11),
> + "f" => chr(12),
> + "r" => chr(13),
> + "\\" => "\\",
> + "\"" => "\""

In an early part of "sub unquote_path" we see dq spelled as "\042"
and in the original cquote_map, this is also defined with "\042".  I
do not think you want to change the above to make them inconsistent.

> + );
> +
> + sub unquote_path {
> + local ($_) = @_;
> + my ($retval, $remainder);
> + if (!/^\042(.*)\042$/) {
> + return $_;
> + }

Other than that, I can see this is just a straight-copy, which is
exactly what we want in an early step of a series like this.  Squash
in a change to remove the original from add-i and adjust the caller
to call this one (you may not have to do anything as it uses Git.pm
already, or perhaps "use Git qw(unquote_path)" or something?) and it
would be perfect ;-)

Thanks.



Re: [PATCH/FINAL] Documentation/git-submodule: cleanup

2017-06-22 Thread Stefan Beller
On Thu, Jun 22, 2017 at 12:07 PM, Junio C Hamano  wrote:
> Kaartic Sivaraam  writes:
>
>> The "add" section for 'git-submodule' is redundant in
>> its description and the short synopsis line. Fix it.
>>
>> Remove the redundant mentioning of the 'repository' argument
>> being mandatory.
>>
>> The text is hard to read because of back-references, so remove
>> those.
>>
>> Replace the word "humanish" by "canonical" as that conveys better
>> what we do to guess the path.
>>
>> While at it, quote all occurrences of '.gitmodules' as that is an
>> important file in the submodule context, also link to it on its
>> first mention.
>>
>> Helped-by: Stefan Beller 
>> Signed-off-by: Kaartic Sivaraam 
>> ---
>
> Stefan, do you want to add a Reviewed-by: to this one?  I've scanned
> it over and overall it looked OK (iow I didn't find anything worth
> complaining about).

I found it a pleasant read, thanks.

As all occurrences of '.gitmodules' are touched, I suspect we will run into
a conflict with sb/submodule-doc, such that I will rebase on top of this.

Thanks,
Stefan


Re: [PATCH/FINAL] Documentation/git-submodule: cleanup

2017-06-22 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> The "add" section for 'git-submodule' is redundant in
> its description and the short synopsis line. Fix it.
>
> Remove the redundant mentioning of the 'repository' argument
> being mandatory.
>
> The text is hard to read because of back-references, so remove
> those.
>
> Replace the word "humanish" by "canonical" as that conveys better
> what we do to guess the path.
>
> While at it, quote all occurrences of '.gitmodules' as that is an
> important file in the submodule context, also link to it on its
> first mention.
>
> Helped-by: Stefan Beller 
> Signed-off-by: Kaartic Sivaraam 
> ---

Stefan, do you want to add a Reviewed-by: to this one?  I've scanned
it over and overall it looked OK (iow I didn't find anything worth
complaining about).

Thanks.

>  Documentation/git-submodule.txt | 49 
> ++---
>  1 file changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index 74bc6200d564c..6e07bade39a7c 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -63,14 +63,6 @@ add [-b ] [-f|--force] [--name ] 
> [--reference ] [--dep
>   to the changeset to be committed next to the current
>   project: the current project is termed the "superproject".
>  +
> -This requires at least one argument: . The optional
> -argument  is the relative location for the cloned submodule
> -to exist in the superproject. If  is not given, the
> -"humanish" part of the source repository is used ("repo" for
> -"/path/to/repo.git" and "foo" for "host.xz:foo/.git").
> -The  is also used as the submodule's logical name in its
> -configuration entries unless `--name` is used to specify a logical name.
> -+
>   is the URL of the new submodule's origin repository.
>  This may be either an absolute URL, or (if it begins with ./
>  or ../), the location relative to the superproject's default remote
> @@ -87,21 +79,22 @@ If the superproject doesn't have a default remote 
> configured
>  the superproject is its own authoritative upstream and the current
>  working directory is used instead.
>  +
> - is the relative location for the cloned submodule to
> -exist in the superproject. If  does not exist, then the
> -submodule is created by cloning from the named URL. If  does
> -exist and is already a valid Git repository, then this is added
> -to the changeset without cloning. This second form is provided
> -to ease creating a new submodule from scratch, and presumes
> -the user will later push the submodule to the given URL.
> +The optional argument  is the relative location for the cloned
> +submodule to exist in the superproject. If  is not given, the
> +canonical part of the source repository is used ("repo" for
> +"/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If 
> +exists and is already a valid Git repository, then it is staged
> +for commit without cloning. The  is also used as the submodule's
> +logical name in its configuration entries unless `--name` is used
> +to specify a logical name.
>  +
> -In either case, the given URL is recorded into .gitmodules for
> -use by subsequent users cloning the superproject. If the URL is
> -given relative to the superproject's repository, the presumption
> -is the superproject and submodule repositories will be kept
> -together in the same relative location, and only the
> -superproject's URL needs to be provided: git-submodule will correctly
> -locate the submodule using the relative URL in .gitmodules.
> +The given URL is recorded into `.gitmodules` for use by subsequent users
> +cloning the superproject. If the URL is given relative to the
> +superproject's repository, the presumption is the superproject and
> +submodule repositories will be kept together in the same relative
> +location, and only the superproject's URL needs to be provided.
> +git-submodule will correctly locate the submodule using the relative
> +URL in `.gitmodules`.
>  
>  status [--cached] [--recursive] [--] [...]::
>   Show the status of the submodules. This will print the SHA-1 of the
> @@ -123,7 +116,7 @@ too (and can also report changes to a submodule's work 
> tree).
>  init [--] [...]::
>   Initialize the submodules recorded in the index (which were
>   added and committed elsewhere) by setting `submodule.$name.url`
> - in .git/config. It uses the same setting from .gitmodules as
> + in .git/config. It uses the same setting from `.gitmodules` as
>   a template. If the URL is relative, it will be resolved using
>   the default remote. If there is no default remote, the current
>   repository will be assumed to be upstream.
> @@ -197,7 +190,7 @@ configuration variable:
>   none;; the submodule is not updated.
>  
>  If the submodule is not yet initialized, and you just want to use the
> -setting as stored in 

Re: Truncating HEAD reflog on branch move

2017-06-22 Thread Junio C Hamano
Jeff King  writes:

> I don't think this is quite right, though. We've decremented "recno"
> after assigning the old pointer to "reflog". So in the existing code,
> "reflog" in that second conditional pointing to the _next_ entry (or
> previous, really, since we are going in reverse order).
>
> So I think you'd need to look at commit->reflog again (after checking
> that we didn't go past the start of the array).

Perhaps.  I did the illustration that way simply because I was not
sure if the current "the entry was NULL from something new, so skip
and look at the previous entry's new" was correct to begin with.



[PATCH 3/3] t1700: make sure split-index respects core.sharedrepository

2017-06-22 Thread Christian Couder
Add a test to check that both the split-index file and the
shared-index file are created using the right permissions
when core.sharedrepository is set.

Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 12 
 1 file changed, 12 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index af3ec0da5a..a52b92e82b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -370,4 +370,16 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
set to "never" and "now"
test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
+test_expect_success POSIXPERM 'split index respects core.sharedrepository' '
+   git config core.sharedrepository 0666 &&
+   : >seventeen &&
+   git update-index --add seventeen &&
+   echo "-rw-rw-rw-" >expect &&
+   test_modebits .git/index >actual &&
+   test_cmp expect actual &&
+   newest_shared_index=$(ls -t .git/sharedindex.* | head -1) &&
+   test_modebits "$newest_shared_index" >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.13.1.516.g05ec6e13aa



[PATCH 1/3] read-cache: use shared perms when writing shared index

2017-06-22 Thread Christian Couder
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10)
write_shared_index() has been using mks_tempfile() to create the
temporary file that will become the shared index.

But even before that, it looks like the functions used to create this
file didn't call adjust_shared_perm(), which means that the shared
index file has always been created with 600 permissions regardless
of the shared permission settings.

This means that on repositories created with `git init --shared=all`
and using the split index feature one gets an error like:

fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file 
open failed: Permission denied

when another user performs any operation that reads the shared index.

Let's fix that by using create_tempfile() instead of mks_tempfile()
to create the shared index file.

Signed-off-by: Christian Couder 
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index bc156a133e..eb71e93aa4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2414,7 +2414,7 @@ static int write_shared_index(struct index_state *istate,
struct split_index *si = istate->split_index;
int fd, ret;
 
-   fd = mks_tempfile(_sharedindex, 
git_path("sharedindex_XX"));
+   fd = create_tempfile(_sharedindex, 
git_path("sharedindex_XX"));
if (fd < 0) {
hashclr(si->base_sha1);
return do_write_locked_index(istate, lock, flags);
-- 
2.13.1.516.g05ec6e13aa



[PATCH 2/3] t1301: move movebits() to test-lib-functions.sh

2017-06-22 Thread Christian Couder
As the movebits() function can be useful outside t1301,
let's move it into test-lib-functions.sh, and while at
it let's rename it test_movebits().

Signed-off-by: Christian Couder 
---
 t/t1301-shared-repo.sh  | 18 +++---
 t/test-lib-functions.sh |  5 +
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/t/t1301-shared-repo.sh b/t/t1301-shared-repo.sh
index 1312004f8c..dfece751b5 100755
--- a/t/t1301-shared-repo.sh
+++ b/t/t1301-shared-repo.sh
@@ -19,10 +19,6 @@ test_expect_success 'shared = 0400 (faulty permission u-w)' '
)
 '
 
-modebits () {
-   ls -l "$1" | sed -e 's|^\(..\).*|\1|'
-}
-
 for u in 002 022
 do
test_expect_success POSIXPERM "shared=1 does not clear bits preset by 
umask $u" '
@@ -88,7 +84,7 @@ do
 
rm -f .git/info/refs &&
git update-server-info &&
-   actual="$(modebits .git/info/refs)" &&
+   actual="$(test_modebits .git/info/refs)" &&
verbose test "x$actual" = "x-$y"
 
'
@@ -98,7 +94,7 @@ do
 
rm -f .git/info/refs &&
git update-server-info &&
-   actual="$(modebits .git/info/refs)" &&
+   actual="$(test_modebits .git/info/refs)" &&
verbose test "x$actual" = "x-$x"
 
'
@@ -111,7 +107,7 @@ test_expect_success POSIXPERM 'info/refs respects umask in 
unshared repo' '
umask 002 &&
git update-server-info &&
echo "-rw-rw-r--" >expect &&
-   modebits .git/info/refs >actual &&
+   test_modebits .git/info/refs >actual &&
test_cmp expect actual
 '
 
@@ -177,7 +173,7 @@ test_expect_success POSIXPERM 'remote init does not use 
config from cwd' '
umask 0022 &&
git init --bare child.git &&
echo "-rw-r--r--" >expect &&
-   modebits child.git/config >actual &&
+   test_modebits child.git/config >actual &&
test_cmp expect actual
 '
 
@@ -187,7 +183,7 @@ test_expect_success POSIXPERM 're-init respects 
core.sharedrepository (local)' '
echo whatever >templates/foo &&
git init --template=templates &&
echo "-rw-rw-rw-" >expect &&
-   modebits .git/foo >actual &&
+   test_modebits .git/foo >actual &&
test_cmp expect actual
 '
 
@@ -198,7 +194,7 @@ test_expect_success POSIXPERM 're-init respects 
core.sharedrepository (remote)'
test_path_is_missing child.git/foo &&
git init --bare --template=../templates child.git &&
echo "-rw-rw-rw-" >expect &&
-   modebits child.git/foo >actual &&
+   test_modebits child.git/foo >actual &&
test_cmp expect actual
 '
 
@@ -209,7 +205,7 @@ test_expect_success POSIXPERM 'template can set 
core.sharedrepository' '
cp .git/config templates/config &&
git init --bare --template=../templates child.git &&
echo "-rw-rw-rw-" >expect &&
-   modebits child.git/HEAD >actual &&
+   test_modebits child.git/HEAD >actual &&
test_cmp expect actual
 '
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 5ee124332a..db622c3555 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -216,6 +216,11 @@ test_chmod () {
git update-index --add "--chmod=$@"
 }
 
+# Get the modebits from a file.
+test_modebits () {
+   ls -l "$1" | sed -e 's|^\(..\).*|\1|'
+}
+
 # Unset a configuration variable, but don't fail if it doesn't exist.
 test_unconfig () {
config_dir=
-- 
2.13.1.516.g05ec6e13aa



Re: [PATCHv2] submodules: overhaul documentation

2017-06-22 Thread Stefan Beller
>> + * the `.gitmodules` file inside the superproject. A project usually
>> +   includes this file to suggest defaults for the upstream collection
>> +   of repositories.
>
> I dislike this last point.  Realistically we don't want this right?  So
> perhaps we shouldn't include it?

Well, it describes the current situation accurately. In a resend
I'll de-emphasize it.


[PATCH v4 12/20] path: add repo_git_path and strbuf_repo_git_path

2017-06-22 Thread Brandon Williams
Introduce 'repo_git_path' and 'strbuf_repo_git_path' which take a
repository struct and constructs a path into the repository's git
directory.

Signed-off-by: Brandon Williams 
---
 path.c | 21 +
 path.h |  8 
 2 files changed, 29 insertions(+)

diff --git a/path.c b/path.c
index 2bdd0044f..ffc0f10fd 100644
--- a/path.c
+++ b/path.c
@@ -416,6 +416,27 @@ static void do_git_path(const struct repository *repo,
strbuf_cleanup_path(buf);
 }
 
+char *repo_git_path(const struct repository *repo,
+   const char *fmt, ...)
+{
+   struct strbuf path = STRBUF_INIT;
+   va_list args;
+   va_start(args, fmt);
+   do_git_path(repo, NULL, , fmt, args);
+   va_end(args);
+   return strbuf_detach(, NULL);
+}
+
+void strbuf_repo_git_path(struct strbuf *sb,
+ const struct repository *repo,
+ const char *fmt, ...)
+{
+   va_list args;
+   va_start(args, fmt);
+   do_git_path(repo, NULL, sb, fmt, args);
+   va_end(args);
+}
+
 char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
diff --git a/path.h b/path.h
index 568d63be5..c779c4aa2 100644
--- a/path.h
+++ b/path.h
@@ -35,6 +35,14 @@ extern char *mkpathdup(const char *fmt, ...)
 extern char *git_pathdup_submodule(const char *path, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
 
+extern char *repo_git_path(const struct repository *repo,
+  const char *fmt, ...)
+   __attribute__((format (printf, 2, 3)));
+extern void strbuf_repo_git_path(struct strbuf *sb,
+const struct repository *repo,
+const char *fmt, ...)
+   __attribute__((format (printf, 3, 4)));
+
 extern void report_linked_checkout_garbage(void);
 
 /*
-- 
2.13.1.704.gde00cce3c-goog



[PATCH v4 15/20] repository: add index_state to struct repo

2017-06-22 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 repository.c | 16 
 repository.h |  9 +
 2 files changed, 25 insertions(+)

diff --git a/repository.c b/repository.c
index 686a964ad..6f6f4d91e 100644
--- a/repository.c
+++ b/repository.c
@@ -163,4 +163,20 @@ void repo_clear(struct repository *repo)
free(repo->config);
repo->config = NULL;
}
+
+   if (repo->index) {
+   discard_index(repo->index);
+   free(repo->index);
+   repo->index = NULL;
+   }
+}
+
+int repo_read_index(struct repository *repo)
+{
+   if (!repo->index)
+   repo->index = xcalloc(1, sizeof(*repo->index));
+   else
+   discard_index(repo->index);
+
+   return read_index_from(repo->index, repo->index_file);
 }
diff --git a/repository.h b/repository.h
index 8ae5e8653..3a41568aa 100644
--- a/repository.h
+++ b/repository.h
@@ -2,6 +2,7 @@
 #define REPOSITORY_H
 
 struct config_set;
+struct index_state;
 
 struct repository {
/* Environment */
@@ -49,6 +50,12 @@ struct repository {
 */
struct config_set *config;
 
+   /*
+* Repository's in-memory index.
+* 'repo_read_index()' can be used to populate 'index'.
+*/
+   struct index_state *index;
+
/* Configurations */
/*
 * Bit used during initialization to indicate if repository state (like
@@ -71,4 +78,6 @@ extern void repo_set_worktree(struct repository *repo, const 
char *path);
 extern int repo_init(struct repository *repo, const char *gitdir, const char 
*worktree);
 extern void repo_clear(struct repository *repo);
 
+extern int repo_read_index(struct repository *repo);
+
 #endif /* REPOSITORY_H */
-- 
2.13.1.704.gde00cce3c-goog



  1   2   >