[PATCH] name-rev: change a "long" variable to timestamp_t

2017-05-19 Thread Junio C Hamano
Earlier dddbad72 ("timestamp_t: a new data type for timestamps",
2017-04-26) updated all in-core variables, fields and function
return values that are used to store "seconds since epoch" to a new
type timestamp_t.  Unfortunately one variable "cutoff", which is
used to keep track of the oldest timestamp of commit we saw on the
command line, was "long", and left behind.

Update it to timestamp_t as well.

Signed-off-by: Junio C Hamano 
---

 * Noticed while merging the topic 'mg/name-rev-debug' to 'pu'.

 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 4437475017..1f6fcae121 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -15,7 +15,7 @@ typedef struct rev_name {
int distance;
 } rev_name;
 
-static long cutoff = LONG_MAX;
+static timestamp_t cutoff = TIME_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
 #define MERGE_TRAVERSAL_WEIGHT 65535
-- 
2.13.0-439-gd4feae47b3



Re: [PATCH v3 3/4] name-rev: provide debug output

2017-05-19 Thread Junio C Hamano
Michael J Gruber  writes:

> I think I should change 3/4 to display exactly those bits that name-rev
> actually uses for weighing different possible descriptions; they are
> differents from the "describe-bits". So please withhold 3/4 and 4/4.

OK, I think 1&2/4 from this series can progress before that as it is
an end-user visible improvement.  While looking at it, I also found
a variable that recent "timestamp_t" series didn't notice and
update, so perhaps 1&2/4 needs to be rebased on a fix for that
anyway ;-)

Thanks.  Will hold 3&4/4 for now.


Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-19 Thread Junio C Hamano
Stefan Beller  writes:

> That could be added in ws.c:ws_check_emit, as these certain words
> are similar to coloring whitespace.

I actually was envisioning of highlighting a part of a line, like

-Very poor SCM
+Very nice SCM

which would be done by finding semi-matching removed and added lines
in the same hunk (i.e. local buffering) and makes a coloring decision.
That does not have any place in ws.c.

>> Having said that, we need to start somewhere, and I think it is a
>> reasonable first-cut attempt to work on top of the textual output
>> like this series does (IOW, while I do agree with the NEEDSWORK and
>> the way this series currently does things must be revamped in the
>> longer term, I do not think we should wait until that happens to
>> start playing with this topic).
>
> Ok. I share a similar reaction to submodule diffs that we discuss above
> and word coloring, that Jonathan Tan brought up off list.
>
> Both of them are broken in this implementation, but the NEEDSWORK
> would hint at how to fix them.

Yes, but if NEEDSWORK has to say "the current hack is working at a
wrong level, we need to do all of this before producing textual
diffs that are passed to the layer that colors lines", that wouldn't
help that much as a hint X-<.


Re: Git log.decorate can't be overridden in global config (v2.13.0)

2017-05-19 Thread Junio C Hamano
Todd Zullinger  writes:

> I believe a patch for this is on the ah/log-decorate-default-to-auto
> branch, courtesy of Brian Carlson (Cc:d):
>
>https://github.com/gitster/git/commit/c74271aae7
>
> The relevant list thread is here:
>
>
> https://public-inbox.org/git/20170512213407.46251-1-sand...@crustytoothpaste.net/
>
> Cheers,

Thanks.  

We should merge this (and other regression fixes, if any) and tag
2.13.1 during the next two weeks.




Re: Git log.decorate can't be overridden in global config (v2.13.0)

2017-05-19 Thread Todd Zullinger

Hi Caleb,

Caleb Evans wrote:
I recently updated to Git v2.13.0 (macOS Sierra via Homebrew), and I 
noticed that the default --decorate option for `git log` has changed 
from --decorate=no to --decorate=auto. I'd prefer to keep decoration 
disabled to minimize clutter in the logs, so I try disabling it in 
my global .gitconfig (~/.gitconfig) like so:


git config --global log.decorate none

However, this does not seem to disable the log decorations. Running 
`git log` again still shows the decorations. Strangely, running `git 
config log.decorate none` does seem to disable decoration, but I 
would prefer not to do this on a per-repository basis. I can only 
suspect that this is a Git bug, as my log.decorate global config 
setting is not being honored. I have already confirmed that my 
repository's .git/config is not overriding my global preference.


I hope this should be a simple issue, but please let me know if you 
have questions.


I believe a patch for this is on the ah/log-decorate-default-to-auto 
branch, courtesy of Brian Carlson (Cc:d):


   https://github.com/gitster/git/commit/c74271aae7

The relevant list thread is here:

   
https://public-inbox.org/git/20170512213407.46251-1-sand...@crustytoothpaste.net/

Cheers,

--
Todd
~~
Ambition is a poor excuse for not having enough sense to be lazy.



Git log.decorate can't be overridden in global config (v2.13.0)

2017-05-19 Thread Caleb Evans
Hi,

I recently updated to Git v2.13.0 (macOS Sierra via Homebrew), and I noticed 
that the default --decorate option for `git log` has changed from --decorate=no 
to --decorate=auto. I'd prefer to keep decoration disabled to minimize clutter 
in the logs, so I try disabling it in my global .gitconfig (~/.gitconfig) like 
so:

git config --global log.decorate none

However, this does not seem to disable the log decorations. Running `git log` 
again still shows the decorations. Strangely, running `git config log.decorate 
none` does seem to disable decoration, but I would prefer not to do this on a 
per-repository basis. I can only suspect that this is a Git bug, as my 
log.decorate global config setting is not being honored. I have already 
confirmed that my repository's .git/config is not overriding my global 
preference.

I hope this should be a simple issue, but please let me know if you have 
questions.

Thanks,
Caleb Evans





Re: [Bug] git branch -v has problems with carriage returns

2017-05-19 Thread Animi Vulpis
No problem, thanks for taking the time to help me.

I managed to create a minimal repository that shows the bug.
(I was able to deploy gitlab-ce-v8.15.8-ce.0 from docker locally and
create the repo, create the merge request and merge it)

I created a github repository so everybody interested can use it:
https://github.com/AnimiVulpis/git-bug
A few additional informations are in the README.md inside the repository.

FYI: I also tried a lot of things to create commit messages with \r\n
but without success. git does a good job preventing this.

Based on the history of the homebrew git formula
(https://github.com/Homebrew/homebrew-core/commits/master/Formula/git.rb)
and the fact that I `brew udpate` at least once a week I am pretty
sure that this bug does not exist in
git v2.12.2

Hope that helps
Have a nice weekend
David

2017-05-19 23:55 GMT+02:00 Atousa Duprat :
> Sorry for the noise with previous response...
>
> I have tried to repro this issue but git goes out of its way to store
> the commit messages using unix end-of-line format.
> I think that git itself cannot create a repo exhibiting this problem.
>
> Most helpful would be if you could create a mini repo using gitlab.
> All it would need is one file, two branches, and a merge.
> With that in hand, it should be pretty easy to track down the problem
> and fix git.
>
> You mentioned that the previous version you were using was working
> fine, can you tell me which version that was?
> It'll help to narrow down the changes that could have affected the issue.
>
> Thanks,
>
> Atousa
>
> On Tue, May 16, 2017 at 4:22 PM, Animi Vulpis  wrote:
>> Hi,
>>
>> I upgraded to git v2.13.0 and since then git branch -v has problems
>> with carriage returns in subject lines.
>>
>> We are using gitlab (not the newest version). So this bug (It's about
>> carriage returns in auto-generated merge messages (\r\n)) is not yet
>> fixed in our version:
>> https://gitlab.com/gitlab-org/gitlab-ce/issues/31671
>> That's were the carriage returns are coming from.
>>
>> In my specific case the auto-generated merge message has three lines
>> with empty lines in between.
>> So every line ends with `\r\n\r\n`
>>
>> If I do `git branch -v` with such a subject line somehow the third and
>> second line get combined before the hash. Example:
>>
>> $ git branch -v
>> See merge request ! temp space 84e18d22fd Merge branch
>> 'feature-XXX' into 'develop'
>> #  > third)>  
>>
>> Before git v2.13.0 `git branch -v` worked completely normal.
>>
>> I was not able to create a minimal local example, because my manually
>> created \r\n in commit messages were transformed into \n\n
>>
>> Please let me know if I can provide any more information that would be 
>> helpful.
>>
>> Cheers


Re: persistent-https, url insteadof, and `git submodule`

2017-05-19 Thread Dennis Kaarsemaker
On Fri, 2017-05-19 at 23:43 +0200, Dennis Kaarsemaker wrote:
> On Fri, 2017-05-19 at 14:57 -0500, Elliott Cable wrote:
> > Set up `persistent-https` as described in the [README][]; including the
> > ‘rewrite https urls’ feature in `.gitconfig`:
> > 
> > [url "persistent-https"]
> > insteadof = https
> > [url "persistent-http"]
> > insteadof = http
> > 
> > Unfortunately, this breaks `git submodule add`:
> > 
> > > git submodule add https://github.com/nodenv/nodenv.git \
> > ./Vendor/nodenv
> > Cloning into '/Users/ec/Library/System Repo/Vendor/nodenv'...
> > fatal: transport 'persistent-https' not allowed
> > fatal: clone of 'https://github.com/nodenv/nodenv.git' into
> > submodule path '/Users/ec/Library/System Repo/Vendor/nodenv' failed
> > 
> > Presumably this isn't intended behaviour?
> 
> It actually is. git-submodule sets GIT_PROTOCOL_FROM_USER to 0, which
> makes git not trust any urls except http(s), git, ssh and file urls
> unless you explicitely configure git to allow it. See the
> GIT_ALLOW_PROTOCOL section in man git and the git-config section it
> links to.

33cfccbbf3 (submodule: allow only certain protocols for submodule
fetches, 2015-09-16) says:

submodule: allow only certain protocols for submodule fetches

Some protocols (like git-remote-ext) can execute arbitrary
code found in the URL. The URLs that submodules use may come
from arbitrary sources (e.g., .gitmodules files in a remote
repository). Let's restrict submodules to fetching from a
known-good subset of protocols.

Note that we apply this restriction to all submodule
commands, whether the URL comes from .gitmodules or not.
This is more restrictive than we need to be; for example, in
the tests we run:

  git submodule add ext::...

which should be trusted, as the URL comes directly from the
command line provided by the user. But doing it this way is
simpler, and makes it much less likely that we would miss a
case. And since such protocols should be an exception
(especially because nobody who clones from them will be able
to update the submodules!), it's not likely to inconvenience
anyone in practice.


D.



Re: [Bug] git branch -v has problems with carriage returns

2017-05-19 Thread Atousa Duprat
Sorry for the noise with previous response...

I have tried to repro this issue but git goes out of its way to store
the commit messages using unix end-of-line format.
I think that git itself cannot create a repo exhibiting this problem.

Most helpful would be if you could create a mini repo using gitlab.
All it would need is one file, two branches, and a merge.
With that in hand, it should be pretty easy to track down the problem
and fix git.

You mentioned that the previous version you were using was working
fine, can you tell me which version that was?
It'll help to narrow down the changes that could have affected the issue.

Thanks,

Atousa

On Tue, May 16, 2017 at 4:22 PM, Animi Vulpis  wrote:
> Hi,
>
> I upgraded to git v2.13.0 and since then git branch -v has problems
> with carriage returns in subject lines.
>
> We are using gitlab (not the newest version). So this bug (It's about
> carriage returns in auto-generated merge messages (\r\n)) is not yet
> fixed in our version:
> https://gitlab.com/gitlab-org/gitlab-ce/issues/31671
> That's were the carriage returns are coming from.
>
> In my specific case the auto-generated merge message has three lines
> with empty lines in between.
> So every line ends with `\r\n\r\n`
>
> If I do `git branch -v` with such a subject line somehow the third and
> second line get combined before the hash. Example:
>
> $ git branch -v
> See merge request ! temp space 84e18d22fd Merge branch
> 'feature-XXX' into 'develop'
> #   third)>  
>
> Before git v2.13.0 `git branch -v` worked completely normal.
>
> I was not able to create a minimal local example, because my manually
> created \r\n in commit messages were transformed into \n\n
>
> Please let me know if I can provide any more information that would be 
> helpful.
>
> Cheers


Re: Delivery Status Notification (Failure)

2017-05-19 Thread Atousa Duprat
Hi,

I have tried to repro this issue but git goes out of its way to store
the commit messages using unix end-of-line format.
I think that git itself cannot create a repo exhibiting this problem.

Most helpful would be if you could create a mini repo using gitlab.
All it would need is one file, two branches, and a merge.
With that in hand, it should be pretty easy to track down the problem
and fix git.

You mentioned that the previous version you were using was working
fine, can you tell me which version that was?
It'll help to narrow down the changes that could have affected the issue.

Thanks,

Atousa


Re: persistent-https, url insteadof, and `git submodule`

2017-05-19 Thread Dennis Kaarsemaker
On Fri, 2017-05-19 at 14:57 -0500, Elliott Cable wrote:
> Set up `persistent-https` as described in the [README][]; including the
> ‘rewrite https urls’ feature in `.gitconfig`:
> 
> [url "persistent-https"]
> insteadof = https
> [url "persistent-http"]
> insteadof = http
> 
> Unfortunately, this breaks `git submodule add`:
> 
> > git submodule add https://github.com/nodenv/nodenv.git \
> ./Vendor/nodenv
> Cloning into '/Users/ec/Library/System Repo/Vendor/nodenv'...
> fatal: transport 'persistent-https' not allowed
> fatal: clone of 'https://github.com/nodenv/nodenv.git' into
> submodule path '/Users/ec/Library/System Repo/Vendor/nodenv' failed
> 
> Presumably this isn't intended behaviour?

It actually is. git-submodule sets GIT_PROTOCOL_FROM_USER to 0, which
makes git not trust any urls except http(s), git, ssh and file urls
unless you explicitely configure git to allow it. See the
GIT_ALLOW_PROTOCOL section in man git and the git-config section it
links to.

D.


Re: [PATCH v2] send-email: Net::SMTP::SSL is obsolete, use only when necessary

2017-05-19 Thread Dennis Kaarsemaker
Second ping. This problem is not going away, so if this solution is not
acceptable, I'd like to know what needs to be improved.

On Thu, 2017-05-04 at 09:01 +0200, Dennis Kaarsemaker wrote:
> Ping. It's a little over a month since I sent this, but I haven't seen
> any comments. Is this commit good to go?
> 
> On Fri, 2017-03-24 at 22:37 +0100, Dennis Kaarsemaker wrote:
> > Net::SMTP itself can do the necessary SSL and STARTTLS bits just fine
> > since version 1.28, and Net::SMTP::SSL is now deprecated. Since 1.28
> > isn't that old yet, keep the old code in place and use it when
> > necessary.
> > 
> > While we're in the area, mark some messages for translation that were
> > not yet marked as such.
> > 
> > Signed-off-by: Dennis Kaarsemaker 
> > ---
> >  git-send-email.perl | 54 
> > ++---
> >  1 file changed, 35 insertions(+), 19 deletions(-)
> > 
> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index eea0a517f7..0d90439d9a 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -1353,10 +1353,12 @@ EOF
> > die __("The required SMTP server is not properly 
> > defined.")
> > }
> >  
> > +   require Net::SMTP;
> > +   my $use_net_smtp_ssl = version->parse($Net::SMTP::VERSION) < 
> > version->parse("1.28");
> > +   $smtp_domain ||= maildomain();
> > +
> > if ($smtp_encryption eq 'ssl') {
> > $smtp_server_port ||= 465; # ssmtp
> > -   require Net::SMTP::SSL;
> > -   $smtp_domain ||= maildomain();
> > require IO::Socket::SSL;
> >  
> > # Suppress "variable accessed once" warning.
> > @@ -1368,34 +1370,48 @@ EOF
> > # Net::SMTP::SSL->new() does not forward any SSL options
> > IO::Socket::SSL::set_client_defaults(
> > ssl_verify_params());
> > -   $smtp ||= Net::SMTP::SSL->new($smtp_server,
> > - Hello => $smtp_domain,
> > - Port => $smtp_server_port,
> > - Debug => $debug_net_smtp);
> > +
> > +   if ($use_net_smtp_ssl) {
> > +   require Net::SMTP::SSL;
> > +   $smtp ||= Net::SMTP::SSL->new($smtp_server,
> > + Hello => 
> > $smtp_domain,
> > + Port => 
> > $smtp_server_port,
> > + Debug => 
> > $debug_net_smtp);
> > +   }
> > +   else {
> > +   $smtp ||= Net::SMTP->new($smtp_server,
> > +Hello => $smtp_domain,
> > +Port => 
> > $smtp_server_port,
> > +Debug => 
> > $debug_net_smtp,
> > +SSL => 1);
> > +   }
> > }
> > else {
> > -   require Net::SMTP;
> > -   $smtp_domain ||= maildomain();
> > $smtp_server_port ||= 25;
> > $smtp ||= Net::SMTP->new($smtp_server,
> >  Hello => $smtp_domain,
> >  Debug => $debug_net_smtp,
> >  Port => $smtp_server_port);
> > if ($smtp_encryption eq 'tls' && $smtp) {
> > -   require Net::SMTP::SSL;
> > -   $smtp->command('STARTTLS');
> > -   $smtp->response();
> > -   if ($smtp->code == 220) {
> > +   if ($use_net_smtp_ssl) {
> > +   $smtp->command('STARTTLS');
> > +   $smtp->response();
> > +   if ($smtp->code != 220) {
> > +   die sprintf(__("Server does not 
> > support STARTTLS! %s"), $smtp->message);
> > +   }
> > +   require Net::SMTP::SSL;
> > $smtp = Net::SMTP::SSL->start_SSL($smtp,
> >   
> > ssl_verify_params())
> > -   or die "STARTTLS failed! 
> > ".IO::Socket::SSL::errstr();
> > -   $smtp_encryption = '';
> > -   # Send EHLO again to receive fresh
> > -   # supported commands
> > -

persistent-https, url insteadof, and `git submodule`

2017-05-19 Thread Elliott Cable
Set up `persistent-https` as described in the [README][]; including the
‘rewrite https urls’ feature in `.gitconfig`:

[url "persistent-https"]
insteadof = https
[url "persistent-http"]
insteadof = http

Unfortunately, this breaks `git submodule add`:

> git submodule add https://github.com/nodenv/nodenv.git \
./Vendor/nodenv
Cloning into '/Users/ec/Library/System Repo/Vendor/nodenv'...
fatal: transport 'persistent-https' not allowed
fatal: clone of 'https://github.com/nodenv/nodenv.git' into
submodule path '/Users/ec/Library/System Repo/Vendor/nodenv' failed

Presumably this isn't intended behaviour?

   [README]: 



⁓ ELLIOTTCABLE — fly safe.
  http://ell.io/tt


Re: [PATCHv3 20/20] diff.c: color moved lines differently

2017-05-19 Thread Jonathan Tan
On Fri, May 19, 2017 at 11:40 AM, Stefan Beller  wrote:
>>> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned 
>>> ignore_ws)
>>> +{
>>> + static struct strbuf sb = STRBUF_INIT;
>>> +
>>> + if (ignore_ws) {
>>> + strbuf_reset();
>>> + get_ws_cleaned_string(line, );
>>
>> Memory leak here, I think.
>
> It's static, so we don't care.
> I can make it non-static and release the memory in a resend.

Ah, I missed the "static". It seems that "static" is used elsewhere
too, so these functions are not reentrant anyway, so this is fine.


Re: [PATCHv3 20/20] diff.c: color moved lines differently

2017-05-19 Thread Stefan Beller
On Fri, May 19, 2017 at 11:23 AM, Jonathan Tan  wrote:
> On Thu, 18 May 2017 12:37:46 -0700
> Stefan Beller  wrote:
>
> [snip]
>
>> Instead this provides a dynamic programming greedy algorithm that
>
> Not sure if this is called "dynamic programming".

https://loveforprogramming.quora.com/Backtracking-Memoization-Dynamic-Programming
http://stackoverflow.com/questions/3592943/difference-between-back-tracking-and-dynamic-programming

Instead of doing backtracking (finding the lengthiest hunk for
each line), we keep a set of potential hunks around, this sounds
very much like the examples given in these links.


> The first part of the commit message could probably be written more
> concisely, like the following:
...
> Having said that, thanks - this version is much more like what I would
> expect.

Thanks for giving a more concise commit message, will fix in a reroll.


>
>> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line 
>> *a,
>
>> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
>
> Instead of having 2 versions of all the comparison functions, could the
> ws-ness be passed as the keydata?

No, this is misuse use of the API, peff explains:

https://public-inbox.org/git/20170513085050.plmau5ffvzn6i...@sigill.intra.peff.net/



>
>> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned 
>> ignore_ws)
>> +{
>> + static struct strbuf sb = STRBUF_INIT;
>> +
>> + if (ignore_ws) {
>> + strbuf_reset();
>> + get_ws_cleaned_string(line, );
>
> Memory leak here, I think.

It's static, so we don't care.
I can make it non-static and release the memory in a resend.

>
>> + return memhash(sb.buf, sb.len);
>> + } else {
>> + return memhash(line->line, line->len);
>> + }
>> +}
>
> [snip]
>
>> +static void add_lines_to_move_detection(struct diff_options *o)
>> +{
>> + struct moved_entry *prev_line;
>
> gcc says (rightly) that this must be initialized.

This is one of the last refactorings I did on this patch, moving
the prev_line out of the diff_options struct (which is memset in its
init), forgot to init it here. will fix.

>> + int alt_flag = 0;
>
> Probably call this "use_alt_color" or something similar.

Sounds better than alt_flag.

>> + struct moved_entry *p = pmb[i];
>> + struct moved_entry *pnext = (p && p->next_line) ?
>> + p->next_line : NULL;
>> + if (pnext &&
>> + !buffered_patch_line_cmp(pnext->line, l, o)) {
>> + pmb[i] = p->next_line;
>> + } else {
>> + pmb[i] = NULL;
>> + }
>
> Memory leak of pmb[i] somewhere here?

pmb[] holds pointers into moved)entry elements that
are obtained via  hashmap_get_next(hm, match), such that
any pmb[] element is also part of a hashmap.

When freeing the hashmap, we'll free the memory. This
array doesn't own the underlying memory.

>> @@ -4874,6 +5114,11 @@ static void diff_flush_patch_all_file_pairs(struct 
>> diff_options *o)
>>
>>   if (o->use_buffer) {
>> + if (o->color_moved) {
>
> Can you just declare the two hashmaps here, so that we do not need to
> put them in o? They don't seem to be used outside this block anyway.

Obviously. Thanks for that pointer as well.


>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 289806d0c7..232d9ad55e 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>
> As for the tests, also add a test checking the interaction with
> whitespace highlighting, and a test showing that diff errors out if we
> ask for both move coloring and word-by-word diffing.

We do not error out, but ignore the move heuristic doesn't find any
blocks. I can make it error out, instead. (and add tests)

Thanks,
Stefan


Re: [WIP/RFC 00/23] repository object

2017-05-19 Thread Ben Peart
Glad to see you tackling this.  This is definitely a step in the right 
direction.


I realize that it will take a lot of work and that intermediate steps 
may just be pushing it the global state one level higher but eventually 
it would be great to see an entire code path global state free!


I'm personally interested because reducing the reliance on global state 
also helps us in our performance work as it makes it more possible to 
use threading to scale up the performance.


Ben

On 5/18/2017 7:21 PM, Brandon Williams wrote:

When I first started working on the git project I found it very difficult to
understand parts of the code base because of the inherently global nature of
our code.  It also made working on submodules very difficult.  Since we can
only open up a single repository per process, you need to launch a child
process in order to process a submodule.  But you also need to be able to
communicate other stateful information to the children processes so that the
submodules know how best to format their output or match against a
pathspec...it ends up feeling like layering on hack after hack.  What I would
really like to do, is to have the ability to have a repository object so that I
can open a submodule in-process.

Before this becomes a reality for all commands, much of the library code would
need to be refactored in order to work purely on handles instead of global
state.  As it turned out, ls-files is a pretty simple command and doesn't have
*too* many dependencies.  The biggest thing that needed to be changed was
piping through an index into a couple library routines so that they don't
inherently rely on 'the_index'.  A few of these changes I've sent out and can
be found at 'origin/bw/pathspec-sans-the-index' and
'origin/bw/dir-c-stops-relying-on-the-index' which this series is based on.

Patches 1-16 are refactorings to prepare either library code or ls-files itself
to be ready to handle passing around an index struct.  Patches 17-22 introduce
a repository struct and change a couple of things about how submodule caches
work (getting submodule information from .gitmodules).  And Patch 23 converts
ls-files to use a repository struct.

The most interesting part of the series is from 17-23.  And 1-16 could be taken
as is without the rest of the series.

This is still very much in a WIP state, though it does pass all tests.  What
I'm hoping for here is to get a discussion started about the feasibility of a
change like this and hopefully to get the ball rolling.  Is this a direction we
want to move in?  Is it worth the pain?

Thanks for taking the time to look at this and entertain my insane ideas :)

Brandon Williams (23):
  convert: convert get_cached_convert_stats_ascii to take an index
  convert: convert crlf_to_git to take an index
  convert: convert convert_to_git_filter_fd to take an index
  convert: convert convert_to_git to take an index
  convert: convert renormalize_buffer to take an index
  tree: convert read_tree to take an index parameter
  ls-files: convert overlay_tree_on_cache to take an index
  ls-files: convert write_eolinfo to take an index
  ls-files: convert show_killed_files to take an index
  ls-files: convert show_other_files to take an index
  ls-files: convert show_ru_info to take an index
  ls-files: convert ce_excluded to take an index
  ls-files: convert prune_cache to take an index
  ls-files: convert show_files to take an index
  ls-files: factor out debug info into a function
  ls-files: factor out tag calculation
  repo: introduce new repository object
  repo: add index_state to struct repo
  repo: add per repo config
  submodule-config: refactor to allow for multiple submodule_cache's
  repo: add repo_read_gitmodules
  submodule: add is_submodule_active
  ls-files: use repository object

 Makefile   |   1 +
 apply.c|   2 +-
 builtin/blame.c|   2 +-
 builtin/commit.c   |   3 +-
 builtin/ls-files.c | 348 -
 cache.h|   4 +-
 combine-diff.c |   2 +-
 config.c   |   2 +-
 convert.c  |  31 +--
 convert.h  |  19 +-
 diff.c |   6 +-
 dir.c  |   2 +-
 git.c  |   2 +-
 ll-merge.c |   2 +-
 merge-recursive.c  |   4 +-
 repo.c | 112 +++
 repo.h |  22 +++
 sha1_file.c|   6 +-
 submodule-config.c |  40 +++-
 submodule-config.h |  10 +
 submodule.c|  51 +
 submodule.h|   2 +
 t/t3007-ls-files-recurse-submodules.sh |  39 
 tree.c   

Re: git merges of tags

2017-05-19 Thread Linus Torvalds
On Thu, May 18, 2017 at 4:23 PM, Stephen Rothwell  wrote:
>
> Just a reminder that if you are merging Linus' tree (or any tree
> really) via a tag, git was changed some time ago so that merging a tag
> will not do a fast forward (there is a good reason for this - I just
> can't recall it ATM).

The reason is that when you merge a signed tag, git squirrels away t
he signature into the merge commit, so that you can see and verify the
signage later (use "git log --show-signatures" to see the signatures
on the commits).

If you fast-forward, there isn't any new commit to add the signing data to.

> To do the fast forward, try "git merge ^{}" ...

A slightly simpler syntax might be just "tag^0", but yes, the "^{}"
thing peels off any tags.

>   (unfortunately
> doing "git merge --ff " also does not do a fast forward - it also
> doesn't fail, it unexpectedly just creates a merge commit :-().

"--ff" is the default behavior, and means "allow fast forward", but
note that it is about "allowing", not "forcing".

You can use "--ff-only" to say that you will _only_ accept a
fast-forward, and git will error out  if it needs to create a merge.

  Linus


Re: [PATCHv3 20/20] diff.c: color moved lines differently

2017-05-19 Thread Jonathan Tan
On Thu, 18 May 2017 12:37:46 -0700
Stefan Beller  wrote:

[snip]

> Instead this provides a dynamic programming greedy algorithm that

Not sure if this is called "dynamic programming".

> finds the largest moved hunk and then switches color to the
> alternative color for the next hunk. By doing this any permutation is
> recognized and displayed. That implies that there is no dedicated
> boundary or inside-hunk color, but instead we'll have just two colors
> alternating for hunks.

[snip]

I would title this "color moved blocks differently" to emphasize that we
are treating the moves in terms of blocks, not just lines.

The first part of the commit message could probably be written more
concisely, like the following:

  When a patch consists mostly of moving blocks of code around, it can
  be quite tedious to ensure that the blocks are moved verbatim, and not
  undesirably modified in the move. To that end, color blocks that are
  moved within the same patch differently. For example (OM, del, add,
  and NM are different colors):

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OM]  -if (!is_authorized_user())
[OM]  -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OM]  -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NM]  +sensitive_stuff(spanning,
[NM]  +multiple,
[NM]  +lines);
[NM]  +}

  Adjacent blocks are colored differently. For example, in this
  potentially malicious patch, the swapping of blocks can be spotted:

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OMA] -if (!is_authorized_user())
[OMA] -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OMA] -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NMA] +sensitive_stuff(spanning,
[NMA] +multiple,
[NMA] +lines);
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NMA] +}

Having said that, thanks - this version is much more like what I would
expect.

> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
> +  const struct buffered_patch_line *b,
> +  const void *keydata)
> +{
> + int ret;
> + struct strbuf sba = STRBUF_INIT;
> + struct strbuf sbb = STRBUF_INIT;
> +
> + get_ws_cleaned_string(a, );
> + get_ws_cleaned_string(b, );
> + ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
> + strbuf_release();
> + strbuf_release();
> + return ret;
> +}
> +
> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
> +const struct buffered_patch_line *b,
> +const void *keydata)
> +{
> + return a->len != b->len || strncmp(a->line, b->line, a->len);
> +}

Instead of having 2 versions of all the comparison functions, could the
ws-ness be passed as the keydata?

> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned 
> ignore_ws)
> +{
> + static struct strbuf sb = STRBUF_INIT;
> +
> + if (ignore_ws) {
> + strbuf_reset();
> + get_ws_cleaned_string(line, );

Memory leak here, I think.

> + return memhash(sb.buf, sb.len);
> + } else {
> + return memhash(line->line, line->len);
> + }
> +}

[snip]

> +static void add_lines_to_move_detection(struct diff_options *o)
> +{
> + struct moved_entry *prev_line;

gcc says (rightly) that this must be initialized.

> +
> + int n;
> + for (n = 0; n < o->line_buffer_nr; n++) {
> + int sign = 0;
> + struct hashmap *hm;
> + struct moved_entry *key;
> +
> + switch (o->line_buffer[n].sign) {
> + case '+':
> + sign = '+';
> + hm = o->added_lines;
> + break;
> + case '-':
> + sign = '-';
> + hm = o->deleted_lines;
> + break;
> + case ' ':
> + default:
> + prev_line = NULL;
> + continue;
> + }
> +
> + key = prepare_entry(o, n);
> + if (prev_line &&
> + 

die("bad object.. for duplicate tagged tag in remote

2017-05-19 Thread Chris West
Bear with me here, I hit this in a real repo.

If you have an annotated tag of an annotated tag, and `remote update`
elects not to fetch this tag (perhaps because it has a name collision
locally), then the repo ends up corrupt: you can't gc it, but fsck
doesn't notice.

Two repos, named "bad" and "good":

bad$ git tag -a inner
bad$ git tag -a outer inner
bad$ git tag -d inner
bad$ git show outer
tag outer
Tagger: ...
Date:   ...

This is the outer tag.

tag inner
Tagger: ...
Date:   ...

This is the inner tag.

commit 826365dcfec304a80b227a990f7d5c805bce3dd9
Author: ...
...

bad$ git rev-parse outer
070707..
bad$ git cat-file tag outer
object 03030303...


good$ git tag -a outer # create a colliding tag
good$ git remote add bad ../bad

good$ git remote update
warning: no common commits
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (4/4), done.
>From ../bad
 * [new branch]  master -> bad/master


Note how it has not fetched the tag ref, but it has fetched one of the
tag objects:

$ git show 07070
error: Could not read object 0303030..
tag outer
Tagger: ...

$ git fsck
...
dangling tag 07070...

I actually don't get that on the real repo, but do on this testcase. Hmm.
`git fsck` exits with success here. This is bad, as the object graph is
incomplete?


$ git gc
fatal: bad object 03030303...
error: failed to run repack

`git gc` fails with this meaningless error. The attached patch improves
the error.

I don't know where the rest of the problem lies. What's the expected
behaviour when a tag already exists locally, but is different? Fetch
the object anyway, but ignore it?

>From aa861789077012f78605431e1a1f191292693325 Mon Sep 17 00:00:00 2001
From: "Chris West (Faux)" 
Date: Fri, 19 May 2017 19:24:03 +0200
Subject: [PATCH] print tag id when tagged object is bad

---
 revision.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index 8a8c178..22b6021 100644
--- a/revision.c
+++ b/revision.c
@@ -232,7 +232,8 @@ static struct commit *handle_commit(struct rev_info *revs,
 		if (!object) {
 			if (flags & UNINTERESTING)
 return NULL;
-			die("bad object %s", oid_to_hex(>tagged->oid));
+			die("bad tagged object %s in %s", oid_to_hex(>tagged->oid),
+		oid_to_hex(>object.oid));
 		}
 		object->flags |= flags;
 		/*
-- 
2.7.4



How are you doing

2017-05-19 Thread Awawu Issa
Greetings
It’s my pleasure to write you today, I am Awawu Issa, am a Personal
Assistant to Late Mr. Lorna Laboso the former Assistant Minister of
Home Affairs in Kenya.  Mr.Lorna Laboso and the former Kenyan Road
minister Dr. Kipkalya Kones has been on board the Cessna 210, which
was headed to Kericho and crashed in a remote area called Kajong'a, in
western Kenya. The plane crashed occurred on the Tuesday 10th, June,
2008.You can read more about the crash through the below

site:http://edition.cnn.com/2008/WORLD/africa/06/10/kenya.crash/index.html

After the burial of Mr.Lorna Laboso, his family did not know about his
money deposited in a bank in Europe, but as his personal assistant i
know everything about this money and i kept it to myself due to some
reasons best known to me.   One morning, I opened his briefcase that
he kept with me and found out documents which he used to deposit the
money [$10,500,000 USD] into a Bank in Cotonou-Benin.

I just need this fund to be transfer to your account so that i will
come over to your country and complete some investment over there.
Based on the fact that this is a deal, I propose that 40% of this fund
for you and 60% for me.
I will give you full detail of this transaction as soon as i recieve
your reply.
Waiting to receive your prompt respond on my personal email
(awawuissa1...@yahoo.com)
Yours sincerely,
Ms Awawu Issa


Re: [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-19 Thread Ben Peart
After sending this, I noticed that using a different compiler generated 
a couple of warnings I should fix.  I'm assuming I'll need another 
re-roll but if not, here are the changes that will be squashed in.


Ben



diff --git a/dir.c b/dir.c
index da428489e2..37f1c580a5 100644
--- a/dir.c
+++ b/dir.c
@@ -16,6 +16,7 @@
 #include "utf8.h"
 #include "varint.h"
 #include "ewah/ewok.h"
+#include "fsmonitor.h"


diff --git a/fsmonitor.c b/fsmonitor.c
index 6356dc795e..9f08e66db9 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -162,7 +162,7 @@ void process_fsmonitor_extension(struct index_state 
*istate)


 void refresh_by_fsmonitor(struct index_state *istate)
 {
-   static has_run_once = FALSE;
+   static int has_run_once = FALSE;
struct strbuf query_result = STRBUF_INIT;
int query_success = 0;
size_t bol = 0; /* beginning of line */


On 5/18/2017 4:13 PM, Ben Peart wrote:

When the index is read from disk, the query-fsmonitor index extension is
used to flag the last known potentially dirty index and untracked cach
entries.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes in the working directory.

Before the 'fsmonitor-dirty' flags are used to limit the scope of the
files to be checked, the query-fsmonitor hook proc is called with the
time the index was last updated.  The hook proc returns the list of
files changed since that last updated time and the list of
potentially dirty entries is updated to reflect the current state.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes.

Signed-off-by: Ben Peart 
---
 Makefile   |   1 +
 builtin/update-index.c |   1 +
 cache.h|   5 ++
 config.c   |   5 ++
 dir.c  |  13 +++
 dir.h  |   2 +
 entry.c|   1 +
 environment.c  |   1 +
 fsmonitor.c| 231 +
 fsmonitor.h|   9 ++
 read-cache.c   |  28 +-
 unpack-trees.c |   1 +
 12 files changed, 296 insertions(+), 2 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index e35542e631..488a4466cc 100644
--- a/Makefile
+++ b/Makefile
@@ -760,6 +760,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ebfc09faa0..32fd977b43 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+   active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
cache_tree_invalidate_path(_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
diff --git a/cache.h b/cache.h
index 188811920c..36423c77cc 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED (1 << 19)

 #define CE_HASHED(1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED(1 << 23)

@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)

 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   time_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty_bitmap;
 };

 extern struct index_state the_index;
@@ -766,6 +770,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
+extern int core_fsmonitor;

 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index bb4d735701..1a8108636d 100644
--- a/config.c
+++ b/config.c
@@ -1232,6 +1232,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}

+   if (!strcmp(var, "core.fsmonitor")) {
+   core_fsmonitor = git_config_bool(var, value);
+   return 0;
+   }
+
/* Add other 

Re: [PATCH] doc: explain default option for rev-parse --short

2017-05-19 Thread André Werlang
Hey Jeff, I'll take a look at improving the text and the other commands.

Thanks for the response. I'll get back to you soonish.

André

2017-05-18 12:59 GMT-03:00 Jeff King :
> On Thu, May 18, 2017 at 11:03:00AM -0300, André Werlang wrote:
>
>> Git 2.11 introduced a computation to guess the default length
>> for commit short hashes. The documentation isn't updated.
>
> Thanks for the patch. I think this is going in the right direction, but
> I have a few minor comments.
>
>> From 2b1c229153a89c7608e64b87d2f933704c18b7ae Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Andr=C3=A9=20Werlang?= 
>> Date: Thu, 18 May 2017 10:50:11 -0300
>> Subject: [PATCH] doc: explain default option for rev-parse --short
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>
> These headers are redundant with what's in your email headers and can be
> dropped.
>
>> diff --git a/Documentation/git-rev-parse.txt 
>> b/Documentation/git-rev-parse.txt
>> index 7241e96..b49f053 100644
>> --- a/Documentation/git-rev-parse.txt
>> +++ b/Documentation/git-rev-parse.txt
>> @@ -139,8 +139,10 @@ can be used.
>>  --short::
>>  --short=number::
>>   Instead of outputting the full SHA-1 values of object names try to
>> - abbreviate them to a shorter unique name. When no length is specified
>> - 7 is used. The minimum length is 4.
>> + abbreviate them to a shorter unique name. When no length is specified,
>> + it is guessed from the number of objects in the repository. In any case,
>> + the actual length will be enough to identify the object unambiguously
>> + in the current state of the repository. The minimum length is 4.
>
> This is definitely an improvement, though I wonder if we should mention
> that we default to core.abbrev (which in turn defaults to the "auto"
> behavior).
>
> It looks like there are a few other mentions of "7" with respect to
> "--abbrev": git-branch.txt, git-describe.txt, git-blame.txt. Those
> should probably get the same treatment.
>
> There are a few other "--abbrev" options (e.g., ls-files and ls-tree)
> that don't mention "7". But while we're fixing the others, it may be
> worth giving them all consistent text.
>
> -Peff



-- 
André Werlang


Re: reversion in GIT_COMMON_DIR refs path

2017-05-19 Thread Joey Hess
Joey Hess wrote:
> Bisecting this test suite failure
> https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/
> I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git.
> 
> It seems that changed resolving refs paths when GIT_DIR and GIT_COMMON_DIR
> are both set. While before refs were looked for in GIT_COMMON_DIR,
> now they're not.

In case there's any doubt about whether this is a reversion or an
intentional change, see gitrepository-layout(5):

   refs
   References are stored in subdirectories of this directory. The git
   prune command knows to preserve objects reachable from refs found
   in this directory and its subdirectories. This directory is ignored
   if $GIT_COMMON_DIR is set and "$GIT_COMMON_DIR/refs" will be used
   instead.

So the documented behavior is broken.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH] rebase -i: Add missing newline to end of message

2017-05-19 Thread Phillip Wood

On 19/05/17 12:24, Ævar Arnfjörð Bjarmason wrote:

On Fri, May 19, 2017 at 12:13 PM, Phillip Wood
 wrote:

On 18/05/17 22:21, Johannes Schindelin wrote:

Hi Phillip,

On Thu, 18 May 2017, Phillip Wood wrote:


From: Phillip Wood 

The message that's printed when auto-stashed changes are successfully
restored was missing '\n' at the end.
---


Please add your Signed-off-by:, and my Acked-by:


Will do (I forgot to add --signoff to git commit). Do I need to resend
the other patches with your Acked-by: or will that happen when Junio
applies them?

I was thinking about this last night and wonder if it would be better to do

-   printf(_("Applied autostash."));
+   printf("%s\n", _("Applied autostash."));

rather than

-   printf(_("Applied autostash."));
+   printf(_("Applied autostash.\n"));

as it would avoid changing the translated string and also mean that the
newline couldn't be accidentally removed any typos in the translation.

Best Wishes

Phillip


Just having the \n in the message is fine. Stuff like this is already
checked for as part of the build process by msgfmt, e.g.:

 $ git diff -U0
 diff --git a/po/de.po b/po/de.po
 index 679f8f4720..b9a7d417ac 100644
 --- a/po/de.po
 +++ b/po/de.po
 @@ -23 +23 @@ msgid "hint: %.*s\n"
 -msgstr "Hinweis: %.*s\n"
 +msgstr "Hinweis: %.*s"

Errors with:

 po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n'

And if you change the format specifier:

 po/de.po:23: number of format specifications in 'msgid' and
'msgstr' does not match


Hi Ævar

Thanks for clarifying that, I've left the patch unchanged

Thanks again

Phillip


[PATCH v2] rebase -i: Add missing newline to end of message

2017-05-19 Thread phillip . wood
From: Phillip Wood 

The message that's printed when auto-stashed changes are successfully
restored was missing '\n' at the end.

Signed-off-by: Phillip Wood 
Acked-by: Johannes Schindelin 
---
I've added signed-off-by and acked-by lines to the commit message,
otherwise this is unchanged

 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 
311728a145dfc66e230334221a2610468239932d..4dcf9c8be044247c6c18a4ec2a4675d9df9953eb
 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1920,7 +1920,7 @@ static int apply_autostash(struct replay_opts *opts)
argv_array_push(, "apply");
argv_array_push(, stash_sha1.buf);
if (!run_command())
-   printf(_("Applied autostash."));
+   printf(_("Applied autostash.\n"));
else {
struct child_process store = CHILD_PROCESS_INIT;
 
-- 
2.13.0



[PATCH 14/15] diff: use pending "path" if it is available

2017-05-19 Thread Jeff King
There's a subtle distinction between "name" and "path" for a
blob that we resolve: the name is what the user told us on
the command line, and the path is what we traversed when
finding the blob within a tree (if we did so).

When we diff blobs directly, we use "name", but "path" is
more likely to be useful to the user (it will find the
correct .gitattributes, and give them a saner diff header).

We still have to fall back to using the name for some cases
(i.e., any blob reference that isn't of the form tree:path).
That's the best we can do in such a case.

Signed-off-by: Jeff King 
---
 builtin/diff.c| 7 ++-
 t/t4063-diff-blobs.sh | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index 4c0811d6f..1a1149eed 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -23,6 +23,11 @@
 static const char builtin_diff_usage[] =
 "git diff [] [ []] [--] [...]";
 
+static const char *blob_path(struct object_array_entry *entry)
+{
+   return entry->path ? entry->path : entry->name;
+}
+
 static void stuff_change(struct diff_options *opt,
 unsigned old_mode, unsigned new_mode,
 const struct object_id *old_oid,
@@ -110,7 +115,7 @@ static int builtin_diff_blobs(struct rev_info *revs,
 blob[0]->mode, blob[1]->mode,
 [0]->item->oid, [1]->item->oid,
 1, 1,
-blob[0]->name, blob[1]->name);
+blob_path(blob[0]), blob_path(blob[1]));
diffcore_std(>diffopt);
diff_flush(>diffopt);
return 0;
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index df9c35b2d..80ce033ab 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -55,7 +55,7 @@ test_expect_success 'diff by tree:path (run)' '
 test_expect_success 'index of tree:path diff' '
check_index $sha1_one $sha1_two
 '
-test_expect_failure 'tree:path diff uses filenames as paths' '
+test_expect_success 'tree:path diff uses filenames as paths' '
check_paths one two
 '
 test_expect_success 'tree:path diff shows mode change' '
@@ -68,7 +68,7 @@ test_expect_success 'diff by ranged tree:path' '
 test_expect_success 'index of ranged tree:path diff' '
check_index $sha1_one $sha1_two
 '
-test_expect_failure 'ranged tree:path diff uses filenames as paths' '
+test_expect_success 'ranged tree:path diff uses filenames as paths' '
check_paths one two
 '
 test_expect_success 'ranged tree:path diff shows mode change' '
-- 
2.13.0.219.g63f6bc368



[PATCH 15/15] diff: use blob path for blob/file diffs

2017-05-19 Thread Jeff King
When we diff a blob against a working tree file like:

  git diff HEAD:Makefile Makefile

we always use the working tree filename for both sides of
the diff. In most cases that's fine, as the two would be the
same anyway, as above. And until recently, we used the
"name" for the blob, not the path, which would have the
messy "HEAD:" on the beginning.

But when they don't match, like:

  git diff HEAD:old_path new_path

it makes sense to show both names.

This patch uses the blob's path field if it's available, and
otherwise falls back to using the filename (in preference to
the blob's name, which is likely to be garbage like a raw
sha1).

Signed-off-by: Jeff King 
---
 builtin/diff.c| 3 ++-
 t/t4063-diff-blobs.sh | 7 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index 1a1149eed..5e7c6428c 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -90,7 +90,8 @@ static int builtin_diff_b_f(struct rev_info *revs,
 blob[0]->mode, canon_mode(st.st_mode),
 [0]->item->oid, _oid,
 1, 0,
-path, path);
+blob[0]->path ? blob[0]->path : path,
+path);
diffcore_std(>diffopt);
diff_flush(>diffopt);
return 0;
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index 80ce033ab..bc69e26c5 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -81,11 +81,16 @@ test_expect_success 'diff blob against file' '
 test_expect_success 'index of blob-file diff' '
check_index $sha1_one $sha1_two
 '
-test_expect_failure 'blob-file diff uses filename as paths' '
+test_expect_success 'blob-file diff uses filename as paths' '
check_paths one two
 '
 test_expect_success FILEMODE 'blob-file diff shows mode change' '
check_mode 100644 100755
 '
 
+test_expect_success 'blob-file diff prefers filename to sha1' '
+   run_diff $sha1_one two &&
+   check_paths two two
+'
+
 test_done
-- 
2.13.0.219.g63f6bc368


[PATCH 12/15] diff: pass whole pending entry in blobinfo

2017-05-19 Thread Jeff King
When diffing blobs directly, git-diff picks the blobs out of
the rev_info's pending array and copies the relevant bits to
a custom "struct blobinfo". But the pending array entry
already has all of this information (and more, which we'll
use in future patches). Let's just pass the original entry
instead.

In practice, these two blobs are probably adjacent in the
revs->pending array, and we could just pass the whole array.
But the current code is careful to pick each blob out
separately and put it into another array, so we'll continue
to do so and make our own array-of-pointers.

Signed-off-by: Jeff King 
---
I looked into just passing "blob_a" and "blob_b", instead of passing an
array and letting the called functions decide how many elements
should be in it. But I don't know if we would ever want to handle
arbitrary numbers of blobs (e.g., n-way combined diffs). It would be a
step backwards in that case, so I decided not to.

 builtin/diff.c | 38 +++---
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index d184aafab..8b276ae57 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -20,12 +20,6 @@
 #define DIFF_NO_INDEX_EXPLICIT 1
 #define DIFF_NO_INDEX_IMPLICIT 2
 
-struct blobinfo {
-   struct object_id oid;
-   const char *name;
-   unsigned mode;
-};
-
 static const char builtin_diff_usage[] =
 "git diff [] [ []] [--] [...]";
 
@@ -65,7 +59,7 @@ static void stuff_change(struct diff_options *opt,
 
 static int builtin_diff_b_f(struct rev_info *revs,
int argc, const char **argv,
-   struct blobinfo *blob)
+   struct object_array_entry **blob)
 {
/* Blob vs file in the working tree*/
struct stat st;
@@ -84,12 +78,12 @@ static int builtin_diff_b_f(struct rev_info *revs,
 
diff_set_mnemonic_prefix(>diffopt, "o/", "w/");
 
-   if (blob[0].mode == S_IFINVALID)
-   blob[0].mode = canon_mode(st.st_mode);
+   if (blob[0]->mode == S_IFINVALID)
+   blob[0]->mode = canon_mode(st.st_mode);
 
stuff_change(>diffopt,
-blob[0].mode, canon_mode(st.st_mode),
-[0].oid, _oid,
+blob[0]->mode, canon_mode(st.st_mode),
+[0]->item->oid, _oid,
 1, 0,
 path, path);
diffcore_std(>diffopt);
@@ -99,24 +93,24 @@ static int builtin_diff_b_f(struct rev_info *revs,
 
 static int builtin_diff_blobs(struct rev_info *revs,
  int argc, const char **argv,
- struct blobinfo *blob)
+ struct object_array_entry **blob)
 {
unsigned mode = canon_mode(S_IFREG | 0644);
 
if (argc > 1)
usage(builtin_diff_usage);
 
-   if (blob[0].mode == S_IFINVALID)
-   blob[0].mode = mode;
+   if (blob[0]->mode == S_IFINVALID)
+   blob[0]->mode = mode;
 
-   if (blob[1].mode == S_IFINVALID)
-   blob[1].mode = mode;
+   if (blob[1]->mode == S_IFINVALID)
+   blob[1]->mode = mode;
 
stuff_change(>diffopt,
-blob[0].mode, blob[1].mode,
-[0].oid, [1].oid,
+blob[0]->mode, blob[1]->mode,
+[0]->item->oid, [1]->item->oid,
 1, 1,
-blob[0].name, blob[1].name);
+blob[0]->name, blob[1]->name);
diffcore_std(>diffopt);
diff_flush(>diffopt);
return 0;
@@ -259,7 +253,7 @@ int cmd_diff(int argc, const char **argv, const char 
*prefix)
struct rev_info rev;
struct object_array ent = OBJECT_ARRAY_INIT;
int blobs = 0, paths = 0;
-   struct blobinfo blob[2];
+   struct object_array_entry *blob[2];
int nongit = 0, no_index = 0;
int result = 0;
 
@@ -408,9 +402,7 @@ int cmd_diff(int argc, const char **argv, const char 
*prefix)
} else if (obj->type == OBJ_BLOB) {
if (2 <= blobs)
die(_("more than two blobs given: '%s'"), name);
-   hashcpy(blob[blobs].oid.hash, obj->oid.hash);
-   blob[blobs].name = name;
-   blob[blobs].mode = entry->mode;
+   blob[blobs] = entry;
blobs++;
 
} else {
-- 
2.13.0.219.g63f6bc368



[PATCH 10/15] handle_revision_arg: record modes for "a..b" endpoints

2017-05-19 Thread Jeff King
The "a..b" revision syntax was designed to handle commits,
so it doesn't bother to record any mode we find while
traversing a "tree:path" endpoint. These days "git diff" can
diff blobs using either "a:path..b:path" (with dots) or
"a:path b:path" (without), but the two behave
inconsistently, as the with-dots version fails to notice the
mode.

Let's teach the dot-dot range parser to record modes; it
doesn't cost us anything, and it makes this case work.

Signed-off-by: Jeff King 
---
 revision.c| 10 ++
 t/t4063-diff-blobs.sh |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index eb45501fd..96427e3c2 100644
--- a/revision.c
+++ b/revision.c
@@ -1448,9 +1448,11 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
const char *a_name, *b_name;
struct object_id a_oid, b_oid;
struct object *a_obj, *b_obj;
+   struct object_context a_oc, b_oc;
unsigned int a_flags, b_flags;
int symmetric = 0;
unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
+   unsigned int oc_flags = GET_SHA1_COMMITTISH;
 
a_name = arg;
if (!*a_name)
@@ -1464,8 +1466,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
if (!*b_name)
b_name = "HEAD";
 
-   if (get_sha1_committish(a_name, a_oid.hash) ||
-   get_sha1_committish(b_name, b_oid.hash))
+   if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, _oc) ||
+   get_sha1_with_context(b_name, oc_flags, b_oid.hash, _oc))
return -1;
 
if (!cant_be_filename) {
@@ -1507,8 +1509,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
b_obj->flags |= b_flags;
add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
-   add_pending_object(revs, a_obj, a_name);
-   add_pending_object(revs, b_obj, b_name);
+   add_pending_object_with_mode(revs, a_obj, a_name, a_oc.mode);
+   add_pending_object_with_mode(revs, b_obj, b_name, b_oc.mode);
return 0;
 }
 
diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
index 90c6f6b85..df9c35b2d 100755
--- a/t/t4063-diff-blobs.sh
+++ b/t/t4063-diff-blobs.sh
@@ -71,7 +71,7 @@ test_expect_success 'index of ranged tree:path diff' '
 test_expect_failure 'ranged tree:path diff uses filenames as paths' '
check_paths one two
 '
-test_expect_failure 'ranged tree:path diff shows mode change' '
+test_expect_success 'ranged tree:path diff shows mode change' '
check_mode 100644 100755
 '
 
-- 
2.13.0.219.g63f6bc368



[PATCH 08/15] get_sha1_with_context: dynamically allocate oc->path

2017-05-19 Thread Jeff King
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.

Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:

  1. By setting a pointer to storage in the object_context.

  2. By passing a flag in another parameter.

Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.

This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.

Signed-off-by: Jeff King 
---
If there's a topic in flight that adds a new call without the flag, note
that it will stop filling the "path" field (which is what most calls
would want). And you won't get a compile error, because the pointer can
be used in largely the same way as the array.

I find it unlikely that there's a case that would care, but if we wanted
to be really paranoid, we could change the name of oc->path (at the cost
of making this diff much noisier).

 builtin/cat-file.c |  4 +++-
 builtin/grep.c |  4 +++-
 builtin/log.c  | 10 +++---
 cache.h|  8 +++-
 sha1_name.c|  6 --
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1890d7a63..421709517 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -61,7 +61,8 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
if (unknown_type)
flags |= LOOKUP_UNKNOWN_OBJECT;
 
-   if (get_sha1_with_context(obj_name, 0, oid.hash, _context))
+   if (get_sha1_with_context(obj_name, GET_SHA1_RECORD_PATH,
+ oid.hash, _context))
die("Not a valid object name %s", obj_name);
 
if (!path)
@@ -165,6 +166,7 @@ static int cat_one_file(int opt, const char *exp_type, 
const char *obj_name,
die("git cat-file %s: bad file", obj_name);
 
write_or_die(1, buf, size);
+   free(obj_context.path);
return 0;
 }
 
diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e8..254c1c784 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1190,7 +1190,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
break;
}
 
-   if (get_sha1_with_context(arg, 0, oid.hash, )) {
+   if (get_sha1_with_context(arg, GET_SHA1_RECORD_PATH,
+ oid.hash, )) {
if (seen_dashdash)
die(_("unable to resolve revision: %s"), arg);
break;
@@ -1200,6 +1201,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!seen_dashdash)
verify_non_filename(prefix, arg);
add_object_array_with_path(object, arg, , oc.mode, 
oc.path);
+   free(oc.path);
}
 
/*
diff --git a/builtin/log.c b/builtin/log.c
index fd3d10ec2..a0d5233dc 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -483,16 +483,20 @@ static int show_blob_object(const struct object_id *oid, 
struct rev_info *rev, c
!DIFF_OPT_TST(>diffopt, ALLOW_TEXTCONV))
return stream_blob_to_fd(1, oid, NULL, 0);
 
-   if (get_sha1_with_context(obj_name, 0, oidc.hash, _context))
+   if (get_sha1_with_context(obj_name, GET_SHA1_RECORD_PATH,
+ oidc.hash, _context))
die(_("Not a valid object name %s"), obj_name);
-   if (!obj_context.path[0] ||
-   !textconv_object(obj_context.path, obj_context.mode, , 1, 
, ))
+   if (!obj_context.path ||
+   !textconv_object(obj_context.path, obj_context.mode, , 1, 
, )) {
+   free(obj_context.path);
return stream_blob_to_fd(1, oid, NULL, 0);
+   }
 
if (!buf)
die(_("git show %s: bad file"), obj_name);
 
write_or_die(1, buf, size);
+   free(obj_context.path);
return 0;
 }
 
diff --git a/cache.h b/cache.h
index 656341b8e..e219c45ed 100644
--- a/cache.h
+++ b/cache.h
@@ -1333,13 +1333,18 @@ static inline int hex2chr(const char *s)
 
 struct object_context {
unsigned char tree[20];
-   char path[PATH_MAX];
unsigned mode;
/*
 * symlink_path is only used by get_tree_entry_follow_symlinks,
 * and only for symlinks that point outside the repository.
 */
struct strbuf symlink_path;
+   /*
+* If 

[PATCH 11/15] handle_revision_arg: record paths for pending objects

2017-05-19 Thread Jeff King
If the revision parser sees an argument like tree:path, we
parse it down to the correct blob (or tree), but throw away
the "path" portion. Let's ask get_sha1_with_context() to
record it, and pass it along in the pending array.

This will let programs like git-diff which rely on the
revision-parser show more accurate paths.

Note that the implementation is a little tricky; we have to
make sure we free oc.path in all code paths. For handle_dotdot(),
we can piggy-back on the existing cleanup-wrapper pattern.
The real work happens in handle_dotdot_1(), but the
handle_dotdot() wrapper makes sure that the path is freed no
matter how we exit the function (and for that reason we make
sure that the object_context struct is zero'd, so if we fail
to even get to the get_sha1_with_context() call, we just end
up calling free(NULL)).

Signed-off-by: Jeff King 
---
 revision.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/revision.c b/revision.c
index 96427e3c2..a7b93dcc8 100644
--- a/revision.c
+++ b/revision.c
@@ -1443,16 +1443,17 @@ static int dotdot_missing(const char *arg, char *dotdot,
 
 static int handle_dotdot_1(const char *arg, char *dotdot,
   struct rev_info *revs, int flags,
-  int cant_be_filename)
+  int cant_be_filename,
+  struct object_context *a_oc,
+  struct object_context *b_oc)
 {
const char *a_name, *b_name;
struct object_id a_oid, b_oid;
struct object *a_obj, *b_obj;
-   struct object_context a_oc, b_oc;
unsigned int a_flags, b_flags;
int symmetric = 0;
unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
-   unsigned int oc_flags = GET_SHA1_COMMITTISH;
+   unsigned int oc_flags = GET_SHA1_COMMITTISH | GET_SHA1_RECORD_PATH;
 
a_name = arg;
if (!*a_name)
@@ -1466,8 +1467,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
if (!*b_name)
b_name = "HEAD";
 
-   if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, _oc) ||
-   get_sha1_with_context(b_name, oc_flags, b_oid.hash, _oc))
+   if (get_sha1_with_context(a_name, oc_flags, a_oid.hash, a_oc) ||
+   get_sha1_with_context(b_name, oc_flags, b_oid.hash, b_oc))
return -1;
 
if (!cant_be_filename) {
@@ -1509,8 +1510,8 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
b_obj->flags |= b_flags;
add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
-   add_pending_object_with_mode(revs, a_obj, a_name, a_oc.mode);
-   add_pending_object_with_mode(revs, b_obj, b_name, b_oc.mode);
+   add_pending_object_with_path(revs, a_obj, a_name, a_oc->mode, 
a_oc->path);
+   add_pending_object_with_path(revs, b_obj, b_name, b_oc->mode, 
b_oc->path);
return 0;
 }
 
@@ -1518,16 +1519,24 @@ static int handle_dotdot(const char *arg,
 struct rev_info *revs, int flags,
 int cant_be_filename)
 {
+   struct object_context a_oc, b_oc;
char *dotdot = strstr(arg, "..");
int ret;
 
if (!dotdot)
return -1;
 
+   memset(_oc, 0, sizeof(a_oc));
+   memset(_oc, 0, sizeof(b_oc));
+
*dotdot = '\0';
-   ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
+   ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename,
+ _oc, _oc);
*dotdot = '.';
 
+   free(a_oc.path);
+   free(b_oc.path);
+
return ret;
 }
 
@@ -1540,7 +1549,7 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
int local_flags;
const char *arg = arg_;
int cant_be_filename = revarg_opt & REVARG_CANNOT_BE_FILENAME;
-   unsigned get_sha1_flags = 0;
+   unsigned get_sha1_flags = GET_SHA1_RECORD_PATH;
 
flags = flags & UNINTERESTING ? flags | BOTTOM : flags & ~BOTTOM;
 
@@ -1591,7 +1600,7 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
}
 
if (revarg_opt & REVARG_COMMITTISH)
-   get_sha1_flags = GET_SHA1_COMMITTISH;
+   get_sha1_flags |= GET_SHA1_COMMITTISH;
 
if (get_sha1_with_context(arg, get_sha1_flags, sha1, ))
return revs->ignore_missing ? 0 : -1;
@@ -1599,7 +1608,8 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
verify_non_filename(revs->prefix, arg);
object = get_reference(revs, arg, sha1, flags ^ local_flags);
add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
-   add_pending_object_with_mode(revs, object, arg, oc.mode);
+   add_pending_object_with_path(revs, object, arg, oc.mode, 

[PATCH 09/15] t4063: add tests of direct blob diffs

2017-05-19 Thread Jeff King
The git-diff command can directly compare two blobs (or a
blob and a file), but we don't test this at all. Let's add
some basic tests that reveal a few problems.

There are basically four interesting inputs:

  1. sha1 against sha1 (where diff has no information beyond
 the contents)

  2. tree:path against tree:path (where it can get
 information via get_sha1_with_context)

  3. Same as (2), but using the ".." range syntax

  4. tree:path against a filename

And beyond generating a sane diff, we care about a few
little bits: which paths they show in the diff header, and
whether they correctly pick up a mode change.

They should all be able to show a mode except for (1),
though note that case (3) is currently broken.

For the headers, we would ideally show the path within the
tree if we have it, making:

  git diff a:path b:path

look the same as:

  git diff a b -- path

We can't always do that (e.g., in the direct sha1/sha1 diff,
we have no path to show), in which case we should fall back
to the name that resolved to the blob (which is nonsense
from the repository's perspective, but is the best we can
do).

Aside from the fallback case in (1), none of the cases get
this right. Cases (2) and (3) always show the full
tree:path, even though we should be able to know just the
"path" portion.

Case (4) picks up the filename path, but assigns it to
_both_ sides of the diff. So this works for:

  git diff tree:path path

but not for:

  git diff tree:other_path path

The appropriate tests are marked to expect failure.

Signed-off-by: Jeff King 
---
 t/t4063-diff-blobs.sh | 91 +++
 1 file changed, 91 insertions(+)
 create mode 100755 t/t4063-diff-blobs.sh

diff --git a/t/t4063-diff-blobs.sh b/t/t4063-diff-blobs.sh
new file mode 100755
index 0..90c6f6b85
--- /dev/null
+++ b/t/t4063-diff-blobs.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='test direct comparison of blobs via git-diff'
+. ./test-lib.sh
+
+run_diff () {
+   # use full-index to make it easy to match the index line
+   git diff --full-index "$@" >diff
+}
+
+check_index () {
+   grep "^index $1\\.\\.$2" diff
+}
+
+check_mode () {
+   grep "^old mode $1" diff &&
+   grep "^new mode $2" diff
+}
+
+check_paths () {
+   grep "^diff --git a/$1 b/$2" diff
+}
+
+test_expect_success 'create some blobs' '
+   echo one >one &&
+   echo two >two &&
+   chmod +x two &&
+   git add . &&
+
+   # cover systems where modes are ignored
+   git update-index --chmod=+x two &&
+
+   git commit -m base &&
+
+   sha1_one=$(git rev-parse HEAD:one) &&
+   sha1_two=$(git rev-parse HEAD:two)
+'
+
+test_expect_success 'diff by sha1' '
+   run_diff $sha1_one $sha1_two
+'
+test_expect_success 'index of sha1 diff' '
+   check_index $sha1_one $sha1_two
+'
+test_expect_success 'sha1 diff uses arguments as paths' '
+   check_paths $sha1_one $sha1_two
+'
+test_expect_success 'sha1 diff has no mode change' '
+   ! grep mode diff
+'
+
+test_expect_success 'diff by tree:path (run)' '
+   run_diff HEAD:one HEAD:two
+'
+test_expect_success 'index of tree:path diff' '
+   check_index $sha1_one $sha1_two
+'
+test_expect_failure 'tree:path diff uses filenames as paths' '
+   check_paths one two
+'
+test_expect_success 'tree:path diff shows mode change' '
+   check_mode 100644 100755
+'
+
+test_expect_success 'diff by ranged tree:path' '
+   run_diff HEAD:one..HEAD:two
+'
+test_expect_success 'index of ranged tree:path diff' '
+   check_index $sha1_one $sha1_two
+'
+test_expect_failure 'ranged tree:path diff uses filenames as paths' '
+   check_paths one two
+'
+test_expect_failure 'ranged tree:path diff shows mode change' '
+   check_mode 100644 100755
+'
+
+test_expect_success 'diff blob against file' '
+   run_diff HEAD:one two
+'
+test_expect_success 'index of blob-file diff' '
+   check_index $sha1_one $sha1_two
+'
+test_expect_failure 'blob-file diff uses filename as paths' '
+   check_paths one two
+'
+test_expect_success FILEMODE 'blob-file diff shows mode change' '
+   check_mode 100644 100755
+'
+
+test_done
-- 
2.13.0.219.g63f6bc368



[PATCH 07/15] get_sha1_with_context: always initialize oc->symlink_path

2017-05-19 Thread Jeff King
The get_sha1_with_context() function zeroes out the
oc->symlink_path strbuf, but doesn't use strbuf_init() to
set up the usual invariants (like pointing to the slopbuf).
We don't actually write to the oc->symlink_path strbuf
unless we call get_tree_entry_follow_symlinks(), and that
function does initialize it. However, readers may still look
at the zero'd strbuf.

In practice this isn't a triggerable bug. The only caller
that looks at it only does so when the mode we found is 0.
This doesn't happen for non-tree-entries (where we return
S_IFINVALID). A broken tree entry could have a mode of 0,
but canon_mode() quietly rewrites that into S_IFGITLINK.
So the "0" mode should only come up when we did indeed find
a symlink.

This is mostly just an accident of how the code happens to
work, though. Let's future-proof ourselves to make sure the
strbuf is properly initialized for all calls (it's only a
few struct member assignments, not a heap allocation).

Signed-off-by: Jeff King 
---
 sha1_name.c | 1 +
 tree-walk.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index a11d08dd8..35b16efc6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1511,6 +1511,7 @@ static int get_sha1_with_context_1(const char *name,
 
memset(oc, 0, sizeof(*oc));
oc->mode = S_IFINVALID;
+   strbuf_init(>symlink_path, 0);
ret = get_sha1_1(name, namelen, sha1, flags);
if (!ret)
return ret;
diff --git a/tree-walk.c b/tree-walk.c
index ff7760568..c7ecfc856 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -589,7 +589,6 @@ enum follow_symlinks_result 
get_tree_entry_follow_symlinks(unsigned char *tree_s
int i;
 
init_tree_desc(, NULL, 0UL);
-   strbuf_init(result_path, 0);
strbuf_addstr(, name);
hashcpy(current_tree_sha1, tree_sha1);
 
-- 
2.13.0.219.g63f6bc368



[PATCH 06/15] sha1_name: consistently refer to object_context as "oc"

2017-05-19 Thread Jeff King
An early version of the patch to add object_context used the
name object_resolve_context. This was later shortened to
just object_context, but the "orc" variable name stuck in a
few places.  Let's use "oc", which is used elsewhere in the
code.

Signed-off-by: Jeff King 
---
 cache.h | 2 +-
 sha1_name.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 188811920..656341b8e 100644
--- a/cache.h
+++ b/cache.h
@@ -1363,7 +1363,7 @@ extern int get_sha1_tree(const char *str, unsigned char 
*sha1);
 extern int get_sha1_treeish(const char *str, unsigned char *sha1);
 extern int get_sha1_blob(const char *str, unsigned char *sha1);
 extern void maybe_die_on_misspelt_object_name(const char *name, const char 
*prefix);
-extern int get_sha1_with_context(const char *str, unsigned flags, unsigned 
char *sha1, struct object_context *orc);
+extern int get_sha1_with_context(const char *str, unsigned flags, unsigned 
char *sha1, struct object_context *oc);
 
 extern int get_oid(const char *str, struct object_id *oid);
 
diff --git a/sha1_name.c b/sha1_name.c
index 35c1e2a9e..a11d08dd8 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1638,9 +1638,9 @@ void maybe_die_on_misspelt_object_name(const char *name, 
const char *prefix)
get_sha1_with_context_1(name, GET_SHA1_ONLY_TO_DIE, prefix, sha1, );
 }
 
-int get_sha1_with_context(const char *str, unsigned flags, unsigned char 
*sha1, struct object_context *orc)
+int get_sha1_with_context(const char *str, unsigned flags, unsigned char 
*sha1, struct object_context *oc)
 {
if (flags & GET_SHA1_FOLLOW_SYMLINKS && flags & GET_SHA1_ONLY_TO_DIE)
die("BUG: incompatible flags for get_sha1_with_context");
-   return get_sha1_with_context_1(str, flags, NULL, sha1, orc);
+   return get_sha1_with_context_1(str, flags, NULL, sha1, oc);
 }
-- 
2.13.0.219.g63f6bc368



[PATCH 05/15] handle_revision_arg: add handle_dotdot() helper

2017-05-19 Thread Jeff King
The handle_revision_arg function is rather long, and a big
chunk of it is handling the range operators. Let's pull that
out to a separate helper. While we're doing so, we can clean
up a few of the rough edges that made the flow hard to
follow:

  - instead of manually restoring *dotdot (that we overwrote
with a NUL), do the real work in a sub-helper, which
makes it clear that the munge/restore lines are a
matched pair

  - eliminate a goto which wasn't actually used for control
flow, but only to avoid duplicating a few lines
(instead, those lines are pushed into another helper
function)

  - use early returns instead of deep nesting

  - consistently name all variables for the left-hand side
of the range as "a" (rather than "this" or "from") and
the right-hand side as "b" (rather than "next", or using
the unadorned "sha1" or "flags" from the main function).

Signed-off-by: Jeff King 
---
 revision.c | 177 +++--
 1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/revision.c b/revision.c
index dec06ff8b..eb45501fd 100644
--- a/revision.c
+++ b/revision.c
@@ -1429,10 +1429,109 @@ static void prepare_show_merge(struct rev_info *revs)
revs->limited = 1;
 }
 
+static int dotdot_missing(const char *arg, char *dotdot,
+ struct rev_info *revs, int symmetric)
+{
+   if (revs->ignore_missing)
+   return 0;
+   /* de-munge so we report the full argument */
+   *dotdot = '.';
+   die(symmetric
+   ? "Invalid symmetric difference expression %s"
+   : "Invalid revision range %s", arg);
+}
+
+static int handle_dotdot_1(const char *arg, char *dotdot,
+  struct rev_info *revs, int flags,
+  int cant_be_filename)
+{
+   const char *a_name, *b_name;
+   struct object_id a_oid, b_oid;
+   struct object *a_obj, *b_obj;
+   unsigned int a_flags, b_flags;
+   int symmetric = 0;
+   unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
+
+   a_name = arg;
+   if (!*a_name)
+   a_name = "HEAD";
+
+   b_name = dotdot + 2;
+   if (*b_name == '.') {
+   symmetric = 1;
+   b_name++;
+   }
+   if (!*b_name)
+   b_name = "HEAD";
+
+   if (get_sha1_committish(a_name, a_oid.hash) ||
+   get_sha1_committish(b_name, b_oid.hash))
+   return -1;
+
+   if (!cant_be_filename) {
+   *dotdot = '.';
+   verify_non_filename(revs->prefix, arg);
+   *dotdot = '\0';
+   }
+
+   a_obj = parse_object(a_oid.hash);
+   b_obj = parse_object(b_oid.hash);
+   if (!a_obj || !b_obj)
+   return dotdot_missing(arg, dotdot, revs, symmetric);
+
+   if (!symmetric) {
+   /* just A..B */
+   b_flags = flags;
+   a_flags = flags_exclude;
+   } else {
+   /* A...B -- find merge bases between the two */
+   struct commit *a, *b;
+   struct commit_list *exclude;
+
+   a = lookup_commit_reference(a_obj->oid.hash);
+   b = lookup_commit_reference(b_obj->oid.hash);
+   if (!a || !b)
+   return dotdot_missing(arg, dotdot, revs, symmetric);
+
+   exclude = get_merge_bases(a, b);
+   add_rev_cmdline_list(revs, exclude, REV_CMD_MERGE_BASE,
+flags_exclude);
+   add_pending_commit_list(revs, exclude, flags_exclude);
+   free_commit_list(exclude);
+
+   b_flags = flags;
+   a_flags = flags | SYMMETRIC_LEFT;
+   }
+
+   a_obj->flags |= a_flags;
+   b_obj->flags |= b_flags;
+   add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
+   add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
+   add_pending_object(revs, a_obj, a_name);
+   add_pending_object(revs, b_obj, b_name);
+   return 0;
+}
+
+static int handle_dotdot(const char *arg,
+struct rev_info *revs, int flags,
+int cant_be_filename)
+{
+   char *dotdot = strstr(arg, "..");
+   int ret;
+
+   if (!dotdot)
+   return -1;
+
+   *dotdot = '\0';
+   ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
+   *dotdot = '.';
+
+   return ret;
+}
+
 int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, 
unsigned revarg_opt)
 {
struct object_context oc;
-   char *dotdot;
char *mark;
struct object *object;
unsigned char sha1[20];
@@ -1451,80 +1550,8 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
return -1;
}
 
-   dotdot = strstr(arg, "..");
-   if (dotdot) {
-   unsigned char 

[PATCH 04/15] handle_revision_arg: hoist ".." check out of range parsing

2017-05-19 Thread Jeff King
Since 003c84f6d (specifying ranges: we did not mean to make
".." an empty set, 2011-05-02), we treat the argument ".."
specially. We detect it by noticing that both sides of the
range are empty, and that this is a non-symmetric two-dot
range.

While correct, this makes the code overly complicated. We
can just detect ".." up front before we try to do further
parsing. This avoids having to de-munge the NUL from dotdot,
and lets us eliminate an extra const array (which we needed
only to do direct pointer comparisons).

It also removes the one code path from the range-parsing
conditional that requires us to return -1. That will make it
simpler to pull the dotdot parsing out into its own
function.

Signed-off-by: Jeff King 
---
There's a similar call in rev-parse which could maybe take the same
cleanup. I didn't bother with it because my main goal was removing an
obstacle to factoring out the dotdot parsing (though in an ideal world,
rev-parse would actually use the same dotdot parsing code; I've no idea
how complicated that would be).

 revision.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index 9c683874d..dec06ff8b 100644
--- a/revision.c
+++ b/revision.c
@@ -1443,6 +1443,14 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
 
flags = flags & UNINTERESTING ? flags | BOTTOM : flags & ~BOTTOM;
 
+   if (!cant_be_filename && !strcmp(arg, "..")) {
+   /*
+* Just ".."?  That is not a range but the
+* pathspec for the parent directory.
+*/
+   return -1;
+   }
+
dotdot = strstr(arg, "..");
if (dotdot) {
unsigned char from_sha1[20];
@@ -1450,27 +1458,15 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
const char *this = arg;
int symmetric = *next == '.';
unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
-   static const char head_by_default[] = "HEAD";
unsigned int a_flags;
 
*dotdot = 0;
next += symmetric;
 
if (!*next)
-   next = head_by_default;
+   next = "HEAD";
if (dotdot == arg)
-   this = head_by_default;
-   if (this == head_by_default && next == head_by_default &&
-   !symmetric) {
-   /*
-* Just ".."?  That is not a range but the
-* pathspec for the parent directory.
-*/
-   if (!cant_be_filename) {
-   *dotdot = '.';
-   return -1;
-   }
-   }
+   this = "HEAD";
if (!get_sha1_committish(this, from_sha1) &&
!get_sha1_committish(next, sha1)) {
struct object *a_obj, *b_obj;
-- 
2.13.0.219.g63f6bc368



[PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer

2017-05-19 Thread Jeff King
The handle_revision_arg() function has a "dotdot" variable
that it uses to find a ".." or "..." in the argument. If we
don't find one, we look for other marks, like "^!". But we
just keep re-using the "dotdot" variable, which is
confusing.

Let's introduce a separate "mark" variable that can be used
for these other marks. They still reuse the same variable,
but at least the name is no longer actively misleading.

Signed-off-by: Jeff King 
---
It may make sense to pull each of these into its own helper. I didn't
really look because they're so small, and because the return semantics
seemed confusing to me. Some of them return, and some of them keep
parsing. Some of them restore the NUL they overwrite, and some do not.

I didn't dig in to see if there are weird corner cases where they
misbehave.

 revision.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index b6031fb35..9c683874d 100644
--- a/revision.c
+++ b/revision.c
@@ -1433,6 +1433,7 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
 {
struct object_context oc;
char *dotdot;
+   char *mark;
struct object *object;
unsigned char sha1[20];
int local_flags;
@@ -1529,33 +1530,33 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
*dotdot = '.';
}
 
-   dotdot = strstr(arg, "^@");
-   if (dotdot && !dotdot[2]) {
-   *dotdot = 0;
+   mark = strstr(arg, "^@");
+   if (mark && !mark[2]) {
+   *mark = 0;
if (add_parents_only(revs, arg, flags, 0))
return 0;
-   *dotdot = '^';
+   *mark = '^';
}
-   dotdot = strstr(arg, "^!");
-   if (dotdot && !dotdot[2]) {
-   *dotdot = 0;
+   mark = strstr(arg, "^!");
+   if (mark && !mark[2]) {
+   *mark = 0;
if (!add_parents_only(revs, arg, flags ^ (UNINTERESTING | 
BOTTOM), 0))
-   *dotdot = '^';
+   *mark = '^';
}
-   dotdot = strstr(arg, "^-");
-   if (dotdot) {
+   mark = strstr(arg, "^-");
+   if (mark) {
int exclude_parent = 1;
 
-   if (dotdot[2]) {
+   if (mark[2]) {
char *end;
-   exclude_parent = strtoul(dotdot + 2, , 10);
+   exclude_parent = strtoul(mark + 2, , 10);
if (*end != '\0' || !exclude_parent)
return -1;
}
 
-   *dotdot = 0;
+   *mark = 0;
if (!add_parents_only(revs, arg, flags ^ (UNINTERESTING | 
BOTTOM), exclude_parent))
-   *dotdot = '^';
+   *mark = '^';
}
 
local_flags = 0;
-- 
2.13.0.219.g63f6bc368



[PATCH 01/15] handle_revision_arg: reset "dotdot" consistently

2017-05-19 Thread Jeff King
When we are parsing a range like "a..b", we write a
temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.

For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename().  This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:

  git log --source a...b

where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).

To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.

That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a..b" name in error messages, and these erroneously show
only "a" instead of "a..b". E.g.:

  $ git cherry-pick HEAD:foo..HEAD:foo
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  fatal: Invalid symmetric difference expression HEAD:foo

(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).

As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.

This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.

Signed-off-by: Jeff King 
---
I also considered just making a copy of the string rather than this
in-place munging (technically we get it as a pointer-to-const; it's only
the use of strstr() that lets us quietly drop the const). But it doesn't
really make the code any cleaner; now instead of restoring the dot you
have to remember to free() the string in each code path.

 revision.c | 3 +++
 t/t4202-log.sh | 9 +
 2 files changed, 12 insertions(+)

diff --git a/revision.c b/revision.c
index 8a8c1789c..014bf52e3 100644
--- a/revision.c
+++ b/revision.c
@@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
if (!cant_be_filename) {
*dotdot = '.';
verify_non_filename(revs->prefix, arg);
+   *dotdot = '\0';
}
 
a_obj = parse_object(from_sha1);
b_obj = parse_object(sha1);
if (!a_obj || !b_obj) {
missing:
+   *dotdot = '.';
if (revs->ignore_missing)
return 0;
die(symmetric
@@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
REV_CMD_RIGHT, flags);
add_pending_object(revs, a_obj, this);
add_pending_object(revs, b_obj, next);
+   *dotdot = '.';
return 0;
}
*dotdot = '.';
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f57799071..6da1bbe91 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1380,4 +1380,13 @@ test_expect_success 'log --source paints tag names' '
test_cmp expect actual
 '
 
+test_expect_success 'log --source paints symmetric ranges' '
+   cat >expect <<-\EOF &&
+   09e12a9 source-b three
+   8e393e1 source-a two
+   EOF
+   git log --oneline --source source-a...source-b >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.13.0.219.g63f6bc368



[PATCH 02/15] handle_revision_arg: simplify commit reference lookups

2017-05-19 Thread Jeff King
The "dotdot" range parser avoids calling
lookup_commit_reference() if we are directly fed two
commits. But its casts are unnecessarily complex; that
function will just return a commit we pass into it.

Just calling the function all the time is much simpler, and
doesn't do any significant extra work (the object is already
parsed, and deref_tag() on a non-tag is a noop; we do incur
one extra lookup_object() call, but that's fairly trivial).

Signed-off-by: Jeff King 
---
 revision.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index 014bf52e3..b6031fb35 100644
--- a/revision.c
+++ b/revision.c
@@ -1500,12 +1500,8 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
struct commit *a, *b;
struct commit_list *exclude;
 
-   a = (a_obj->type == OBJ_COMMIT
-? (struct commit *)a_obj
-: 
lookup_commit_reference(a_obj->oid.hash));
-   b = (b_obj->type == OBJ_COMMIT
-? (struct commit *)b_obj
-: 
lookup_commit_reference(b_obj->oid.hash));
+   a = lookup_commit_reference(a_obj->oid.hash);
+   b = lookup_commit_reference(b_obj->oid.hash);
if (!a || !b)
goto missing;
exclude = get_merge_bases(a, b);
-- 
2.13.0.219.g63f6bc368



[PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar

2017-05-19 Thread Jeff King
On Tue, May 16, 2017 at 10:05:35PM -0400, Jeff King wrote:

> On Wed, May 17, 2017 at 10:38:36AM +0900, Junio C Hamano wrote:
> 
> > > - add_pending_object(revs, a_obj, this);
> > > - add_pending_object(revs, b_obj, next);
> > > + add_pending_object_with_path(revs, a_obj, this,
> > > +  oc.mode,
> > > +  oc.path[0] ? oc.path : 
> > > NULL);
> > > + add_pending_object_with_path(revs, b_obj, next,
> > > +  oc2.mode,
> > > +  oc2.path[0] ? oc2.path : 
> > > NULL);
> > 
> > The fix is surprisingly simple, and I think it definitely goes in
> > the right direction.
> > 
> > Somehow, it bothers me to be forced to view (a_obj, this, from_sha1,
> > oc) vs (b_obj, next, sha1, oc2) as a sensibly corresponding pair of
> > tuples, but that is not something your introduction of "oc2"
> > started, so I won't complain ;-).
> 
> Yes, in my polishing I switched this out at least "oc_a" and "oc_b" so
> we could be sure they match correctly. I think this whole "dotdot"
> conditional could be pulled out to its own function, and probably
> consistently use "a" and "b" for the endpoints. I'll see what
> refactoring I can do to make it more readable.

The cleanup ended up being a bit bigger than I expected, but I did find
an obscure bug while I was there. And I think the end result looks
pretty decent.

The patches are listed below. There are a few things I didn't do (and
don't plan to in the near future):

  - I didn't pick up Dscho's gitattributes test. The tests in patch 9
cover that in a much more direct fashion by looking at the diff
headers. We _could_ still check that they respect gitattributes, but
there's no reason that they wouldn't; the bug was before we even hit
the diff machinery at all.

  - There is an interesting related case with attributes that wasn't
tested by that case. If you do:

   git diff $(git rev-parse HEAD:one) $(git rev-parse HEAD:two)

then we'll feed those sha1s to the diff machinery as the path (we
have enough information at that point to know they aren't actually
paths, but we have to give _something_ to the diff code to display).

That means we'll also look for .gitattributes files that match those
names. It would be hard to fix (because the diff machinery doesn't
have a notion of "this path is for display, but not for attributes).
And I doubt anybody cares (AIUI, the motivation for the original was
not that somebody was concerned with reading what is likely to be a
non-existent attributes file, but rather that looking at names with
colons produces a bogus warning on Windows).

  - I noticed that the dotdot parser jumps into the middle of the string
with a strstr(), and never tries to find other dotdots. That means
things like:

  HEAD@{2..days.ago}...HEAD@{3..days.ago}

does not parse (we find the first "..", realize that it's not a
range operator, and then give up looking for more range operators).
That could be solved by either parsing left-to-right, or by trying
each ".." in the string in turn. I doubt anybody cares overly much.

I think we do get the related:

  git show HEAD:foo..bar

correct, because we see that "HEAD:foo" is not a commit and bail to
trying the whole thing as a single unit.

I think those are all minor problems that have likely been around for
10+ years, and would take a lot of digging and work to fix. Unless
somebody actually hits one in practice, I'm happy to punt for another
decade.

  [01/15]: handle_revision_arg: reset "dotdot" consistently
  [02/15]: handle_revision_arg: simplify commit reference lookups
  [03/15]: handle_revision_arg: stop using "dotdot" as a generic pointer
  [04/15]: handle_revision_arg: hoist ".." check out of range parsing
  [05/15]: handle_revision_arg: add handle_dotdot() helper
  [06/15]: sha1_name: consistently refer to object_context as "oc"
  [07/15]: get_sha1_with_context: always initialize oc->symlink_path
  [08/15]: get_sha1_with_context: dynamically allocate oc->path
  [09/15]: t4063: add tests of direct blob diffs
  [10/15]: handle_revision_arg: record modes for "a..b" endpoints
  [11/15]: handle_revision_arg: record paths for pending objects
  [12/15]: diff: pass whole pending entry in blobinfo
  [13/15]: diff: use the word "path" instead of "name" for blobs
  [14/15]: diff: use pending "path" if it is available
  [15/15]: diff: use blob path for blob/file diffs

 builtin/cat-file.c|   4 +-
 builtin/diff.c|  60 ++---
 builtin/grep.c|   4 +-
 builtin/log.c |  10 ++-
 cache.h   |  10 ++-
 revision.c| 243 +-
 sha1_name.c   |  11 ++-
 t/t4063-diff-blobs.sh |  96 

Re: [WIP/RFC 00/23] repository object

2017-05-19 Thread Jeff Hostetler



On 5/18/2017 7:21 PM, Brandon Williams wrote:

When I first started working on the git project I found it very difficult to
understand parts of the code base because of the inherently global nature of
our code.  It also made working on submodules very difficult.  Since we can
only open up a single repository per process, you need to launch a child
process in order to process a submodule.  But you also need to be able to
communicate other stateful information to the children processes so that the
submodules know how best to format their output or match against a
pathspec...it ends up feeling like layering on hack after hack.  What I would
really like to do, is to have the ability to have a repository object so that I
can open a submodule in-process.

Before this becomes a reality for all commands, much of the library code would
need to be refactored in order to work purely on handles instead of global
state.  As it turned out, ls-files is a pretty simple command and doesn't have
*too* many dependencies.  The biggest thing that needed to be changed was
piping through an index into a couple library routines so that they don't
inherently rely on 'the_index'.  A few of these changes I've sent out and can
be found at 'origin/bw/pathspec-sans-the-index' and
'origin/bw/dir-c-stops-relying-on-the-index' which this series is based on.

Patches 1-16 are refactorings to prepare either library code or ls-files itself
to be ready to handle passing around an index struct.  Patches 17-22 introduce
a repository struct and change a couple of things about how submodule caches
work (getting submodule information from .gitmodules).  And Patch 23 converts
ls-files to use a repository struct.

The most interesting part of the series is from 17-23.  And 1-16 could be taken
as is without the rest of the series.

This is still very much in a WIP state, though it does pass all tests.  What
I'm hoping for here is to get a discussion started about the feasibility of a
change like this and hopefully to get the ball rolling.  Is this a direction we
want to move in?  Is it worth the pain?

Thanks for taking the time to look at this and entertain my insane ideas :)


Very nice and thanks for starting this.
Jeff



your response is highly appreciated

2017-05-19 Thread Mr. Adams Salem


Hello ,

I am specifically contacting you in respect of a business proposal that I have 
for you as you appear very relevant in the proposal.

Please kindly reply back to me for further details.

Waiting to hear from you.

Regards,

Mr. Adams Salem

Private E-mail: mradamssa...@outlook.my


Re: [PATCH] rebase -i: Add missing newline to end of message

2017-05-19 Thread Ævar Arnfjörð Bjarmason
On Fri, May 19, 2017 at 12:13 PM, Phillip Wood
 wrote:
> On 18/05/17 22:21, Johannes Schindelin wrote:
>> Hi Phillip,
>>
>> On Thu, 18 May 2017, Phillip Wood wrote:
>>
>>> From: Phillip Wood 
>>>
>>> The message that's printed when auto-stashed changes are successfully
>>> restored was missing '\n' at the end.
>>> ---
>>
>> Please add your Signed-off-by:, and my Acked-by:
>
> Will do (I forgot to add --signoff to git commit). Do I need to resend
> the other patches with your Acked-by: or will that happen when Junio
> applies them?
>
> I was thinking about this last night and wonder if it would be better to do
>
> -   printf(_("Applied autostash."));
> +   printf("%s\n", _("Applied autostash."));
>
> rather than
>
> -   printf(_("Applied autostash."));
> +   printf(_("Applied autostash.\n"));
>
> as it would avoid changing the translated string and also mean that the
> newline couldn't be accidentally removed any typos in the translation.
>
> Best Wishes
>
> Phillip

Just having the \n in the message is fine. Stuff like this is already
checked for as part of the build process by msgfmt, e.g.:

$ git diff -U0
diff --git a/po/de.po b/po/de.po
index 679f8f4720..b9a7d417ac 100644
--- a/po/de.po
+++ b/po/de.po
@@ -23 +23 @@ msgid "hint: %.*s\n"
-msgstr "Hinweis: %.*s\n"
+msgstr "Hinweis: %.*s"

Errors with:

po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n'

And if you change the format specifier:

po/de.po:23: number of format specifications in 'msgid' and
'msgstr' does not match


Re: [PATCH] rebase -i: Add missing newline to end of message

2017-05-19 Thread Phillip Wood
On 18/05/17 22:21, Johannes Schindelin wrote:
> Hi Phillip,
> 
> On Thu, 18 May 2017, Phillip Wood wrote:
> 
>> From: Phillip Wood 
>>
>> The message that's printed when auto-stashed changes are successfully
>> restored was missing '\n' at the end.
>> ---
> 
> Please add your Signed-off-by:, and my Acked-by:

Will do (I forgot to add --signoff to git commit). Do I need to resend
the other patches with your Acked-by: or will that happen when Junio
applies them?

I was thinking about this last night and wonder if it would be better to do

-   printf(_("Applied autostash."));
+   printf("%s\n", _("Applied autostash."));

rather than

-   printf(_("Applied autostash."));
+   printf(_("Applied autostash.\n"));

as it would avoid changing the translated string and also mean that the
newline couldn't be accidentally removed any typos in the translation.

Best Wishes

Phillip


Re: [PATCH 22/23] ref-filter: limit traversal to prefix

2017-05-19 Thread Michael Haggerty
On 05/17/2017 03:38 PM, Jeff King wrote:
> On Wed, May 17, 2017 at 02:05:45PM +0200, Michael Haggerty wrote:
> 
>> From: Jeff King 
> 
> This patch did originate with me, but I know you had to fix several
> things to integrate it in your series. So I'll review it anyway, and
> give you full blame for any bugs. :)
> 
>> When we are matching refnames "as path" against a pattern, then we
>> know that the beginning of any refname that can match the pattern has
>> to match the part of the pattern up to the first glob character. For
>> example, if the pattern is `refs/heads/foo*bar`, then it can only
>> match a reference that has the prefix `refs/heads/foo`.
> 
> That first sentence confused me as to what "as path" meant (I know
> because I worked on this code, and even then it took me a minute to
> parse it).
> 
> Maybe just "When we are matching refnames against a pattern" and then
> later something like:
> 
>   Note that this applies only when the "match_as_path" flag is set
>   (i.e., when for-each-ref is the caller), as the matching rules for
>   git-branch and git-tag are subtly different.

That is clearer. I'll make the change.

>> +/*
>> + * Find the longest prefix of pattern we can pass to
>> + * for_each_fullref_in(), namely the part of pattern preceding the
>> + * first glob character.
>> + */
>> +static void find_longest_prefix(struct strbuf *out, const char *pattern)
>> +{
>> +const char *p;
>> +
>> +for (p = pattern; *p && !is_glob_special(*p); p++)
>> +;
>> +
>> +strbuf_add(out, pattern, p - pattern);
>> +}
> 
> If I were reviewing this from scratch, I'd probably ask whether it is OK
> in:
> 
>   refs/heads/m*
> 
> to return "refs/heads/m" as the prefix, and not stop at the last
> non-wildcard component ("refs/heads/"). But I happen to know we
> discussed this off-list and you checked that for_each_ref and friends
> are happy with an arbitrary prefix. But I'm calling it out here for
> other reviewers.

I'll add a comment about this, too.

Thanks,
Michael



Re: [PATCH 12/23] ref_transaction_commit(): break into multiple functions

2017-05-19 Thread Michael Haggerty
On 05/17/2017 07:44 PM, Stefan Beller wrote:
> On Wed, May 17, 2017 at 5:05 AM, Michael Haggerty  
> wrote:
>> Break the function `ref_transaction_commit()` into two functions,
>> `ref_transaction_prepare()` and `ref_transaction_finish()`, with a
>> third function, `ref_transaction_abort()`, that can be used to abort
>> the transaction. Break up the `ref_store` methods similarly.
>>
>> This split will make it easier to implement compound reference stores,
>> where a transaction might have to span multiple underlying stores. In
>> that case, we would first want to "prepare" the sub-transactions in
>> each of the reference stores. If any of the "prepare" steps fails, we
>> would "abort" all of the sub-transactions. Only if all of the
>> "prepare" steps succeed would we "finish" each of them.
> [...]

Thanks for your comments. While I was incorporating them, I realized
that other parts of the documentation needed updates, too. And while I
was fixing that, I noticed that the interface was unnecessarily
complicated. The old version required either `commit` or `prepare`
followed by `finish`. But there is no reason that the public API has to
expose `finish`. So instead, let's call `prepare` an optional step that
is allowed to precede `commit`, and make `commit` smart enough to call
`prepare` if it hasn't been called already, and then call `finish`.

Furthermore, let's make it possible to call `abort` when the transaction
is in OPEN as well as PREPARED state rather than requiring `free` in
OPEN state and `abort` in PREPARED state.

I will make these changes and revamp the docs for v2.

Michael



[PATCH] ref-filter: resolve HEAD when parsing %(HEAD) atom

2017-05-19 Thread Jeff King
If the user asks to display (or sort by) the %(HEAD) atom,
ref-filter has to compare each refname to the value of HEAD.
We do so by resolving HEAD fresh when calling populate_value()
on each ref. If there are a large number of refs, this can
have a measurable impact on runtime.

Instead, let's resolve HEAD once when we realize we need the
%(HEAD) atom, allowing us to do a simple string comparison
for each ref. On a repository with 3000 branches (high, but
an actual example found in the wild) this drops the
best-of-five time to run "git branch >/dev/null" from 59ms
to 48ms (~20% savings).

Signed-off-by: Jeff King 
---
The "something like this" patch I sent earlier just cached the value of
HEAD in a global for the length of the program. This is a bit nicer, in
that it ties the cache to the atom we are filling in. But since that's
also stored in a program global, the end effect is the same. :) I think
it's still worth doing it this way, though, as we might one day push the
used_atom stuff into some kind of ref_filter_context struct, and then
this would Just Work.

I did take a look at de-globalifying used_atom and friends, but it gets
pretty nasty pushing it all through the callstack. Since there's no
immediate benefit, I don't think it's really worth pursuing for now.

 ref-filter.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..82ca411d0 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -93,6 +93,7 @@ static struct used_atom {
unsigned int length;
} objectname;
struct refname_atom refname;
+   char *head;
} u;
 } *used_atom;
 static int used_atom_cnt, need_tagged, need_symref;
@@ -287,6 +288,12 @@ static void if_atom_parser(struct used_atom *atom, const 
char *arg)
}
 }
 
+static void head_atom_parser(struct used_atom *atom, const char *arg)
+{
+   unsigned char unused[GIT_SHA1_RAWSZ];
+
+   atom->u.head = resolve_refdup("HEAD", RESOLVE_REF_READING, unused, 
NULL);
+}
 
 static struct {
const char *name;
@@ -325,7 +332,7 @@ static struct {
{ "push", FIELD_STR, remote_ref_atom_parser },
{ "symref", FIELD_STR, refname_atom_parser },
{ "flag" },
-   { "HEAD" },
+   { "HEAD", FIELD_STR, head_atom_parser },
{ "color", FIELD_STR, color_atom_parser },
{ "align", FIELD_STR, align_atom_parser },
{ "end" },
@@ -1369,12 +1376,7 @@ static void populate_value(struct ref_array_item *ref)
} else if (!deref && grab_objectname(name, ref->objectname, v, 
atom)) {
continue;
} else if (!strcmp(name, "HEAD")) {
-   const char *head;
-   unsigned char sha1[20];
-
-   head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
- sha1, NULL);
-   if (head && !strcmp(ref->refname, head))
+   if (atom->u.head && !strcmp(ref->refname, atom->u.head))
v->s = "*";
else
v->s = " ";
-- 
2.13.0.219.g63f6bc368