Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:
> On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams  wrote:
>> On 09/25, Stefan Beller wrote:

>>> Have one place to explain the effects of setting submodule..update
>>> instead of two.
>>>
>>> Signed-off-by: Stefan Beller 
>>> ---
> I disagree.  Actually, I think the git-config(1) blurb could just
> point here, and that the text here ought to be clear about what
> commands it affects and how an end user would use it.

 I tend to agree with the consolidation.
>>>
>>> Something like this?
>>
>> I like the consolidation, its easier to keep up to date when its only in
>> one place.
>
> After thinking about it further, I'd like to withdraw this patch
> for now.
>
> The reason is that this would also forward point to
> git-submodule.txt, such that we'd still have 2 places
> in which it is explained. The current situation with explaining
> it in 3 places is not too bad as each place hints at their specific
> circumstance:
> git-submodule.txt explains the values to set itself.
> gitmodules.txt explains what the possible fields in that file are,
>   and regarding this field it points out it is ignored in the init call.
> config.txt: actually describe the effects of the setting.
>
> I think we can keep it as-is for now.

Sorry, I got lost.  Which state is as-is?

As a user, how do I find out what submodule.*.update is going to do
and which commands will respect it?

Maybe we should work on first wordsmithing the doc and then figuring
out where it goes?  The current state of the doc with (?) three
different texts, all wrong, doesn't seem like a good state to
preserve.

Thanks,
Jonathan


Re: git ls-tree -d doesn't work if the specified path is the repository root?

2017-09-25 Thread Jonathan Nieder
Hi,

Roy Wellington wrote:

> When I run `git ls-tree -d HEAD -- subdir` from the root of my
> repository, where `subdir` is a subdirectory in that root, I get the
> treehash of that subdirectory. This is what I expect.
>
> However, if I merely replace `subdir` with `.` (the root of the
> repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns
> the treehashes of the /children/ of the root, instead of the root
> itself, contrary to the documented behavior of -d.

Can you be more specific?  What documentation led you to this
expectation?

I don't have a strong opinion about this behavior, but in any case if
the documentation and command disagree about the correct behavior then
one of them needs to be fixed. :)

> Is there some reason for this? This behavior seems like a bug to me:
> it means that prior to calling ls-tree I need to check if the
> referenced path happens to be the root, and if so, find some other
> means (rev-parse?) of converting it to a treehash.

Does

git rev-parse HEAD:subdir

work better for what you're trying to accomplish?  To get the root of
the repository, you can use

git rev-parse HEAD:

Thanks and hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>>  ssize_t loaded = xread(fd, p, count);
>>  if (loaded < 0)
>>  return -1;
>> -if (loaded == 0)
>> +if (loaded == 0) {
>> +errno = EILSEQ;
>>  return total;
>> +}
>
> If I do this:
>
>   errno = 0;
>   read_in_full(fd, buf, sizeof(buf));
>   if (errno)
>   die_errno("oops");
>
> then we'll claim an error, even though there was none (remember that
> it's only an error for _some_ callers to not read the whole length).
>
> This may be sufficiently odd that we don't need to care about it. There
> are some calls (like strtoul) which require this kind of clear-and-check
> strategy with errno, but in general we frown on it for calls like
> read().

Correct.  Actually more than "frown on": except for with the few calls
like strtoul that are advertised to work this way, POSIX does not make
the guarantee the above code would rely on, at all.

So it's not just frowned upon: it's so unportable that the standard
calls it out as something that won't work.

> We could also introduce a better helper like this:
>
>   int read_exactly(int fd, void *buf, size_t count)
>   {
>   ssize_t ret = read_in_full(fd, buf, count);
>   if (ret < 0)
>   return -1;
>   if (ret != count) {
>   errno = EILSEQ;
>   return -1;
>   }
>   return 0;
>   }
>
> Then we know that touching errno always coincides with an error return.
> And it's shorter to check "< 0" compared to "!= count" in the caller.
> But of course a caller which wants to distinguish the two cases for its
> error messages then has to look at errno:

That sounds nice, but it doesn't solve the original problem of callers
using read_in_full that way.

Thanks,
Jonathan


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> By as-is I refer to origin/pu.

Ah, okay.  I'm not in that habit because pu frequently loses topics
--- it's mostly useful as a tool to (1) distribute topic branches and
(2) check whether they are compatible with each other.

[...]
> On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Nieder  wrote:

>> Maybe we should work on first wordsmithing the doc and then figuring
>> out where it goes?  The current state of the doc with (?) three
>> different texts,
>
> such that each different text highlights each locations purpose.
>
>> all wrong,
>
> Care to spell out this bold claim?

In the state of "master", "man git-config" tells me that

[submodule ""]
update = ...

sets the "default update procedure for a submodule".  It suggests that
I read about "git submodule update" in git-submodule(1) for more
details.  This is problematic because

 1. It doesn't tell me when I should and should not set it.
 2. I would expect "git checkout --recurse-submodules" to respect the
default update procedure.
 3. It doesn't tell me what commands like "git fetch
--recurse-submodules" will do.
 4. It doesn't point me to better alternatives, when this
configuration doesn't fit my use case.

"man git-submodule" doesn't have a section on submodule..update.
It has references scattered throughout the manpage.  It only tells me
how the setting affects the "git submodule update" command and doesn't
address the problems (1)-(4).

"man gitmodules" also refers to this setting as the "default update
procedure for the named submodule".  It doesn't answer the questions
(1)-(4) either.

Thanks and hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:
>> Jef King wrote:

>>>   errno = 0;
>>>   read_in_full(fd, buf, sizeof(buf));
>>>   if (errno)
>>> die_errno("oops");
[...]
>>>  in general we frown on it for calls like
>>> read().
>>
>> Correct.  Actually more than "frown on": except for with the few calls
>> like strtoul that are advertised to work this way, POSIX does not make
>> the guarantee the above code would rely on, at all.
>>
>> So it's not just frowned upon: it's so unportable that the standard
>> calls it out as something that won't work.
>
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
>
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Yes, it is allowed to set it to another random value.  It's a common
thing for implementations to do when they hit a recoverable error,
which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an
implementation calling strtoul and getting EINVAL or ERANGE).

glibc only recently started trying to avoid this kind of behavior,
because even though the standard allows it, users hate it.

POSIX.1-2008 XSI 2.3 "Error Numbers" says

Some functions provide the error number in a variable accessed
through the symbol errno, defined by including the 
header.  The value of errno should only be examined when it is
indicated to be valid by a function's return value.

See http://austingroupbugs.net/view.php?id=374 for an example where
someone wanted to add a new guarantee of that kind to a function.

Hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>   [EOVERFLOW]
> The file is a regular file, nbyte is greater than 0, the starting
> position is before the end-of-file, and the starting position is
> greater than or equal to the offset maximum established in the open
> file description associated with fildes.
>
> That's not _exactly_ what's going on here, but it's pretty close. And is
> what you would get if you implemented read_exactly() in terms of
> something like pread().

All that really matters is the strerror() that the user would see.

For EOVERFLOW, that is

Value too large to be stored in data type

For EILSEQ, it is

Illegal byte sequence

Neither is perfect, but the latter seems like a better fit for the kind
of file format errors we're describing here.  Of course, it's even
better if we fix the callers and don't try to wedge this into errno.

If you are okay with the too-long/too-short confusion in EOVERFLOW, then
there is EMSGSIZE:

Message too long

Hope that helps,
Jonathan


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
>
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  t/t7406-submodule-update.sh | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..d718cb00e7 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> +
> + write_script must_not_run.sh <<-EOF &&
> + >$TEST_DIRECTORY/bad
> + EOF
> +
> + git -C super config -f .gitmodules submodule.submodule.update 
> "!$TEST_DIRECTORY/must_not_run.sh" &&

Long line, but I don't think I care.  I wish there were a tool like
"make style" to format shell scripts.

> + git -C super commit -a -m "add command to .gitmodules file" &&
> + git -C super/submodule reset --hard $submodulesha1^ &&
> + git -C super submodule update submodule &&
> + test_path_is_missing bad
> +'

Per offline discussion, you tested that this fails when you use
.git/config instead of .gitmodules, so there aren't any subtle typos
here. :)

Reviewed-by: Jonathan Nieder 

Thanks for writing it.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

>> All that really matters is the strerror() that the user would see.
>
> Agreed, though I didn't think those were necessarily standardized.

The standard allows them to vary for the sake of internationalization,
but they are de facto constants (probably because of application authors
like us abusing them ;-)).

[...]
>> Of course, it's even
>> better if we fix the callers and don't try to wedge this into errno.
>
> Yes. This patch is just a stop-gap. Perhaps we should abandon it
> entirely. But I couldn't find a way to fix all the callers. If you have
> a function that just returns "-1" when it sees a read_in_full() error,
> how does _its_ caller tell the difference?
>
> I guess the answer is that the interface of the sub-function calling
> read_in_full() needs to change.

Yes. :/

>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>>  Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.

I have a basic aversion to ": Success" in error messages.  Whenever I
see such an error message, I stop trusting the program that produced it
not to be seriously buggy.  Maybe I'm the only one?

If no actual errno works, we could make a custom strerror-like function
with our own custom errors (making them negative to avoid clashing with
standard errno values), but this feels like overkill.

In the same spirit of misleadingly confusing too-long and too-short,
there is also ERANGE ("Result too large"), which doesn't work here.
There's also EPROTO "Protocol error", but that's about protocols, not
file formats.  More vague and therefor maybe more applicable is EBADMSG
"Bad message".  There's also ENOMSG "No message of the desired type".

If the goal is to support debugging, an error like EPIPE "Broken pipe"
or ESPIPE "Invalid seek" would be likely to lead me in the right
direction (wrong file size), even though it is misleading about how
the error surfaced.

We could also avoid trying to be cute and use something generic like
EIO "Input/output error".

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

>> After reading this discussion from the sideline, maybe
>>
>>   ENODATA No message is available on the STREAM head read
>> queue (POSIX.1)
>>
>> is not too bad after all. Or
>>
>>   ESPIPE  Invalid seek (POSIX.1)
>>
>> Technically we do not seek, but one could imagine we'd seek there
>> to read?
>
> ENODATA is not too bad. On my glibc system it yields "No data available"
> from strerror(), which is at least comprehensible.
>
> We're still left with the question of whether it is defined everywhere
> (and what to fallback to when it isn't).

So,

#ifndef EUNDERFLOW
#ifdef ENODATA
#define ENODATA EUNDERFLOW
#else
#define ENODATA ESPIPE
#endif
#endif

?  Windows has ESPIPE, and I'm not sure what other non-POSIX platform we
need to worry about.

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jonathan Nieder wrote:
> On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:

>> ENODATA is not too bad. On my glibc system it yields "No data available"
>> from strerror(), which is at least comprehensible.
>>
>> We're still left with the question of whether it is defined everywhere
>> (and what to fallback to when it isn't).
>
> So,
>
> #ifndef EUNDERFLOW
> #ifdef ENODATA
> #define ENODATA EUNDERFLOW
> #else
> #define ENODATA ESPIPE
> #endif
> #endif
>
> ?

Uh, pretend I said

#ifndef EUNDERFLOW
# ifdef ENODATA
#  define EUNDERFLOW ENODATA
# else
#  define EUNDERFLOW ESPIPE
# endif
#endif

Sorry for the nonsense.
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> I definitely would prefer to avoid EIO, or anything that an actual
> read() might return.
>
> What do you think of ENODATA? The glibc text for it is pretty
> appropriate. If it's not available everywhere, we'd have to fallback to
> something else (EIO? 0?). I don't know how esoteric it is. The likely
> candidate to be lacking it is Windows.

ENODATA with a fallback to ESPIPE sounds fine to me.

read() would never produce ESPIPE because it doesn't seek.

So that would be:

/* errno value to use for early EOF */
#ifndef ENODATA
#define ENODATA ESPIPE
#endif

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.
> I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
> "#ifndef EUNDERFLOW" had me thinking that this was something that a real
> platform might have, but AFAICT you just made it up.

Agreed.  Two of the risks of replying too quickly. :)

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> What do you think of patch 7 in light of this? If the short-read case
> gives us a sane errno, should we even bother trying to consistently
> handle its error separately?

I like the read_exactly_or_die variant because it makes callers more
concise, but on the other hand a caller handling the error can write a
more meaningful error message with the right amount of context.

So I think you're right that it's better to drop patch 7.

Thanks,
Jonathan


Re: [PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> From: Jonathan Nieder 

I go by jrnie...@gmail.com upstream. :)

> This is "RFC v3: Another proposed hash function transition plan" from
> the git mailing list.
>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Jonathan Tan 
> Signed-off-by: Brandon Williams 
> Signed-off-by: Stefan Beller 

I hadn't signed-off on this version, but it's not a big deal.

[...]
> ---
>
>  This takes the original Google Doc[1] and adds it to our history,
>  such that the discussion can be on on list and in the commit messages.
>
>  * replaced SHA3-256 with NEWHASH, sha3 with newhash
>  * added section 'Implementation plan'
>  * added section 'Future work'
>  * added section 'Agreed-upon criteria for selecting NewHash'

Thanks for sending this out.  I had let it stall too long.

As a tiny nit, I think NewHash is easier to read than NEWHASH.  Not a
big deal.  More importantly, we need some text describing it and
saying it's a placeholder.

The implementation plan included here is out of date.  It comes from
an email where I was answering a question about what people can do to
make progress, before this design had been agreed on.  In the context
of this design there are other steps we'd want to describe (having to
do with implementing the translation table, etc).

I also planned to add a description of the translation table based on
what was discussed previously in this thread.

Jonathan


Re: RFC v3: Another proposed hash function transition plan

2017-09-26 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Sorry, you are asking cryptography experts to spend their time on the Git
> mailing list. I tried to get them to speak out on the Git mailing list.
> They respectfully declined.
>
> I can't fault them, they have real jobs to do, and none of their managers
> would be happy for them to educate the Git mailing list on matters of
> cryptography, not after what happened in 2005.

Fortunately we have had a few public comments from crypto specialists:

https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
https://public-inbox.org/git/CAL9PXLxMHG1nP5_GQaK_WSJTNKs=_qbaL6V5v2GzVG=9vu2...@mail.gmail.com/
https://public-inbox.org/git/59bfb95d.1030...@st.com/
https://public-inbox.org/git/59c149a3.6080...@st.com/

[...]
> Let's be realistic. Git is pretty important to us, but it is not important
> enough to sway, say, Intel into announcing hardware support for SHA3.

Yes, I agree with this.  (Adoption by Git could lead to adoption by
some other projects, leading to more work on high quality software
implementations in projects like OpenSSL, but I am not convinced that
that would be a good thing for the world anyway.  There are downsides
to a proliferation of too many crypto primitives.  This is the basic
argument described in more detail at [1].)

[...]
> On Tue, 26 Sep 2017, Jason Cooper wrote:

>> For my use cases, as a user of git, I have a plan to maintain provable
>> integrity of existing objects stored in git under sha1 while migrating
>> away from sha1.  The same plan works for migrating away from SHA2 or
>> SHA3 when the time comes.
>
> Please do not make the mistake of taking your use case to be a template
> for everybody's use case.

That said, I'm curious at what plan you are alluding to.  Is it
something that could benefit others on the list?

Thanks,
Jonathan

[1] https://www.imperialviolet.org/2017/05/31/skipsha3.html


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-09-27 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Wed, Sep 27, 2017 at 07:28:49PM +0200, Toni Uebernickel wrote:

>> The previous version was v2.13.2.
>> I switched back to this version, and it works perfectly fine; without any 
>> changes to my system.
>
> Thanks for confirming.
>
> There aren't a lot of changes to the script between v2.13.2 and v2.14.2.
> The most plausible culprit is d5addcf522 (add--interactive: handle EOF
> in prompt_yesno, 2017-06-21), but I'm scratching my head over how that
> could cause what you're seeing.
>
> Are you able to build Git from source and bisect the problem? It would
> help to know which commit introduced the problem.

How about this change?

 commit 136c8c8b8fa39f1315713248473dececf20f8fe7
 Author: Jeff King 
 Date:   Thu Jul 13 11:07:03 2017 -0400

 color: check color.ui in git_default_config()

Toni, what is the output of "git config -l"?

Thanks,
Jonathan


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-09-27 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Jeff King wrote:

>> There aren't a lot of changes to the script between v2.13.2 and v2.14.2.
>> The most plausible culprit is d5addcf522 (add--interactive: handle EOF
>> in prompt_yesno, 2017-06-21), but I'm scratching my head over how that
>> could cause what you're seeing.
>>
>> Are you able to build Git from source and bisect the problem? It would
>> help to know which commit introduced the problem.
>
> How about this change?
>
>  commit 136c8c8b8fa39f1315713248473dececf20f8fe7
>  Author: Jeff King 
>  Date:   Thu Jul 13 11:07:03 2017 -0400
>
>  color: check color.ui in git_default_config()

Uh, I think I was thinking of another thread when I wrote this.  Sorry
for the nonsense.

> Toni, what is the output of "git config -l"?

I'm still curious about this.

Thanks,
Jonathan


Re: [PATCH] diff: correct newline in summary for renamed files

2017-09-27 Thread Jonathan Nieder
Stefan Beller wrote:

> In 146fdb0dfe (diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY,
> 2017-06-29), the conversion from direct printing to the symbol emission
> dropped the new line character for renamed, copied and rewritten files.
>
> Add the emission of a newline, add a test for this case.
>
> Reported-by: Linus Torvalds 
> Helped-by: Jeff King 
> Signed-off-by: Stefan Beller 
> ---
>  diff.c|  1 +
>  t/t4013-diff-various.sh   | 12 
>  t/t4013/diff.diff-tree_--stat_initial_mode|  4 
>  t/t4013/diff.diff-tree_--summary_initial_mode |  3 +++
>  t/t4013/diff.diff-tree_initial_mode   |  3 +++
>  t/t4013/diff.log_--decorate=full_--all|  6 ++
>  t/t4013/diff.log_--decorate_--all |  6 ++
>  7 files changed, 35 insertions(+)
>  create mode 100644 t/t4013/diff.diff-tree_--stat_initial_mode
>  create mode 100644 t/t4013/diff.diff-tree_--summary_initial_mode
>  create mode 100644 t/t4013/diff.diff-tree_initial_mode

Reviewed-by: Jonathan Nieder 

Thanks.


[PATCH v4] technical doc: add a design doc for hash function transition

2017-09-27 Thread Jonathan Nieder
This document describes what a transition to a new hash function for
Git would look like.  Add it to Documentation/technical/ as the plan
of record so that future changes can be recorded as patches.

Also-by: Brandon Williams 
Also-by: Jonathan Tan 
Also-by: Stefan Beller 
Signed-off-by: Jonathan Nieder 
---
On Thu, Mar 09, 2017 at 11:14 AM, Shawn Pearce wrote:
> On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder  wrote:

>> Thanks for the kind words on what had quite a few flaws still.  Here's
>> a new draft.  I think the next version will be a patch against
>> Documentation/technical/.
>
> FWIW, I like this approach.

Okay, here goes.

Instead of sharding the loose object translation tables by first byte,
we went for a single table.  It simplifies the design and we need to
keep the number of loose objects under control anyway.

We also included a description of the transition plan and tried to
include a summary of what has been agreed upon so far about the choice
of hash function.

Thanks to Junio for reviving the discussion and in particular to Dscho
for pushing this forward and making the missing pieces clearer.

Thoughts of all kinds welcome, as always.

 Documentation/Makefile |   1 +
 .../technical/hash-function-transition.txt | 797 +
 2 files changed, 798 insertions(+)
 create mode 100644 Documentation/technical/hash-function-transition.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2415e0d657..471bb29725 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -67,6 +67,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt 
technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
 TECH_DOCS += technical/pack-format
diff --git a/Documentation/technical/hash-function-transition.txt 
b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 00..417ba491d0
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,797 @@
+Git hash function transition
+
+
+Objective
+-
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+--
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+  methods have a short reliable string that can be used to reliably
+  address stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. https://shattered.io demonstrated a practical SHA-1 hash
+collision. As a result, SHA-1 cannot be considered cryptographically
+secure any more. This impacts the communication of hash values because
+we cannot trust that a given hash value represents the known good
+version of content that the speaker intended.
+
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+
+Goals
+-
+Where NewHash is a strong 256-bit hash function to replace SHA-1 (see
+"Selection of a New Hash", below):
+
+1. The transition to NewHash can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A NewHash repository can communicate with SHA-1 Git servers
+  (push/fetch).
+   c. Users can use SHA-1 and NewHash identifiers for objects
+  interchangeably (see "Object names on the command line", below).
+   d. New signed objects make use of a stronger hash function than
+  SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+  repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+-
+1. Add NewHash support to Git protocol. This is valuable and the
+   logic

Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Adam Dinwoodie wrote:

> Leaving spaces around the `-delimeters for commands means asciidoc fails
> to parse them as the start of a literal string.  Remove an extraneous
> space that is causing a literal to not be formatted as such.
>
> Signed-off-by: Adam Dinwoodie 
> ---
>  Documentation/git.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good catch.

Reviewed-by: Jonathan Nieder 


Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Hi,

Andreas Heiduk wrote:

> +1, Thanks for spotting.

Thanks for looking it over.  Can we add your Reviewed-by?  (See [1]
for what this means.)

> I did a quick
>
>   grep -r " ` "
>
> which came up with with another relevant place:
>
> Documentation/git-format-patch.txt:   `--subject-prefix` option) has ` v` 
> appended to it.  E.g.
>
> But here the space IS relevant but asciidoc does not pick up
> the formatting. Perhaps that one could read like this:
>
>   `--subject-prefix` option) has `v` appended to it.  E.g.

Interesting.  This comes from

  commit 4aad08e061df699b49e24c4d34698d734473fb66
  Author: Junio C Hamano 
  Date:   Wed Jan 2 14:16:07 2013 -0800

  format-patch: document and test --reroll-count

In some output formats, the text with backticks surrounding it is
shown in a different background color, which makes something like
`{space}v` tempting (with appropriate definition of {space} in
the attributes section of asciidoc.conf).  But that feels way too
subtle.

How about something like

has a space and `v` appended to it

?

Thanks,
Jonathan

[1] 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/9cd6681cb1169e815c41af0265165dd1b872f228/Documentation/process/submitting-patches.rst#563


Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Andreas Heiduk wrote:

>>> +1, Thanks for spotting.
>>
>> Thanks for looking it over.  Can we add your Reviewed-by?  (See [1]
>> for what this means.)
>
> I would just do "Acked-by: Andreas" after seeing such an obvious
> admission of guilt & appreciation for fixing in the exchange.

Oh!  I had missed that context.

Your instinct would have been right (and is born out by Andreas's
reply to me).

I was just fishing, but in this context there was no reason to fish
for more than the Ack that was already there.

> Would we rather want to make it more formal like how Linux folks do
> the Reviewed-by: thing?

Separate from this example: yes, I think adopting Linux's Reviewed-by
convention would be a good thing.  When I see a positive reply to a
patch, I often wonder whether an ack or a fuller reviewed-by is
intended, and Linux's way of formalizing that appeals to me.

I'll try sending a patch to add it to SubmittingPatches tomorrow
morning (Stefan had also been hinting recently about this being
something worth trying).

Thank you,
Jonathan


Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path

2017-09-28 Thread Jonathan Nieder
Hi,

Dridi Boukelmoune wrote:

> For end users making use of a custom exec path many commands will simply
> fail. Adding git's exec path to the PATH also allows overriding git-sh-*
> scripts, not just adding commands. One can then patch a script without
> tainting their system installation of git for example.
>
> Signed-off-by: Dridi Boukelmoune 

This has been broken for a while, but better late than never to
address it.

[...]
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -151,6 +151,28 @@ For shell scripts specifically (not exhaustive):
> interface translatable. See "Marking strings for translation" in
> po/README.
>  
> + - When sourcing a "git-sh-*" file using "git --exec-path" make sure
> +   to only use its base name.
> +
> + (incorrect)
> + . "$(git --exec-path)/git-sh-setup"
> +
> + (correct)
> + . git-sh-setup

I wonder if we can make this so intuitive that it doesn't need
mentioning in CodingGuidelines.  What if the test harness
t/test-lib.sh were to set a GIT_EXEC_PATH with multiple directories in
it?  That way, authors of new commands would not have to be careful to
look out for this issue proactively because the testsuite would catch
it.

[...]
> +++ b/contrib/convert-grafts-to-replace-refs.sh
> @@ -5,7 +5,8 @@
>  
>  GRAFTS_FILE="${GIT_DIR:-.git}/info/grafts"
>  
> -. $(git --exec-path)/git-sh-setup
> +PATH="$(git --exec-path):$PATH"
> +. git-sh-setup

I wish there were a simple way to do this that doesn't involve
polluting $PATH with the dashed commands.  E.g. we can do something
like

OIFS=$IFS
IFS=:
set -f
for d in $(git --exec-path)
do
if test -e "$d/git-sh-setup"
then
. "$d/git-sh-setup"
break
fi
done
set +f
IFS=$OIFS

but that is not very simple.  Something like

old_PATH=$PATH
PATH=$(git --exec-path):$PATH
. git-sh-setup
PATH=$old_PATH

looks like it could work, but it would undo the work git-sh-setup
does to set a sane $PATH on platforms like Solaris.

> --- a/contrib/rerere-train.sh
> +++ b/contrib/rerere-train.sh
> @@ -7,7 +7,8 @@ USAGE="$me rev-list-args"
>  
>  SUBDIRECTORY_OK=Yes
>  OPTIONS_SPEC=
> -. "$(git --exec-path)/git-sh-setup"
> +PATH="$(git --exec-path):$PATH"
> +. git-sh-setup

This makes me similarly unhappy about PATH pollution, but it may be
that there's nothing to be done about that.

[...]
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -32,7 +32,6 @@ squashmerge subtree changes as a single commit
>  "
>  eval "$(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit 
> $?)"
>  
> -PATH=$PATH:$(git --exec-path)
>  . git-sh-setup

This looks like a change that could be separated into a separate
patch, both because it is to contrib/subtree (which is maintained
separately) and because it is not necessary for the goal described in
this patch's commit message.  I do like this change, since it improves
consistency with other commands.

> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -44,7 +44,7 @@ git_broken_path_fix () {
>  # @@BROKEN_PATH_FIX@@
>  
>  # Source git-sh-i18n for gettext support.
> -. "$(git --exec-path)/git-sh-i18n"
> +. git-sh-i18n

Do git-mergetool--lib.txt, git-parse-remote.txt, git-sh-i18n.txt,
and git-sh-setup.txt in Documentation/ need the same treatment?

Summary: I like the goal of this patch but I am nervous about the
side-effect it introduces of PATH pollution.  Is there a way around
that?  If there isn't, then this needs a few tweaks and then it should
be ready to go.

Thanks and hope that helps,
Jonathan


Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path

2017-09-28 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> This has been broken for a while, but better late than never to
>> address it.
>
> I am not sure if this is broken in the first place.  We do want to
> make sure that the scripted porcelains will source the shell helper
> library from matching Git release.  The proposed patch goes directly
> against that and I do not see how it could be an improvement.

It used to be that git allowed setting a colon-separated list of paths
in GIT_EXEC_PATH.  (Very long ago, I relied on that feature.)  If we
can restore that functionality without too much cost, then I think
it's worthwhile.

But the cost in this patch for that is pretty high.  And

$ git log -S'$(git --exec-path)/'

tells me that colon-separated paths in GIT_EXEC_PATH did not work for
some use cases as far back as 2006.  Since 2008 the documentation has
encouraged using "git --exec-path" in a way that does not work with
colon-separated paths, too.  I hadn't realized it was so long.  Given
that it hasn't been supported for more than ten years, I was wrong to
read this as a bug report --- instead, it's a feature request.

In that context, this cost is likely not worth paying.

I wonder if there's another way to achieve this patch's goal.  E.g.
what if there were a way to specify some paths git could search for
custom commands, separate from "git --exec-path"?

Putting the custom commands on the $PATH seems nicer unless you're
trying to override the implementation of an existing git command.
And we already discourage overriding the implementation of an existing
git command (as it's open source, you can patch them instead), so...

Another possible motivation (the one that led me to use this mechnism
long ago) is avoiding providing the dashed form git-$cmd in $PATH.  I
think time has shown that having git-$cmd in $PATH is not too painful.

Thanks and hope that helps,
Jonathan


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-09-29 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> This document describes what a transition to a new hash function for
>> Git would look like.  Add it to Documentation/technical/ as the plan
>> of record so that future changes can be recorded as patches.
>>
>> Also-by: Brandon Williams 
>> Also-by: Jonathan Tan 
>> Also-by: Stefan Beller 
>> Signed-off-by: Jonathan Nieder 
>> ---
>
> Shoudln't these all be s-o-b: (with a note immediately before that
> to say all four contributed equally or something)?

I don't want to get lost in the weeds in the question of how to
represent such a collaborative effort in git's metadata.

You're right that I should collect their sign-offs!  Your approach of
using text instead of machine-readable data for common authorship also
seems okay.

In any event, this is indeed

Signed-off-by: Brandon Williams 
Signed-off-by: Jonathan Tan 
Signed-off-by: Stefan Beller 

(I just checked :)).

>> +Background
>> +--
>> +At its core, the Git version control system is a content addressable
>> +filesystem. It uses the SHA-1 hash function to name content. For
>> +example, files, directories, and revisions are referred to by hash
>> +values unlike in other traditional version control systems where files
>> +or versions are referred to via sequential numbers. The use of a hash
>
> Traditional systems refer to files via numbers???  Perhaps "where
> versions of files are referred to via sequential numbers" or
> something?

Good point.  The wording you suggested will work well.

>> +function to address its content delivers a few advantages:
>> +
>> +* Integrity checking is easy. Bit flips, for example, are easily
>> +  detected, as the hash of corrupted content does not match its name.
>> +* Lookup of objects is fast.
>
> * There is no ambiguity what the object's name should be, given its
>   content.
>
> * Deduping the same content copied across versions and paths is
>   automatic.

:)  Yep, these are nice too, especially that second one.

It also is how we make diff-ing fast.

>> +SHA-1 still possesses the other properties such as fast object lookup
>> +and safe error checking, but other hash functions are equally suitable
>> +that are believed to be cryptographically secure.
>
> s/secure/more &/, perhaps?

We were looking for a phrase meaning that it should be a cryptographic
hash function in good standing, which SHA-1 is at least approaching
not being.

"more secure" should work fine.  Let's go with that.

>> +Goals
>> +-
>> +...
>> +   c. Users can use SHA-1 and NewHash identifiers for objects
>> +  interchangeably (see "Object names on the command line", below).
>
> Mental note.  This needs to extend to the "index X..Y" lines in the
> patch output, which is used by "apply -3" and "am -3".

Will add a note about this to "Object names on the command line".  Stefan
had already pointed out that that section should really be renamed to
something like "Object names in input and output".

>> +2. Allow a complete transition away from SHA-1.
>> +   a. Local metadata for SHA-1 compatibility can be removed from a
>> +  repository if compatibility with SHA-1 is no longer needed.
>
> I like the emphasis on "Local" here.  Metadata for compatiblity that
> is embedded in the objects obviously cannot be removed.
>
> From that point of view, one of the goals ought to be "make sure
> that as much SHA-1 compatibility metadata as possible is local and
> outside the object".  This goal may not be able to say more than "as
> much as possible", as signed objects that came from SHA-1 world
> needs to carry the compatibility metadata somewhere somehow.
>
> Or perhaps we could.  There is nothing that says a signed tag
> created in the SHA-1 world must have the PGP/SHA-1 signature in the
> NewHash payload---it could be split off of the object data and
> stored in a local metadata cache, to be used only when we need to
> convert it back to the SHA-1 world.

That would break round-tripping and would mean that multiple SHA-1
objects could have the same NewHash name.  In other words, from
my point of view there is something that says that such data must
be preserved.

Another way to put it: even after removing all SHA-1 compatibility
metadata, one nice feature of this design is that it can be recovered
if I change my mind, from data in the NewHash based repository alone.

[...]
>> +Non-Goals
>> +-
>> ...
>> +6. Skip fetching some submodules of a project into a NewHash
>> +   repository. (This also depends on NewHash support in G

Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin 
> ---
>  .clang-format | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Well executed and well explained. Thank you.

Reviewed-by: Jonathan Nieder 

Going forward, is there an easy way to preview the effect of this kind
of change (e.g., to run "make style" on the entire codebase so as to be
able to compare the result with two different versions of
.clang-format)?

Thanks,
Jonathan


Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Stephan Beyer wrote:
> On 09/29/2017 08:40 PM, Jonathan Nieder wrote:

>> Going forward, is there an easy way to preview the effect of this kind
>> of change (e.g., to run "make style" on the entire codebase so as to be
>> able to compare the result with two different versions of
>> .clang-format)?
>
> I just ran clang-format before and after the patch and pushed to github.
> The resulting diff is quite big:
>
> https://github.com/sbeyer/git/commit/3d1186c4cf4dd7e40b97453af5fc1170f6868ccd

Thanks.  The first change I see there is

 -char *strbuf_realpath(struct strbuf *resolved, const char *path, int 
die_on_error)
 +char *
 +strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error)

I understand why the line is broken, but the choice of line break is
wrong.  Seems like the penalty for putting return type on its own line
quite high enough.

My Reviewed-by still stands, though.  It gets "make style" to signal
long lines that should be broken, which is an improvement.

> PS: There should be a comment at the beginning of the .clang-format file
> that says what version it is tested with (on my machine it worked with
> 5.0 but not with 4.0) and there should also probably a remark that the
> clang-format-based style should only be understood as a hint or guidance
> and that most of the Git codebase does not conform it.

Sounds good to me.  Care to send it as a patch? :)

Thanks,
Jonathan


Re: What means "git config bla ~/"?

2017-10-02 Thread Jonathan Nieder
Hi,

rpj...@crashcourse.ca wrote:

> i'm sure i'm about to embarrass myself but, in "man git-config",
> OPTIONS, one reads:
>
>   --path
>
> git-config will expand leading ~ to the value of $HOME, and ~user
> to the   home directory for the specified user. This option has no
> effect when setting the value (but you can use git config bla ~/
> from the command line to let your shell do the expansion).
>
> what's with that "git config bla ~/"? is this some config keyword
> or something?

No need to be embarrased.  Here "bla" is a placeholder.  That is,
for example, I can run

git config --global include.path ~/.config/git/more-config

or

git config --global include.path $HOME/.config/git/more-config

to cause

[include]
path = /home/me/.config/git/more-config

to be added to my global configuration.  The expansion of ~ or $HOME
is performed by my shell, not Git.  For comparison, if I had run

git config --global include.path '~/.config/git/more-config'

then that would cause

[include]
path = ~/.config/git/more-config

to be added to my global configuration, but it would still have the
same effect at run time, since Git is also able to expand ~ to my home
directory.

The wording comes from

commit 1349484e341a3ec2ba02a86c8fbd97ea9dc8c756
Author: Matthieu Moy 
Date:   Wed Dec 30 17:51:53 2009 +0100

builtin-config: add --path option doing ~ and ~user expansion.

I agree with you that it is less clear than it could be.  Ideas for
clarifying it?

Thanks,
Jonathan


Re: git submodule add fails when using both --branch and --depth

2017-10-02 Thread Jonathan Nieder
Hi,

Thadeus Fleming wrote:

> I'm running git 2.14.2 on Ubuntu 16.04.
>
> Compare the behavior of
>
>> git clone --branch pu --depth 1 https://github.com/git/git git-pu
>
> which clones only the latest commit of the pu branch and

Yes.

>> mkdir tmp && cd tmp && git init
>> git submodule add --branch pu --depth 1 https://github.com/git/git \
>  git-pu
>
> which gives the error
>
> fatal: 'origin/pu' is not a commit and a branch 'pu' cannot be created from it
> Unable to checkout submodule 'git-pu'

As a side note (you are using "git submodule add --depth", not "git
submodule update --depth"), I suspect that "submodule update --depth"
may not always do what people expect.

With add --depth, I agree with your expectation and after your fix
everything should work fine.  But with update --depth, consider the
following sequence of steps:

 1. I create a repository "super" with submodule "sub" and publish
both.

 2. I make a couple commits to "sub" and a commit to "super" making
use of those changes and want to publish them.

 3. I use "git push --recurse-submodules" to publish my commits to
"sub" and "super":

a. First it pushes to "sub".

b. Then it pushes to "super".

Between steps 3(a) and 3(b), a person can still "git clone
--recurse-submodules" the "super" repository.  The repository "super"
does not have my change yet and "sub" does, but that is not a problem,
since commands like "git checkout --recurse-submodules" and "git
submodule update" know to check out the commit *before* my change in
the submodule.

But between steps 3(a) and 3(b), "git submodule update --depth=1"
would not work.  It would fetch the submodule with depth 1 and then
try to check out a commit that is deeper in history.

So I think there's more thinking needed there.

That's all a tangent to your report about add --depth, though.

Thanks,
Jonathan


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Jonathan Nieder
Hi,

Taylor Blau wrote:

> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
>
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
>
> Let's fix this by assuming that atom parser implementations don't care
> about distinguishing between the empty string "%(refname:)" and no
> sub-arguments "%(refname)".
>
> Signed-off-by: Taylor Blau 
> ---
>  ref-filter.c| 10 +-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

The above does a nice job of explaining

 - what this change is going to do
 - how it's good for the internal code structure / maintainability

What it doesn't tell me about is why the user-facing effect won't
cause problems.  Is there no atom where %(atom:) was previously
accepted and did something meaningful that this may break?

Looking at the manpage and code, I don't see any, so for what it's
worth, this is

Reviewed-by: Jonathan Nieder 

but for next time, please remember to discuss regression risk in
the commit message, too.

Thanks,
Jonathan


Re: hooks are ignored if there are not marked as executable

2017-10-02 Thread Jonathan Nieder
Hi Damien,

Damien wrote:

> If you do `echo my_script > .git/hooks/pre-commit` and then `git commit`,
> The hook is just gonna be ignored.
> But if you do `chmod +x .git/hooks/pre-commit`, then it's executed.

This is intentional.

> I think ignoring a hook is misleading and not newbie friendly, an error
> message to signal an incorrectly configured hook could be better.

Adding a warning sounds like a nice change.  Care to propose a patch?

In the early days, sample hooks were installed without .sample in
their filename and could be enabled by "chmod +x"-ing them.  Because
of this, long-time users of Git may find themselves experiencing this
warning more often than they'd like.  That could be acceptable if the
warning shows a command to run to prevent the warning from showing up
again, though (see advice.c for some examples of how that can be
done).

The main code path to look at is run-command.c::find_hook.

"git grep -e 'rev-parse --git-path hooks' -- . ':!contrib'" finds a
few other code paths you may also want to look at.

Thanks and hope that helps,
Jonathan


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Jonathan Nieder
Hi,

Toni Uebernickel wrote:

> I updated to git version v2.14.2 on macOS using homebrew.
>
> Since then `git add --patch` and `git stash save --patch` are not
> working anymore. It's just printing the complete diff without ever
> stopping to ask for actions. This results in an unusable state, as
> the whole command option is rendered useless.

Would a patch like the following help?

I am worried that other scripts using diff-files would need the same
kind of patch.  So it seems worthwhile to look for alternatives.

An alternative would be to partially roll back v2.14.2~61^2~4 (color:
check color.ui in git_default_config, 2017-07-13) by making it not
take effect for plumbing commands (i.e., by adding a boolean to
"struct startup_info" to indicate whether a command is low-level
plumbing).  That would make the behavior of Git harder to explain so I
don't particularly like it.  Plus it defeats the point of the patch.

Yet another alternative would be to treat color.ui=always as a
deprecated synonym for color.ui=auto.  I think that's my preferred
fix.

What do you think?

Thanks again for reporting,
Jonathan

diff --git i/git-add--interactive.perl w/git-add--interactive.perl
index 28b325d754..4ea69538c7 100755
--- i/git-add--interactive.perl
+++ w/git-add--interactive.perl
@@ -101,49 +101,49 @@ sub apply_patch_for_stash;
 
 my %patch_modes = (
'stage' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'file-only',
IS_REVERSE => 0,
},
'stash' => {
-   DIFF => 'diff-index -p HEAD',
+   DIFF => 'diff-index --no-color -p HEAD',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => undef,
IS_REVERSE => 0,
},
'reset_head' => {
-   DIFF => 'diff-index -p --cached',
+   DIFF => 'diff-index --no-color -p --cached',
APPLY => sub { apply_patch 'apply -R --cached', @_; },
APPLY_CHECK => 'apply -R --cached',
FILTER => 'index-only',
IS_REVERSE => 1,
},
'reset_nothead' => {
-   DIFF => 'diff-index -R -p --cached',
+   DIFF => 'diff-index --no-color -R -p --cached',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'index-only',
IS_REVERSE => 0,
},
'checkout_index' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply -R', @_; },
APPLY_CHECK => 'apply -R',
FILTER => 'file-only',
IS_REVERSE => 1,
},
'checkout_head' => {
-   DIFF => 'diff-index -p',
+   DIFF => 'diff-index --no-color -p',
APPLY => sub { apply_patch_for_checkout_commit '-R', @_ },
APPLY_CHECK => 'apply -R',
FILTER => undef,
IS_REVERSE => 1,
},
'checkout_nothead' => {
-   DIFF => 'diff-index -R -p',
+   DIFF => 'diff-index --no-color -R -p',
APPLY => sub { apply_patch_for_checkout_commit '', @_ },
APPLY_CHECK => 'apply',
FILTER => undef,


Re: [PATCH] PR msg: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Hi,

Thanks for working to improve Git!

Bedhanger wrote:

> Subject: [PATCH] PR msg: capitalise "Git" to make it a proper noun

nit: What is a PR msg?  Looking with "git log git-request-pull.sh",
I see that previous patches called the subsystem request-pull, so
this could be

request-pull: capitalise "Git" to make it a proper noun

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: bedhanger 

Please use your full name in the Signed-off-by line.  See
Documentation/SubmittingPatches section "(5) Certify your work" for
why we ask for this.

> ---
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

The patch itself looks good, and I like the change it makes to the
email generated by git request-pull.  Looking forward to seeing what
other changes you come up with in the future.

Thanks and hope that helps,
Jonathan


Security of .git/config and .git/hooks

2017-10-02 Thread Jonathan Nieder
Hi,

This topic has been mentioned on this mailing list before but I had
trouble finding a relevant reference.  Links welcome.

Suppose that I add the following to .git/config in a repository on a
shared computer:

[pager]
log = rm -fr /
fsck = rm -fr /

("rm -fr /" is of course a placeholder here.)

I then tell a sysadmin that git commands are producing strange output
and I am having trouble understanding what is going on.  They may run
"git fsck" or "git log"; in either case, the output is passed to the
pager I configured, allowing me to run an arbitrary command using the
sysadmin's credentials.

You might say that this is the sysadmin's fault, that they should have
read through .git/config before running any Git commands.  But I don't
find it so easy to blame them.

A few related cases that might not seem so dated:

 1. I put my repository in a zip file and ask a Git expert to help me
recover data from it, or

 2. My repository is in a shared directory on NFS.  Unless the
administrator setting that up is very careful, it is likely that
the least privileged user with write access to .git/config or
.git/hooks/ may be someone that I don't want to be able to run
arbitrary commands on behalf of the most privileged user working
in that repository.

A similar case to compare to is Linux's "perf" tool, which used to
respect a .perfconfig file from the current working directory.
Fortunately, nowadays "perf" only respects ~/.perfconfig and
/etc/perfconfig.

Proposed fix: because of case (1), I would like a way to tell Git to
stop trusting any files in .git.  That is:

 1. Introduce a (configurable) list of "safe" configuration items that
can be set in .git/config and don't respect any others.

 2. But what if I want to set a different pager per-repository?
I think we could do this using configuration "profiles".
My ~/.config/git/profiles/ directory would contain git-style
config files for repositories to include.  Repositories could
then contain

[include]
path = ~/.config/git/profiles/fancy-log-pager

to make use of those settings.  The facility (1) would
special-case this directory to allow it to set "unsafe" settings
since files there are assumed not to be under the control of an
attacker.

 3. Likewise for hooks: my ~/.config/git/hooks/ directory would
contain hooks for repositories to make use of.  Repositories could
symlink to hook files from there to make use of them.

That would allow the configured hooks in ~/.config/git/hooks/ to
be easy to find and to upgrade in one place.

To help users migrate, when a hook is present and executable in
.git/hooks/, Git would print instructions for moving it to
~/.config/git/hooks/ and replacing it with a symlink after
inspecting it.

For backward compatibility, this facility would be controlled by a
global configuration setting.  If that setting is not enabled, then
the current, less safe behavior would remain.

One downside of (3) is its reliance on symlinks.  Some alternatives:

 3b. Use core.hooksPath configuration instead.  Rely on (2).
 3c. Introduce new hook.* configuration to be used instead of hook
 scripts.  Rely on (2).

Thoughts?
Jonathan


Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Ann T Ropea wrote:

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: Ann T Ropea 
> Acked-by: Jonathan Nieder 
> ---
> v2: rename patch, correct Signed-off-by line, add Acked-by line
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for the quick turnaround.

Sincerely,
Jonathan


Re: [PATCH v3 10/10] ssh: introduce a 'simple' ssh variant

2017-10-03 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> When using the 'ssh' transport, the '-o' option is used to specify an
> environment variable which should be set on the remote end.  This allows
> git to send additional information when contacting the server,
> requesting the use of a different protocol version via the
> 'GIT_PROTOCOL' environment variable like so: "-o SendEnv=GIT_PROTOCOL"
>
> Unfortunately not all ssh variants support the sending of environment
> variables to the remote end.  To account for this, only use the '-o'
> option for ssh variants which are OpenSSH compliant.  This is done by
> checking that the basename of the ssh command is 'ssh' or the ssh
> variant is overridden to be 'ssh' (via the ssh.variant config).

This also affects -p (port), right?

What happens if I specify a ssh://host:port/path URL and the SSH
implementation is of 'simple' type?

> Previously if an ssh command's basename wasn't 'plink' or

Git's commit messages use the present tense to describe the current
state of the code, so this is "Currently". :)

> 'tortoiseplink' git assumed that the command was an OpenSSH variant.
> Since user configured ssh commands may not be OpenSSH compliant, tighten
> this constraint and assume a variant of 'simple' if the basename of the
> command doesn't match the variants known to git.  The new ssh variant
> 'simple' will only have the host and command to execute ([username@]host
> command) passed as parameters to the ssh command.
>
> Update the Documentation to better reflect the command-line options sent
> to ssh commands based on their variant.
>
> Reported-by: Jeffrey Yasskin 
> Signed-off-by: Brandon Williams 

Thanks for working on this.

For background, the GIT_SSH implementation that motivated this is
https://github.com/travis-ci/dpl/blob/6c3fddfda1f2a85944c56b068bac0a77c049/lib/dpl/provider.rb#L215,
which does not support -p or -4/-6, either.

> ---
>  Documentation/config.txt |  27 ++--
>  Documentation/git.txt|   9 ++--
>  connect.c| 107 
> ++-
>  t/t5601-clone.sh |   9 ++--
>  t/t5700-protocol-v1.sh   |   2 +
>  5 files changed, 95 insertions(+), 59 deletions(-)
[...]
> --- a/connect.c
> +++ b/connect.c
> @@ -776,37 +776,44 @@ static const char *get_ssh_command(void)
[...]
> +static enum ssh_variant determine_ssh_variant(const char *ssh_command,
> +   int is_cmdline)
[...]
> - if (!strcasecmp(variant, "plink") ||
> - !strcasecmp(variant, "plink.exe"))
> - *port_option = 'P';
> + if (!strcasecmp(variant, "ssh"))
> + ssh_variant = VARIANT_SSH;

Could this handle ssh.exe, too?

[...]
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh

Can this get tests for the new defaulting behavior?  E.g.

 - default is "simple"
 - how "simple" treats an ssh://host:port/path URL
 - how "simple" treats ipv4/ipv6 switching
 - ssh defaults to "ssh"
 - if GIT_SSH=ssh, can set ssh.variant to "simple" to force the "simple"
   mode

One other worry: this (intentionally) changes the behavior of a
previously-working GIT_SSH=ssh-wrapper that wants to support
OpenSSH-style options but does not declare ssh.variant=ssh.  When
discovering this change, what should the author of such an ssh-wrapper
do?

They could instruct their users to set ssh.variant or GIT_SSH_VARIANT
to "ssh", but then they are at the mercy of future additional options
supported by OpenSSH we may want to start using in the future (e.g.,
maybe we will start passing "--" to separate options from the
hostname).  So this is not a futureproof option for them.

They could take the new default behavior or instruct their users to
set ssh.variant or GIT_SSH_VARIANT to "simple", but then they lose
support for handling alternate ports, ipv4/ipv6, and specifying -o
SendEnv to propagate GIT_PROTOCOL or other envvars.  They can handle
GIT_PROTOCOL propagation manually, but losing port support seems like
a heavy cost.

They could send a patch to define yet another variant that is
forward-compatible, for example using an interface similar to what
git-credential(1) uses.  Then they can set GIT_SSH to their
OpenSSH-style helper and GIT_FANCY_NEW_SSH to their more modern
helper, so that old Git versions could use GIT_SSH and new Git
versions could use GIT_FANCY_NEW_SSH.  This might be their best
option.  It feels odd to say that their only good way forward is to
send a patch, but on the other hand someone with such an itch is
likely to be in the best position to define an appropriate interface.

They could send a documentation patch to make more promises about the
commandline used in OpenSSH mode: e.g. setting a rule in advance about
which options can take an argument so that they can properly parse an
OpenSSH command line in a future-compatible way.

Or they could send a patch to allow passing the port in "simple"
mode, for example using an environment variable.

Am I missing another option?  What advice d

Re: [PATCH] branch: reset instead of release a strbuf

2017-10-03 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Our documentation advises to not re-use a strbuf, after strbuf_release
> has been called on it. Use the proper reset instead.

I'm super surprised at this documentation.  strbuf_release maintains
the invariant that a strbuf is always usable (i.e., that we do not have
to fear use-after-free problems).

On second thought, though, strbuf_release is a more expensive operation
than strbuf_reset --- constantly free()-ing and re-malloc()ing is
unnecessary churn in malloc data structures.  So maybe that is the
motivation here?

> Signed-off-by: Stefan Beller 
> ---
>
> Maybe one of the #leftoverbits is to remove the re-init call in release
> and see what breaks? (And then fixing up more of such cases as presented
> in this patch)

As mentioned above: please no.  That would be problematic both for
ease of development and for the risk of security bugs.

>  builtin/branch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index b998e16d0c..9758012390 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -217,7 +217,7 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   if (!head_rev)
>   die(_("Couldn't look up commit object for HEAD"));
>   }
> - for (i = 0; i < argc; i++, strbuf_release(&bname)) {
> + for (i = 0; i < argc; i++, strbuf_reset(&bname)) {
>   char *target = NULL;
>   int flags = 0;

Should there be a strbuf_release (or UNLEAK if you are very very sure)
call at the end of the function to replace this?

With that change (but not without it),
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [bug] git version 2.4.12 color.ui = always breaks git add -p

2017-10-03 Thread Jonathan Nieder
Hi Christian,

Christian Rebischke wrote:

> I played around with my gitconfig and I think I found a bug while doing
> so. I set the following lines in my gitconfig:
>
> [color]
> ui = always
>
> When I try to use `git add -p ` I don't see the 'Stage
> this hunk'-dialog anymore. First I thought it's some other configuration
> but now I can definitly say that this configuration breaks `git add -p`
> interactive mode.

Do the patches at
https://public-inbox.org/git/20171003133713.ccxv6clrmuuhh...@sigill.intra.peff.net/
help?

Thanks and hope that helps,
Jonathan


Re: [PATCH] branch: reset instead of release a strbuf

2017-10-03 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Our documentation advises to not re-use a strbuf, after strbuf_release
> has been called on it. Use the proper reset instead.
>
> Reviewed-by: Jonathan Nieder 

This is indeed
Reviewed-by: Jonathan Nieder 

Thank you.

> Signed-off-by: Stefan Beller 
> ---
>  builtin/branch.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Here's a patch to address the surprising strbuf.h advice.

-- >8 --
Subject: strbuf: do not encourage init-after-release

strbuf_release already leaves the strbuf in a valid, initialized
state, so there is not need to call strbuf_init after it.

Moreover, this is not likely to change in the future: strbuf_release
leaving the strbuf in a valid state has been easy to maintain and has
been very helpful for Git's robustness and simplicity (e.g.,
preventing use-after-free vulnerabilities).

It is still not advisable to call strbuf_release until done using a
strbuf because it is wasteful, so keep that part of the advice.

Reported-by: Stefan Beller 
Signed-off-by: Jonathan Nieder 
---
 strbuf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.h b/strbuf.h
index 7496cb8ec5..6e175c3694 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -83,7 +83,7 @@ extern void strbuf_init(struct strbuf *, size_t);
 
 /**
  * Release a string buffer and the memory it used. You should not use the
- * string buffer after using this function, unless you initialize it again.
+ * string buffer after using this function.
  */
 extern void strbuf_release(struct strbuf *);
 
-- 
2.14.2.920.gcf0c67979c



Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-03 Thread Jonathan Nieder
Hi,

Thomas Gummerer wrote:

> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
[...]
> Avoid this by checking passing in the length of the string in the char
> array, and checking that we never run over it.
>
> Signed-off-by: Thomas Gummerer 
> ---
>  path.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

When I first read the above, I thought it was going to be about a
NUL-terminated string that was missing a NUL.  But in fact, the issue
is that strlen(path) can be < 2.

In other words, an alternative fix would be

if (*path == '.' && path[1] == '/') {
...
}

which would not require passing in 'len' or switching to index-based
arithmetic.  I think I prefer it.  What do you think?

Thanks and hope that helps,
Jonathan

diff --git i/path.c w/path.c
index b533ec938d..3a1fbee1e0 100644
--- i/path.c
+++ w/path.c
@@ -37,7 +37,7 @@ static struct strbuf *get_pathname(void)
 static char *cleanup_path(char *path)
 {
/* Clean it up */
-   if (!memcmp(path, "./", 2)) {
+   if (*path == '.' && path[1] == '/') {
path += 2;
while (*path == '/')
path++;


Re: [PATCH 2/3] http-push: fix construction of hex value from path

2017-10-03 Thread Jonathan Nieder
Hi,

Thomas Gummerer wrote:

> The get_oid_hex_from_objpath takes care of creating a oid from a
> pathname.  It does this by memcpy'ing the first two bytes of the path to
> the "hex" string, then skipping the '/', and then copying the rest of the
> path to the "hex" string.  Currently it fails to increase the pointer to
> the hex string, so the second memcpy invocation just mashes over what
> was copied in the first one, and leaves the last two bytes in the string
> uninitialized.

Wow.  The fix is obviously correct.

> This breaks valgrind in t5540, although the test passes without
> valgrind:
[...]
> Signed-off-by: Thomas Gummerer 
> ---
>  http-push.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Would it be straightforward to add a correctness test for this?  It
seems like this code path didn't work at all and no one noticed.

This is the code path in http-push.c which says

 /*
  * NEEDSWORK: remote_ls() ignores info/refs on the remote side.  But it
  * should _only_ heed the information from that file, instead of trying to
  * determine the refs from the remote file system (badly: it does not even
  * know about packed-refs).
  */
 static void remote_ls(const char *path, int flags,

I think the problem is that when it fails, we end up thinking that
there are *fewer* objects than are actually present remotely so the
only ill effect is pushing too much.  So this should be observable in
server logs (i.e. it is testable) but it's not a catastrophic failure
which means it's harder to test than it would be otherwise.

Moreover, this is in the webdav-based "dumb http" push code path,
which I do not trust much at all.  I wonder if we could retire it
completely (or at least provide an option to turn it off).

> diff --git a/http-push.c b/http-push.c
> index e4c9b065ce..e9a01ec4da 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -1018,7 +1018,7 @@ static int get_oid_hex_from_objpath(const char *path, 
> struct object_id *oid)
>   memcpy(hex, path, 2);
>   path += 2;
>   path++; /* skip '/' */
> - memcpy(hex, path, GIT_SHA1_HEXSZ - 2);
> + memcpy(hex + 2, path, GIT_SHA1_HEXSZ - 2);
>  
>   return get_oid_hex(hex, oid);

Thanks,
Jonathan


Re: [PATCH] test-stringlist: avoid buffer underrun when sorting nothing

2017-10-03 Thread Jonathan Nieder
René Scharfe wrote:

> Check if the strbuf containing data to sort is empty before attempting
> to trim a trailing newline character.
>
> Signed-off-by: Rene Scharfe 
> ---
>  t/helper/test-string-list.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing it.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-03 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:

>> In other words, an alternative fix would be
>> 
>>  if (*path == '.' && path[1] == '/') {
>>  ...
>>  }
>> 
>> which would not require passing in 'len' or switching to index-based
>> arithmetic.  I think I prefer it.  What do you think?
>
> Yes, I think that approach is much nicer. I think you could even use
> skip_prefix. Unfortunately you have to play a few games with const-ness,
> but I think the resulting signature for cleanup_path() is an
> improvement:

Ooh!

For what it's worth, if you add a commit message with Thomas's
Reported-by then this lgtm.

Thanks,
Jonathan

> diff --git a/path.c b/path.c
> index 00ec04e7a5..2e09a7bce0 100644
> --- a/path.c
> +++ b/path.c
> @@ -34,11 +34,10 @@ static struct strbuf *get_pathname(void)
>   return sb;
>  }
>  
> -static char *cleanup_path(char *path)
> +static const char *cleanup_path(const char *path)
>  {
>   /* Clean it up */
> - if (!memcmp(path, "./", 2)) {
> - path += 2;
> + if (skip_prefix(path, "./", &path)) {
>   while (*path == '/')
>   path++;
>   }
> @@ -47,7 +46,7 @@ static char *cleanup_path(char *path)
>  
>  static void strbuf_cleanup_path(struct strbuf *sb)
>  {
> - char *path = cleanup_path(sb->buf);
> + const char *path = cleanup_path(sb->buf);
>   if (path > sb->buf)
>   strbuf_remove(sb, 0, path - sb->buf);
>  }
> @@ -64,7 +63,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
>   strlcpy(buf, bad_path, n);
>   return buf;
>   }
> - return cleanup_path(buf);
> + return (char *)cleanup_path(buf);
>  }
>  
>  static int dir_prefix(const char *buf, const char *dir)


Re: [PATCH 0/3] fixes for running the test suite with --valgrind

2017-10-03 Thread Jonathan Nieder
Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

What version of zlib do you use?  I've heard some good things about
v1.2.11 improving matters, though I haven't checked yet.

Thanks,
Jonathan


Re: Git for Windows: mingw.c's strange usage of creation time as ctime?

2017-10-03 Thread Jonathan Nieder
+git-for-windows
Hi Marc,

Marc Strapetz wrote:

> compat/mingw.c assigns the Windows file creation time [1] to Git's
> ctime fields at line 591 and at line 705:
>
> buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
>
> ftCreationTime > ftLastWriteTime is actually possible after copying
> a file, so it makes sense to include this timestamp somehow in the
> Index, but I think it would be better to use the maximum of
> ftLastWriteTime and ftCreationTime here which is less confusing and
> closer to Unix's ctime:
>
> buf->st_ctime = max(buf->st_mtime,
> filetime_to_time_t(&(fdata.ftCreationTime));

Can you say more about the practical effect?  Is this causing a bug in
practice, is it a bug waiting to happen, or is it making the code
difficult to understand without any ill effect expected at run time?
(Any of those three is a valuable kind of feedback to offer --- I'm
just trying to figure out which one you mean.)

By the way, do you have core.trustctime set to true or false?

Thanks,
Jonathan

> -Marc
>
> [1] 
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365739(v=vs.85).aspx


[PATCH v2] strbuf doc: reuse after strbuf_release is fine

2017-10-03 Thread Jonathan Nieder
strbuf_release leaves the strbuf in a valid, initialized state, so
there is no need to call strbuf_init after it.

Moreover, this is not likely to change in the future: strbuf_release
leaving the strbuf in a valid state has been easy to maintain and has
been very helpful for Git's robustness and simplicity (e.g.,
preventing use-after-free vulnerabilities).

Document the semantics so the next generation of Git developers can
become familiar with them without reading the implementation.  It is
still not advisable to call strbuf_release too often because it is
wasteful, so add a note pointing to strbuf_reset for that.

The same semantics apply to strbuf_detach.  Add a similar note to its
docstring to make that clear.

Improved-by: Jeff King 
Signed-off-by: Jonathan Nieder 
---
Jeff King wrote:

> I think it's actually OK to use the string buffer after this function.
> It's just an empty string.
>
> Perhaps we should be more explicit: this releases any resources and
> resets to a pristine, empty state. I suspect strbuf_detach() probably
> should make the same claim.

Like this?

Thanks,
Jonathan

 strbuf.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index 7496cb8ec5..249df86711 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -82,8 +82,12 @@ extern char strbuf_slopbuf[];
 extern void strbuf_init(struct strbuf *, size_t);
 
 /**
- * Release a string buffer and the memory it used. You should not use the
- * string buffer after using this function, unless you initialize it again.
+ * Release a string buffer and the memory it used. After this call, the
+ * strbuf points to an empty string that does not need to be free()ed, as
+ * if it had been set to `STRBUF_INIT` and never modified.
+ *
+ * To clear a strbuf in preparation for further use without the overhead
+ * of free()ing and malloc()ing again, use strbuf_reset() instead.
  */
 extern void strbuf_release(struct strbuf *);
 
@@ -91,6 +95,9 @@ extern void strbuf_release(struct strbuf *);
  * Detach the string from the strbuf and returns it; you now own the
  * storage the string occupies and it is your responsibility from then on
  * to release it with `free(3)` when you are done with it.
+ *
+ * The strbuf that previously held the string is reset to `STRBUF_INIT` so
+ * it can be reused after calling this function.
  */
 extern char *strbuf_detach(struct strbuf *, size_t *);
 
-- 
2.14.2.920.gcf0c67979c



Re: Line ending normalization doesn't work as expected

2017-10-04 Thread Jonathan Nieder
Hi Robert,

Robert Dailey wrote:

> You guys are obviously worlds ahead of me on the internals of things,
> but from my perspective I like to avoid the "plumbing" commands as
> much as I can.

I suspect what we are dancing around is the need for some command like

git checkout --renormalize .

which would shorten the sequence to

git checkout --renormalize .
git status; # Show files that will be normalized
git commit; # Commit the result

What do you think?  Would you be interested in writing a patch for it?
("No" is as always an acceptable answer.)

Thanks,
Jonathan


Re: disable interactive prompting

2017-10-04 Thread Jonathan Nieder
Hi Ernesto,

Ernesto Alfonso wrote:

> Waiting for git-push synchronously slows me down, so I have a bash
> alias/function to do this in the background. But when my origin is https, I
> get an undesired interactive prompt. I've tried to disable by
> redirecting stdin:
>
> git push ${REMOTE} ${BRANCH} &>/dev/null 
> but I still get an interactive prompt.
>
> Is there a way to either
>
> 1. disable interactive prompting
> 2. programmatically determine whether a git command (or at least a git
> push) would interactively prompt

You left out an important detail: what does the interactive prompt in
question say?

The general question is also interesting, but seeing the particular
prompt would make it easy to look into the specific case at the same
time.

Thanks,
Jonathan


Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:

> From: Jeff King 
> Date: Tue, 3 Oct 2017 19:30:40 -0400
> Subject: [PATCH] path.c: fix uninitialized memory access
> 
> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
>
> ==4423== Conditional jump or move depends on uninitialised value(s)
> ==4423==at 0x242FA9: cleanup_path (path.c:35)
[...]
> ==4423==  Uninitialised value was created by a heap allocation
[...]
> ==4423==by 0x29A30B: strbuf_grow (strbuf.c:66)
> ==4423==by 0x29A30B: strbuf_vaddf (strbuf.c:277)
> ==4423==by 0x242F9F: mkpath (path.c:454)
[...]
> Avoid this by using skip_prefix(), which knows not to go beyond the
> end of the string.
>
> Reported-by: Thomas Gummerer 
> Signed-off-by: Jeff King 
> Reviewed-by: Jonathan Nieder 

This is indeed
Reviewed-by: Jonathan Nieder 

Thanks.


Re: Line ending normalization doesn't work as expected

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>>  git checkout --renormalize .
>>  git status; # Show files that will be normalized
>>  git commit; # Commit the result
>>
>> What do you think?  Would you be interested in writing a patch for it?
>> ("No" is as always an acceptable answer.)
>
> I actually think what is being requested is the opposite, i.e. "the
> object registered in the index have wrong line endings, and the
> safe-crlf is getting in the way to prevent me from correcting by
> hashing the working tree contents again to register contents with
> corrected line endings, even with 'git add .'".
>
> So I would understand if your suggestion were for
>
>   git checkin --renormalize .
>
> but not "git checkout".  And it probably is more familiar to lay
> people if we spelled that as "git add --renormalize ." ;-)

Good catch.  You understood correctly --- "git add --renormalize" is
the feature that I think is being hinted at here.

Thanks,
Jonathan


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:

> From: Taylor Blau 
> Date: Mon, 2 Oct 2017 09:10:34 -0700
> Subject: [PATCH] ref-filter.c: pass empty-string as NULL to atom parsers
>
> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
>
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
>
> Let's fix this by declaring that atom parser implementations must
> not care about distinguishing between the empty string "%(refname:)"
> and no sub-arguments "%(refname)".  Current code aborts, either with
> "unrecognised arg" (e.g. "refname:") or "does not take args"
> (e.g. "body:") as an error message.
>
> Signed-off-by: Taylor Blau 
> Reviewed-by: Jeff King 
> Reviewed-by: Jonathan Nieder 
> Signed-off-by: Junio C Hamano 
> ---
>  ref-filter.c| 10 +++++-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

Thanks for taking care of it.  This is indeed still
Reviewed-by: Jonathan Nieder 


Re: [PATCH] .mailmap: normalize name for René Scharfe

2017-10-05 Thread Jonathan Nieder
René Scharfe wrote:

> Reported-by: Johannes Schindelin 
> Reported-by: Stefan Beller 
> Signed-off-by: Rene Scharfe 
> ---
>  .mailmap | 1 +
>  1 file changed, 1 insertion(+)

A quick "git shortlog -nse" run confirms that this works.

Reviewed-by: Jonathan Nieder 
Thanks.

> diff --git a/.mailmap b/.mailmap
> index ab85e0d16d..224db83887 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -194,6 +194,7 @@ Philippe Bruhat 
>  Ralf Thielow  
>  Ramsay Jones  
>  René Scharfe  
> +René Scharfe  Rene Scharfe
>  Richard Hansen  
>  Richard Hansen  
>  Robert Fitzsimons 


Re: [PATCH 0/6] Teach Status options around showing ignored files

2017-10-05 Thread Jonathan Nieder
Hi,

jameson.mille...@gmail.com wrote:

> This patch series is the second part of [1], which was split into 2
> parts. The first part, added an optimization in the directory listing
> logic to not scan the contents of ignored directories and was merged
> to master with commit 5aaa7fd3. This patch series includes the second
> part to expose additional arguments to the --ignored option on the
> status command.

Thanks.

> This patch series teaches the status command more options to control
> which ignored files are reported independently of the which untracked
[...]
> Our application (Visual Studio) has a specific set of requirements
> about how it wants untracked / ignored files reported by git status.
[...]
> The reason for controlling these behaviors separately is that there
> can be a significant performance impact to scanning the contents of
[]
> As a more concrete example, on Windows, Visual Studio creates a bin/
> and obj/ directory inside of the project where it writes all .obj and
[...]

I see this information is also in patch 1/6.  That's a very good
thing, since that makes performance numbers involved more concrete
about which patch brings them about and it becomes part of permanent
history that way --- thanks.

But it took me a while to notice, and before then, I was trying to
read through the cover letter to get an overview of which patches I am
supposed to look at.  For next time, could the cover letter say
something like "See patches 1 and 2 for more details about the
motivation" instead of repeating the commit message content?  That
would save reviewers some time.

Thanks,
Jonathan


Re: couple questions about git "logical variables" and "git var"

2017-10-05 Thread Jonathan Nieder
Jeff King wrote:
> On Thu, Oct 05, 2017 at 05:11:04AM -0400, rpj...@crashcourse.ca wrote:

>>  - GIT_AUTHOR_IDENT
>>  - GIT_COMMITTER_IDENT
>>  - GIT_EDITOR
>>  - GIT_PAGER
>>
>> first question -- what is it about precisely those four variables that makes
>> them "logical" variables in git parlance? just those four? no others?
>
> It was introduced in the very early days as a way for scripts to get
> access to "standard" values that would be computed the same way as the C
> portions of Git.  But it hasn't generally been kept up to date with new
> possible variables.
>
> It also only tells half the story. You have to know not just what's in
> $GIT_EDITOR, but you have to know the right way to evaluate it. There's
> a git_editor helper in git-sh-setup, but other scripting languages are
> on their own.

I am not sure I understand the complaint here.  git-var(1) says:

GIT_EDITOR
   Text editor for use by Git commands. The value is meant to be
   interpreted by the shell when it is used. Examples: [...]

Are you saying that the output of the command should quote that
manpage, so as to tell the rest of the story?

>   We'd probably have done better to introduce a "git editor"
> command which can be run from any language.

I remember that we discussed this at the time but don't remember why
it didn't happen.  It seems like a good idea.

[...]
>> p.s. yes, i realize this command is deprecated in favour of "git config -l",
>> but as long as it's available, it should work as described in the man page.
>
> Yes, though I think fixing the manpage is the right way to make them
> consistent.

Agreed as well.  rday, care to take a stab at wording?

Thanks,
Jonathan


Re: [RFC] Reporting index properties

2017-10-05 Thread Jonathan Nieder
Alex Vandiver wrote:

> As part of gathering some automated performance statistics of git, it
> would be useful to be able to extract some vital properties of the
> index.  While `git update-index` has ways to _set_ the index version,
> and enable/disable various extensions, AFAICT there is no method by
> which one can ask for reporting about the current state of them --
> short of re-implementing all of the index-parsing logic external to
> Git, which is not terribly appealing.

Aside: git-update-index(1) says

Version 4 is relatively young (first released in 1.8.0 in
October 2012).

My first reaction is to wonder if it is time to introduce a config
option to use index format version 4 by default, since after 5 years
it is not relatively young any more.

My second reaction is to notice that the index.version configuration
already exists.  Probably git-update-index(1) ought to point to it.

JGit still doesn't support index format 4, so 4 is still not a good
default for `index.version` for a user that hasn't explicitly
configured it, but the moment JGit gains that support I think it will
be. :)

> I hesitate to propose adding a flag to `git update-index` which reads
> but does no updating, as that seems to make a misnomer of the
> command.  But this also doesn't seem worthy of a new top-level command.

The existing scripting command that inspects the index is
"git ls-files".

It doesn't go into this kind of low-level detail about the index
format, though.  In the same spirit, I don't think we have an existing
command in this spirit for analyzing packfiles, either.

So I agree with Junio that this would be a good debugging aid to put
in t/helper/ for now, and then once we've had experience with it we'd
end up in a better position to come up with a stable, public-facing
interface, if one is needed.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2] api-argv-array.txt: Remove broken link to string-list API

2017-10-05 Thread Jonathan Nieder
Todd Zullinger wrote:

> In 4f665f2cf3 (string-list.h: move documentation from Documentation/api/
> into header, 2017-09-26) the string-list API documentation was moved to
> string-list.h.  The argv-array API documentation may follow a similar
> course in the future.  Until then, prevent the broken link from making
> it to the end-user documentation.
>
> Signed-off-by: Todd Zullinger 
> ---
>  Documentation/technical/api-argv-array.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

Thanks for catching it.  Do you use a broken link detection tool to
detect this kind of issue automatically?

[...]
> --- a/Documentation/technical/api-argv-array.txt
> +++ b/Documentation/technical/api-argv-array.txt
> @@ -8,7 +8,7 @@ always NULL-terminated at the element pointed to by 
> `argv[argc]`. This
>  makes the result suitable for passing to functions expecting to receive
>  argv from main(), or the link:api-run-command.html[run-command API].
>  
> -The link:api-string-list.html[string-list API] is similar, but cannot be
> +The string-list API (documented in string-list.h) is similar, but cannot be
>  used for these purposes; instead of storing a straight string pointer,
>  it contains an item structure with a `util` field that is not compatible
>  with the traditional argv interface.


Re: Git config multiple values

2017-10-06 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Fri, Oct 06, 2017 at 01:10:17PM +0200, aleksander.baranowski wrote:

>> It's just an opinion, but this behaviour is no consistent for me.
>>
>> If it's not the bug it's a feature just let me know.
>
> It's a feature, though I agree that git-config is rather baroque. We're
> mostly stuck with it for reasons of backwards compatibility, though.

This feels like a dodge.  Can we make a list of what is baroque here,
with an eye to fixing it?  E.g. if we introduce a new --set option,
then what should its semantics be, to be more intuitive?

Thanks,
Jonathan


Re: is there a truly compelling rationale for .git/info/exclude?

2017-10-06 Thread Jonathan Nieder
Hi,

Robert P. J. Day wrote:
> On Fri, 6 Oct 2017, Junio C Hamano wrote:

>> Don't waste time by seeking a "compelling" reason.  A mere "this is
>> the most expedite way to gain convenience" back when something was
>> introduced could be an answer, and it is way too late to complain
>> about such a choice anyway.
>
>   perfectly respectable answer ... it tells me that, between
> .gitignore files and core.excludesFile, there's not much left for
> .git/info/exclude to do, except in weird circumstances.

I use .git/info/exclude in what I don't consider to be weird
circumstances.

But I am not motivated to say more than that without knowing what my
answer is going to be used for.  E.g. is there a part of the
gitignore(5) man page where such an explanation would make it less
confusing and more useful?

Thanks,
Jonathan


Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp

2017-10-06 Thread Jonathan Nieder
Stefan Beller wrote:

> The `test_must_fail` should only be used to indicate a git command is
> failing. `test_cmp` is not a git command, such that it doesn't need the
> special treatment of `test_must_fail` (which e.g. includes checking for
> segfault).
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t3504-cherry-pick-rerere.sh | 2 +-
>  t/t5512-ls-remote.sh  | 2 +-
>  t/t5612-clone-refspec.sh  | 2 +-
>  t/t7508-status.sh | 2 +-
>  t/t9164-git-svn-dcommit-concurrent.sh | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)

Thanks.  I agree that this is more readable, and it matches the advice
in t/README:

   On the other hand, don't use test_must_fail for running regular
   platform commands; just use '! cmd'.  We are not in the business
   of verifying that the world given to us sanely works.

Reviewed-by: Jonathan Nieder 

I wonder if it would be useful to have a nod to that advice in the
docstring in t/test-lib-functions.sh, too.


Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp

2017-10-06 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Fri, Oct 06, 2017 at 12:00:05PM -0700, Stefan Beller wrote:

>> The `test_must_fail` should only be used to indicate a git command is
>> failing. `test_cmp` is not a git command, such that it doesn't need the
>> special treatment of `test_must_fail` (which e.g. includes checking for
>> segfault)
>
> Hmph. "test_must_fail test_cmp" is a weird thing for somebody to write.
> And your patch is obviously an improvement, but I have to wonder if some
> of these make any sense.

Just for the record: I agree with all the above, and my Reviewed-by
still stands.

Thanks for looking closer.  I wonder if there's a systematic way to
avoid this kind of weak test that can bitrot and still pass too easily
--- e.g. should we have a test_must_differ command with an explanation
of why it should usually be avoided?  Would a lint check that bans
this kind of construct completely be going too far?

Jonathan


Re: [PATCH 2/2] tests: fix diff order arguments in test_cmp

2017-10-06 Thread Jonathan Nieder
Stefan Beller wrote:

> Fix the argument order for test_cmp. When given the expected
> result first the diff shows the actual output with '+' and the
> expectation with '-', which is the convention for our tests.
>
> Signed-off-by: Stefan Beller 
> ---

Yes, this should make the output from failing tests easier to take in
at a glance.

Reviewed-by: Jonathan Nieder 

How did you find these?  E.g. is there a grep pattern that reviewers
can use to repeat your results?


Re: [Question] Documenting platform implications on CVE to git

2017-10-06 Thread Jonathan Nieder
Hi Randall,

Randall S. Becker wrote:

> I wonder whether there is some mechanism for providing official responses
> from platform ports relating to security CVE reports, like CVE-2017-14867.

This question is too abstract for me.  Can you say more concretely
what you are trying to do?

E.g. are you asking how you would communicate to users of your port
that CVE-2017-14867 does not apply to them?  Or are you asking where
to start a conversation about who a bug applies to?  Or something
else?

Thanks,
Jonathan

> For example, the Perl implementation on HPE NonStop does not include the SCM
> module so commands relating cvsserver may not be available - one thing to be
> verified so is a question #1. But the real question (#2) is: where would one
> most appropriately document issues like this to be consistent with other
> platform responses relating to git?
>
> Thanks,
> Randall
>
> -- Brief whoami: NonStop&UNIX developer since approximately
> UNIX(421664400)/NonStop(2112884442)
> -- In my real life, I talk too much.


Re: [Question] Documenting platform implications on CVE to git

2017-10-06 Thread Jonathan Nieder
Hi,

Randall S. Becker wrote:

> The first one, mostly. When looking at CVE-2017-14867, there are places like
> https://nvd.nist.gov/vuln/detail/CVE-2017-14867 where the issue is
> discussed. It provides hyperlinks to various platform discussions.
> Unfortunately for me, I am not an HPE employee - and even if I was, there is
> no specific site where I can publicly discuss the vulnerability. I'm looking
> to the group here for advice on how to get the word out that it does not
> appear to apply to the HPE NonStop Git port. The question of where to best
> do that for any CVE pertaining to git as applicable to the NonStop Port is
> question #1.

How do people find out about the HPE NonStop Git port?  Where is it
distributed?  Does that distribution point allow you to publish
release notes or other documentation?

Do you have a web page?  That's another place you can publish
information.  http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-14867
links to lots of resources that are not from the Git project.

The oss-security list 
allows anyone to participate.  It is a place that people often
collaborate to figure out the impact of a published vulnerability, how
to mitigate it, etc.  There are other similar mailing lists elsewhere,
too.

> Question #2 - probably more relevant to the specific issue and this group -
> is whether the vulnerability is contained to Git's use of Perl SCM and since
> NonStop's Perl does not support SCM, the vulnerability may not be relevant,
> but I'm not really enough of a Perl guru to make that determination.

What is Perl SCM?  I don't know what you're talking about.

Thanks,
Jonathan


Re: [PATCH] submodule: spell out API of submodule_move_head

2017-10-09 Thread Jonathan Nieder
Stefan Beller wrote:

> This way users of this function do not need to read the implementation to
> know what it will do.
>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c |  5 -
>  submodule.h | 18 ++
>  2 files changed, 18 insertions(+), 5 deletions(-)

Looks good (well, you'd expect me to think that :)).  Thanks.


Re: [PATCH v2 01/24] walker: convert to struct object_id

2017-10-09 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> Signed-off-by: brian m. carlson 
> ---
>  walker.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)

An egrep for 'sha1|unsigned char' finds nothing left in the file after
this change.

[...]
> --- a/walker.c
> +++ b/walker.c
> @@ -7,7 +7,7 @@
>  #include "blob.h"
>  #include "refs.h"
>  
> -static unsigned char current_commit_sha1[20];
> +static struct object_id current_commit_oid;

Yay!

nit, not suggesting changing anything: I would have been tempted to
call this current_commit_id, or even just current_commmit.

[...]
> @@ -259,7 +259,7 @@ int walker_fetch(struct walker *walker, int targets, char 
> **target,
>   struct strbuf refname = STRBUF_INIT;
>   struct strbuf err = STRBUF_INIT;
>   struct ref_transaction *transaction = NULL;
> - unsigned char *sha1 = xmalloc(targets * 20);
> + struct object_id *oids = xmalloc(targets * sizeof(struct object_id));

Not new in this patch, just noticing while we're here: can this
multiplication overflow?

E.g. in remote-curl, it looks like nr_heads gets incremented once per
"fetch" line passed to the remote helper, making it unbounded.

This could use st_mult or ALLOC_ARRAY to protect against that.  The
caller remote-curl.c::parse_fetch also needs an overflow check to
avoid overflowing its "int".  Likewise, walker_targets_stdin needs an
overflow check to avoid overflowing its "int".

[...]
> @@ -321,7 +321,7 @@ int walker_fetch(struct walker *walker, int targets, char 
> **target,
>  done:
>   ref_transaction_free(transaction);
>   free(msg);
> - free(sha1);
> + free(oids);

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 02/24] refs/files-backend: convert struct ref_to_prune to object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Change the member of this struct to be a struct object_id.
>
> Signed-off-by: brian m. carlson 
> ---
>  refs/files-backend.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Yes, this is an obvious improvement to the clarity of the code.
Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 03/24] refs: convert delete_ref and refs_delete_ref to struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Convert delete_ref and refs_delete_ref to take a pointer to struct
> object_id.  Update the documentation accordingly, including referring to
> null_oid in lowercase, as it is not a #define constant.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/branch.c  |  2 +-
>  builtin/replace.c |  2 +-
>  builtin/reset.c   |  2 +-
>  builtin/tag.c |  2 +-
>  builtin/update-ref.c  |  2 +-
>  refs.c| 21 +++--
>  refs.h| 12 ++--
>  refs/files-backend.c  |  2 +-
>  t/helper/test-ref-store.c |  6 +++---
>  9 files changed, 26 insertions(+), 25 deletions(-)

Was this prepared using coccinelle?  (Just curious.)

[...]
> @@ -663,12 +663,13 @@ int refs_delete_ref(struct ref_store *refs, const char 
> *msg,
>  
>   if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
>   assert(refs == get_main_ref_store());
> - return delete_pseudoref(refname, old_sha1);
> + return delete_pseudoref(refname, old_oid);
>   }
>  
>   transaction = ref_store_transaction_begin(refs, &err);
>   if (!transaction ||
> - ref_transaction_delete(transaction, refname, old_sha1,
> + ref_transaction_delete(transaction, refname,
> +old_oid ? old_oid->hash : NULL,
>  flags, msg, &err) ||

musing out loud: Distinguishing contexts where we need this kind of
change and contexts where we can use old_oid->hash directly can be
subtle.  Do we need some kind of annotation to mark which parameters
are nullable and which aren't?

[...]
> --- a/refs.h
> +++ b/refs.h
> @@ -371,19 +371,19 @@ int refs_reflog_exists(struct ref_store *refs, const 
> char *refname);
>  int reflog_exists(const char *refname);
>  
>  /*
> - * Delete the specified reference. If old_sha1 is non-NULL, then
> + * Delete the specified reference. If old_oid is non-NULL, then
>   * verify that the current value of the reference is old_sha1 before
> - * deleting it. If old_sha1 is NULL, delete the reference if it
> - * exists, regardless of its old value. It is an error for old_sha1 to
> - * be NULL_SHA1. msg and flags are passed through to
> + * deleting it. If old_oid is NULL, delete the reference if it
> + * exists, regardless of its old value. It is an error for old_oid to
> + * be null_oid. msg and flags are passed through to
>   * ref_transaction_delete().
>   */
>  int refs_delete_ref(struct ref_store *refs, const char *msg,

Thanks for updating the comment.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 04/24] refs: convert update_ref and refs_update_ref to use struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Convert update_ref, refs_update_ref, and write_pseudoref to use struct
> object_id.  Update the existing callers as well.  Remove update_ref_oid,
> as it is no longer needed.
>
> Signed-off-by: brian m. carlson 
> ---

I'm very happy to see this kind of cleanup (removal of update_ref_oid).

>  bisect.c  |  5 +++--
>  builtin/am.c  | 14 +++---
>  builtin/checkout.c|  3 +--
>  builtin/clone.c   | 14 +++---
>  builtin/merge.c   | 13 ++---
>  builtin/notes.c   | 10 +-
>  builtin/pull.c|  2 +-
>  builtin/reset.c   |  4 ++--
>  builtin/update-ref.c  |  2 +-
>  notes-cache.c |  2 +-
>  notes-utils.c |  2 +-
>  refs.c| 40 
>  refs.h|  5 +
>  sequencer.c   |  9 +++--
>  t/helper/test-ref-store.c | 10 +-
>  transport-helper.c|  3 ++-
>  transport.c   |  4 ++--
>  17 files changed, 64 insertions(+), 78 deletions(-)
[...]
> +++ b/builtin/checkout.c
> @@ -664,8 +664,7 @@ static void update_refs_for_switch(const struct 
> checkout_opts *opts,
>   if (!strcmp(new->name, "HEAD") && !new->path && !opts->force_detach) {
>   /* Nothing to do. */
>   } else if (opts->force_detach || !new->path) {  /* No longer on any 
> branch. */
> - update_ref(msg.buf, "HEAD", new->commit->object.oid.hash, NULL,
> -REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
> + update_ref(msg.buf, "HEAD", &new->commit->object.oid, NULL, 
> REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);

Long line.

[...]
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -544,7 +544,7 @@ static int pull_into_void(const struct object_id 
> *merge_head,
>   if (checkout_fast_forward(&empty_tree_oid, merge_head, 0))
>   return 1;
>  
> - if (update_ref("initial pull", "HEAD", merge_head->hash, 
> curr_head->hash, 0, UPDATE_REFS_DIE_ON_ERR))
> + if (update_ref("initial pull", "HEAD", merge_head, curr_head, 0, 
> UPDATE_REFS_DIE_ON_ERR))
>   return 1;

nit, not needing a change: Preexisting long line.

I wonder if we can teach "make style" to perform line wrapping
correctly and fix those all at once e.g. in builtin/ at some point.
When reading, a consistent line length is helpful, but reviewing each
patch for it feels like wasted time.

[...]
> --- a/refs.c
> +++ b/refs.c
> @@ -985,17 +985,9 @@ int ref_transaction_verify(struct ref_transaction 
> *transaction,
> flags, NULL, err);
>  }
>  
> -int update_ref_oid(const char *msg, const char *refname,
> -const struct object_id *new_oid, const struct object_id *old_oid,
> -unsigned int flags, enum action_on_err onerr)
> -{
> - return update_ref(msg, refname, new_oid ? new_oid->hash : NULL,
> - old_oid ? old_oid->hash : NULL, flags, onerr);
> -}
> -

Yay!

[...]
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1114,12 +1114,10 @@ static int do_pick_commit(enum todo_command command, 
> struct commit *commit,
>* write it at all.
>*/
>   if (command == TODO_PICK && !opts->no_commit && (res == 0 || res == 1) 
> &&
> - update_ref(NULL, "CHERRY_PICK_HEAD", commit->object.oid.hash, NULL,
> -REF_NODEREF, UPDATE_REFS_MSG_ON_ERR))
> + update_ref(NULL, "CHERRY_PICK_HEAD", &commit->object.oid, NULL, 
> REF_NODEREF, UPDATE_REFS_MSG_ON_ERR))
>   res = -1;
>   if (command == TODO_REVERT && ((opts->no_commit && res == 0) || res == 
> 1) &&
> - update_ref(NULL, "REVERT_HEAD", commit->object.oid.hash, NULL,
> -REF_NODEREF, UPDATE_REFS_MSG_ON_ERR))
> + update_ref(NULL, "REVERT_HEAD", &commit->object.oid, NULL, 
> REF_NODEREF, UPDATE_REFS_MSG_ON_ERR))

Long lines.

[...]
> @@ -2123,8 +2121,7 @@ static int pick_commits(struct todo_list *todo_list, 
> struct replay_opts *opts)
>   }
>   msg = reflog_message(opts, "finish", "%s onto %s",
>   head_ref.buf, buf.buf);
> - if (update_ref(msg, head_ref.buf, head.hash, orig.hash,
> - REF_NODEREF, UPDATE_REFS_MSG_ON_ERR)) {
> + if (update_ref(msg, head_ref.buf, &head, &orig, 
> REF_NODEREF, UPDATE_REFS_MSG_ON_ERR)) {
>   res = error(_("could not update %s"),
>   head_ref.buf);

Likewise.

As mentioned above, I am not too worried about the line length issues
(none of these is particularly jarring).  For what it's worth,

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v2 05/24] refs: update ref transactions to use struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Update the ref transaction code to use struct object_id.  Remove one
> NULL pointer check which was previously inserted around a dereference;
> since we now pass a pointer to struct object_id directly through, the
> code we're calling handles this for us.
>
> Signed-off-by: brian m. carlson 
> ---
>  branch.c   |  2 +-
>  builtin/clone.c|  2 +-
>  builtin/commit.c   |  4 ++--
>  builtin/fetch.c|  4 ++--
>  builtin/receive-pack.c |  4 ++--
>  builtin/replace.c  |  2 +-
>  builtin/tag.c  |  2 +-
>  builtin/update-ref.c   |  8 
>  fast-import.c  |  4 ++--
>  refs.c | 44 +---
>  refs.h | 24 
>  refs/files-backend.c   | 12 ++--
>  refs/refs-internal.h   |  4 ++--
>  sequencer.c|  2 +-
>  walker.c   |  2 +-
>  15 files changed, 59 insertions(+), 61 deletions(-)

Makes sense.

[...]
> +++ b/refs.c
[...]
>  int ref_transaction_create(struct ref_transaction *transaction,
>  const char *refname,
> -const unsigned char *new_sha1,
> +const struct object_id *new_oid,
>  unsigned int flags, const char *msg,
>  struct strbuf *err)
>  {
> - if (!new_sha1 || is_null_sha1(new_sha1))
> + if (!new_oid || is_null_oid(new_oid))
>   die("BUG: create called without valid new_sha1");

The error message still refers to new_sha1.

[...]
>  int ref_transaction_delete(struct ref_transaction *transaction,
>  const char *refname,
> -const unsigned char *old_sha1,
> +const struct object_id *old_oid,
>  unsigned int flags, const char *msg,
>  struct strbuf *err)
>  {
> - if (old_sha1 && is_null_sha1(old_sha1))
> + if (old_oid && is_null_oid(old_oid))
>   die("BUG: delete called with old_sha1 set to zeros");

Likewise.

[...]
>  int ref_transaction_verify(struct ref_transaction *transaction,
>  const char *refname,
> -const unsigned char *old_sha1,
> +const struct object_id *old_oid,
>  unsigned int flags,
>  struct strbuf *err)
>  {
> - if (!old_sha1)
> + if (!old_oid)
>   die("BUG: verify called with old_sha1 set to NULL");

Likewise.

[...]
> +++ b/refs.h
> @@ -519,15 +519,15 @@ struct ref_transaction *ref_transaction_begin(struct 
> strbuf *err);
[...]
>  /*
> - * Add a reference creation to transaction. new_sha1 is the value that
> + * Add a reference creation to transaction. new_oid is the value that
>   * the reference should have after the update; it must not be
> - * null_sha1. It is verified that the reference does not exist
> + * null_oid. It is verified that the reference does not exist
>   * already.
>   *
>   * See the above comment "Reference transaction updates" for more
> @@ -535,35 +535,35 @@ int ref_transaction_update(struct ref_transaction 
> *transaction,
>   */

(Possible diff generation bug: there's exactly one line represented in
that @@ field.  I would have expected the diff generator to combine
the hunks.)

I think this is fine to go in without the error messages being fixed.
A grep for 'sha1' will find them later so that they can be addressed
in a separate patch.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 06/24] Convert check_connected to use struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Convert check_connected and the callbacks it takes to use struct
> object_id.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/clone.c|  4 ++--
>  builtin/fetch.c|  4 ++--
>  builtin/receive-pack.c | 10 +-
>  connected.c| 18 +-
>  connected.h|  4 ++--
>  5 files changed, 20 insertions(+), 20 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 07/24] refs: convert resolve_refdup and refs_resolve_refdup to struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> All of the callers already pass the hash member of struct object_id, so
> update them to pass a pointer to the struct directly,
>
> This transformation was done with an update to declaration and
> definition and the following semantic patch:
>
> @@
> expression E1, E2, E3, E4;
> @@
> - resolve_refdup(E1, E2, E3.hash, E4)
> + resolve_refdup(E1, E2, &E3, E4)
>
> @@
> expression E1, E2, E3, E4;
> @@
> - resolve_refdup(E1, E2, E3->hash, E4)
> + resolve_refdup(E1, E2, E3, E4)
>
> Signed-off-by: brian m. carlson 

Lovely.  I tried putting that in contrib/coccinelle/resolve_refdup.cocci,
running

  git grep -l -e resolve_refdup -- '*.c' '*.h' |
  xargs spatch --in-place --sp-file contrib/coccinelle/resolve_refdup.cocci

and diffing the result against this commit.  With --word-diff, there
are a small number of changes:

- the above semantic patch handles resolve_refdup but not
  refs_resolve_refdup.  This commit does both.

- as mentioned in the commit message, the above semantic patch only
  updates callers.  This commit updates the implementations to match.

Without --word-diff, I also see some line-wrapping changes, which all
seem reasonable.  (Coccinelle's line-wrapping heuristics seem to be
pretty specific to Linux kernel style.)

In other words, this does what it says on the cover in a
straightforward and reviewable way.  Thanks for that.

Reviewed-by: Jonathan Nieder 


Re: What happened to "git status --color=(always|auto|never)"?

2017-10-09 Thread Jonathan Nieder
Hi,

Nazri Ramliy wrote:

> I used to work before, but now:
>
> $ git version
> git version 2.15.0.rc0.39.g2f0e14e649
>
> $ git status --color=always
> error: unknown option `color=always'
> usage: git status [] [--] ...

Which version did it work in?  That would allow me to bisect.

Thanks,
Jonathan


Re: [PATCH] submodule: spell out API of submodule_move_head

2017-10-09 Thread Jonathan Nieder
Junio C Hamano wrote:
> Stefan Beller  writes:

>> +/**
>> + * Move the HEAD and content of the active submodule at 'path' from object 
>> id
>> + * 'old' to 'new'.
>> + *
>> + * Updates the submodule at 'path' and files in its work tree to commit
>> + * 'new'. The commit previously pointed to by the submodule is named by
>> + * 'old'. This updates the submodule's HEAD, index, and work tree but
>> + * does not touch its gitlink entry in the superproject.
>> + *
>> + * If the submodule did not previously exist, then 'old' should be NULL.
>> + * Similarly, if the submodule is to be removed, 'new' should be NULL.
>> + *
>> + * If updating the submodule would cause local changes to be overwritten,
>> + * then instead of updating the submodule, this function prints an error
>> + * message and returns -1.
>
> This is not a new issue (the removed comment did not mention this at
> all), but is it correct to say that updates to "index and work tree"
> was as if we did "git -C $path checkout new" (and of course, HEAD in
> the $path submodule must be at 'old')?

I don't understand the question.  This comment doesn't say it's like
"git checkout" --- are you saying it should?

The function is more like "git read-tree -m -u" (or --reset when
SUBMODULE_MOVE_HEAD_FORCE is passed) than like "git checkout".
Perhaps what you are hinting at is that read-tree --recurse-submodules
is not necessarily the right primitive to implement "git checkout"
with.  But that's a separate issue from documenting the current
behavior of the function.

> What should happen if 'old' does not match reality (i.e. old is NULL
> but HEAD does point at some commit, old and HEAD are different,
> etc.)?  Should the call be aborted?

No.

Thanks,
Jonathan

>> + * If the submodule is not active, this does nothing and returns 0.
>> + */
>>  #define SUBMODULE_MOVE_HEAD_DRY_RUN (1<<0)
>>  #define SUBMODULE_MOVE_HEAD_FORCE   (1<<1)
>>  extern int submodule_move_head(const char *path,


Re: [PATCH] submodule: spell out API of submodule_move_head

2017-10-09 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Junio C Hamano wrote:

>>> This is not a new issue (the removed comment did not mention this at
>>> all), but is it correct to say that updates to "index and work tree"
>>> was as if we did "git -C $path checkout new" (and of course, HEAD in
>>> the $path submodule must be at 'old')?
>>
>> I don't understand the question.  This comment doesn't say it's like
>> "git checkout" --- are you saying it should?
>
> No, I am pointing out that this comment does not say what it's like
> clearly enough.  s/is it correct/am I correct/ would have been less
> prone to get misunderstood, I guess.

No problem.  I think a word or two about how it's like read-tree
in the docstring could be an improvement.

> If it behaves like two-tree "read-tree -m -u", I'd say that the best
> explanation an average developer would understand is that the update
> done to "index and work tree" is like using 'git checkout' to switch
> to the branch whose tip is 'new'.

If it says it's like "git checkout", then I fear that can just lead to
more confusion, since "git checkout" does a number of things (e.g.
updating the HEAD symref) that this function does not do.

It could say that it's like "git reset --keep", I suppose.

[...]
>>> What should happen if 'old' does not match reality (i.e. old is NULL
>>> but HEAD does point at some commit, old and HEAD are different,
>>> etc.)?  Should the call be aborted?
>>
>> No.
>
> ... and that is because?
>
> When does it make sense to do a two-tree "read-tree -m -u" giving
> the 'old' that is very different from the real 'old' tree-ish that
> corresponds to where your index started at?

Because that is not the purpose of the function.

The caller is responsible for setting 'old' appropriately.  A word or
two in that direction would not be a terrible thing.

All that said, I want this function to go away completely. :)
Documenting how it currently behaves is just a good way to understand
what is happening when doing so.

Thanks,
Jonathan


Re: What happened to "git status --color=(always|auto|never)"?

2017-10-09 Thread Jonathan Nieder
Nazri Ramliy wrote:
> On Tue, Oct 10, 2017 at 8:16 AM, Jonathan Nieder  wrote:
>> Nazri Ramliy wrote:

>>> I used to work before, but now:
>>>
>>> $ git version
>>> git version 2.15.0.rc0.39.g2f0e14e649
>>>
>>> $ git status --color=always
>>> error: unknown option `color=always'
>>> usage: git status [] [--] ...
>>
>> Which version did it work in?  That would allow me to bisect.
>
> Sorry. It's my bad. I must have confused this with `git grep`'s --color 
> option.

No problem.  It sounds like a reasonable feature request to me,
especially now that we are about to drop support for
color.status=always in configuration:

commit 6be4595edb8e5b616c6e8b9fbc78b0f831fa2a87
Author: Jeff King 
Date:   Tue Oct 3 09:46:06 2017 -0400

color: make "always" the same as "auto" in config

Would you like to take a stab at adding it?  builtin/commit.c and
Documentation/git-{commit,status}.txt would be my best guesses at
where to start.

Thanks,
Jonathan


Re: [PATCH v2 08/24] refs: convert read_ref and read_ref_full to object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> All but two of the call sites already had parameters using the hash

nit: "already have", since commit messages use the present for the
state of the codebase without the patch applied.

> parameter of struct object_id, so convert them to take a pointer to the
> struct directly.  Also convert refs_read_refs_full, the underlying
> implementation.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/checkout.c |  6 +++---
>  builtin/remote.c   |  2 +-
>  builtin/replace.c  |  4 ++--
>  builtin/show-ref.c |  2 +-
>  builtin/tag.c  |  4 ++--
>  builtin/update-index.c |  6 +++---
>  bundle.c   |  2 +-
>  fast-import.c  |  2 +-
>  notes-cache.c  |  2 +-
>  notes-merge.c  |  2 +-
>  notes-utils.c  |  2 +-
>  notes.c|  2 +-
>  refs.c | 20 ++--
>  refs.h |  6 +++---
>  refs/files-backend.c   | 16 
>  remote-testsvn.c   |  2 +-
>  remote.c   |  6 +++---
>  sequencer.c|  2 +-
>  transport-helper.c |  5 ++---
>  19 files changed, 46 insertions(+), 47 deletions(-)
[...]
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -379,7 +379,7 @@ static int checkout_paths(const struct checkout_opts 
> *opts,
>   if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
>   die(_("unable to write new index file"));
>  
> - read_ref_full("HEAD", 0, rev.hash, NULL);
> + read_ref_full("HEAD", 0, &rev, NULL);

Yep, this is nicer (and likewise for the rest of them).

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 09/24] refs: convert dwim_ref and expand_ref to struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> All of the callers of these functions just pass the hash member of a
> struct object_id, so convert them to use a pointer to struct object_id
> directly.
>
> Signed-off-by: brian m. carlson 
> ---
>  archive.c |  2 +-
>  branch.c  |  2 +-
>  builtin/fast-export.c |  2 +-
>  builtin/log.c |  2 +-
>  builtin/merge-base.c  |  2 +-
>  builtin/merge.c   |  2 +-
>  builtin/rev-parse.c   |  2 +-
>  builtin/show-branch.c |  2 +-
>  bundle.c  |  2 +-
>  refs.c| 14 +++---
>  refs.h|  4 ++--
>  remote.c  |  3 +--
>  sha1_name.c   |  6 +++---
>  upload-pack.c |  2 +-
>  wt-status.c   |  2 +-
>  15 files changed, 24 insertions(+), 25 deletions(-)

One worry below.  I might be worrying in vain but thought I might as
well ask.

> --- a/refs.c
> +++ b/refs.c
[...]
> -int expand_ref(const char *str, int len, unsigned char *sha1, char **ref)
> +int expand_ref(const char *str, int len, struct object_id *oid, char **ref)
>  {
>   const char **p, *r;
>   int refs_found = 0;
> @@ -472,15 +472,15 @@ int expand_ref(const char *str, int len, unsigned char 
> *sha1, char **ref)
>  
>   *ref = NULL;
>   for (p = ref_rev_parse_rules; *p; p++) {
> - unsigned char sha1_from_ref[20];
> - unsigned char *this_result;
> + struct object_id oid_from_ref;
> + struct object_id *this_result;
>   int flag;
>  
> - this_result = refs_found ? sha1_from_ref : sha1;
> + this_result = refs_found ? &oid_from_ref : oid;
>   strbuf_reset(&fullref);
>   strbuf_addf(&fullref, *p, len, str);
>   r = resolve_ref_unsafe(fullref.buf, RESOLVE_REF_READING,
> -this_result, &flag);
> +this_result->hash, &flag);

Can this_result be NULL?  Can the oid param be NULL?

Since both this and dwim_ref are non-static functions, I'd be comforted by an

if (!oid)
BUG("expand_ref: oid must be non-NULL");

at the top so that mistakes in callers are caught quickly.

[...]
> --- a/remote.c
> +++ b/remote.c
> @@ -1628,8 +1628,7 @@ static void set_merge(struct branch *ret)
>   if (!remote_find_tracking(remote, ret->merge[i]) ||
>   strcmp(ret->remote_name, "."))
>   continue;
> - if (dwim_ref(ret->merge_name[i], strlen(ret->merge_name[i]),
> -  oid.hash, &ref) == 1)
> + if (dwim_ref(ret->merge_name[i], strlen(ret->merge_name[i]), 
> &oid, &ref) == 1)
>   ret->merge[i]->dst = ref;

Long line (but as discussed earlier in this series, I don't mind).

Thanks,
Jonathan


Re: [PATCH v2 10/24] builtin/reflog: convert remaining unsigned char uses to object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Convert the remaining uses of unsigned char [20] to struct object_id.
> This conversion is needed for dwim_log.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/reflog.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)

Thanks.  Looks good.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 11/24] refs: convert dwim_log to struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Signed-off-by: brian m. carlson 
> ---
>  builtin/reflog.c | 4 ++--
>  reflog-walk.c| 2 +-
>  refs.c   | 8 
>  refs.h   | 2 +-
>  sha1_name.c  | 2 +-
>  5 files changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Jonathan Nieer 


Re: [PATCH v2 12/24] pack-bitmap: convert traverse_bitmap_commit_list to object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> Convert traverse_bitmap_commit_list and the callbacks it takes to use a
> pointer to struct object_id.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/pack-objects.c | 8 
>  builtin/rev-list.c | 4 ++--
>  pack-bitmap.c  | 8 
>  pack-bitmap.h  | 2 +-
>  4 files changed, 11 insertions(+), 11 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH v2 13/24] builtin/pack-objects: convert to struct object_id

2017-10-09 Thread Jonathan Nieder
brian m. carlson wrote:

> This is one of the last unconverted callers to peel_ref.  While we're
> fixing that, convert the rest of the file, since it will need to be
> converted at some point anyway.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/pack-objects.c | 135 
> +
>  1 file changed, 68 insertions(+), 67 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: What happened to "git status --color=(always|auto|never)"?

2017-10-10 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Tue, Oct 10, 2017 at 09:51:38PM +0900, Junio C Hamano wrote:

>> I think the right fix to the original problem (you cannot remove
>> auto-color from the plumbing) is to stop paying attention to color
>> configuration from the default config.  I wonder if something like
>> this would work?
>>
>>  - Initialize color.c::git_use_color_default to GIT_COLOR_UNKNOWN;
>>
>>  - When git_color_config() is called, and if git_use_color_default
>>is still GIT_COLOR_UNKNOWN, set it to GIT_COLOR_AUTO (regardless
>>of the variable git_color_config() is called for).
>>
>>  - In color.c::want_color(), when git_use_color_default is used,
>>notice if it is GIT_COLOR_UNKNOWN and behave as if it is
>>GIT_COLOR_NEVER.
>>
>> Then we make sure that git_color_config() is never called by any
>> plumbing command.  The fact it is (ever) called can be taken as a
>> clue that we are running a Porcelain (hence we transition from
>> UNKNOWN to AUTO), so we'd get the desirable "no default color for
>> plumbing, auto color for Porcelain", I would think.
>
> Yes, I think that's the simplest way to implement the "plumbing should
> never do color without a command-line option" scheme.
>
> I do wonder if people would end up seeing some corner cases as
> regressions, though. Right now "diff-tree" _does_ color the output by
> default, and it would stop doing so under your scheme. That's the right
> thing to do by the plumbing/porcelain distinction, but users with
> scripts that use diff-tree (or other plumbing) to generate user-visible
> output may unexpectedly lose their color, until the calling script is
> fixed to add back in a --color option[1].

I think it's better for the calling script to be fixed to use "git
diff", since it is producing output for the sake of the user instead
of for machine parsing.  That way, the script gets the benefit of
other changes like --decorate automatically.

So I don't see that as a regression.

Where I worry is about commands where the line between porcelain and
plumbing blur, like "git log --format=raw".  I actually still prefer
the approach where "color.ui=always" becomes impossible to express in
config and each command takes a --color option.

If we want to be extra fancy, we could make git take a --color option
instead of requiring each command to do it.

To support existing scripts, we could treat "-c color.ui=always" as a
historical synonym for --color=always, either temporarily or
indefinitely.  Making it clear that this is only there for historical
reasons would make it less likely that other options make the same
mistake in the future.

Thanks,
Jonathan


Re: [PATCH v3 03/10] protocol: introduce protocol extention mechanisms

2017-10-10 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Given some of this discussion though, maybe we want to change the
> semantics of 'protocol.version' so that both servers and clients respect
> it.  As of right now, these patches have the server always allow
> protocol v0 and v1?  Though that doesn't do much right now since v1 is
> the same as v0.

I strongly prefer not to do that.

If we want to make the advertised protocol versions on the server side
configurable, I think it should be independent from the configuration
for protocol versions to use on the client side.  Rationale:

 - As a client-side user, I may want to (after reporting the bug, of
   course!) blacklist certain protocol versions to work around server
   bugs.  But this should not affect my behavior as a server --- in
   my role as a server, these server-side bugs have no effect on me.

 - As a server operator, I may want to (after reporting the bug, of
   course!) blacklist certain protocol versions to work around client
   bugs.  But this should not affect my behavior as a client --- in my
   role as a client, these client-side bugs have no effect on me.

Making the client-side case configurable seems important since Git is
widely used in environments where it may not be easy to control the
deployed version (so having configuration as an escape hatch is
important).

Making the server-side case configurable seems less important since
Git server operators usually have tight control over the deployed Git
version and can apply a targetted fix or workaround.

> One other considerations that I should probably handle is that a client
> doesn't do any verification right now to ensure that the protocol
> version the server selects was indeed the protocol that the client
> requested.  Is that something you think we need to check for?

Do you mean in tests, or are you referring to something else?

Thanks,
Jonathan


Re: [PATCH] checkout doc: clarify command line args for "checkout paths" mode

2017-10-10 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:

[...]
> The source of the confusion is that -p(atch) is described as if it
> is just another "optional" part and its description is lumped
> together with the non patch mode, even though the actual end user
> experience is vastly different.

Makes sense.  This should have been done as part of b831deda
(2010-06-01), but better late than never.

Let's see how the patch goes...

[...]
>  Documentation/git-checkout.txt | 32 +---
>  1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> index d6399c0af8..8e77a9de49 100644
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -13,7 +13,8 @@ SYNOPSIS
>  'git checkout' [-q] [-f] [-m] [--detach] 
>  'git checkout' [-q] [-f] [-m] [[-b|-B|--orphan] ] []
>  'git checkout' [-f|--ours|--theirs|-m|--conflict=

Re: [PATCH] checkout doc: clarify command line args for "checkout paths" mode

2017-10-10 Thread Jonathan Nieder
Junio C Hamano wrote:

> here is another attempt, this time to avoid "Restore" and ...
>
>  Documentation/git-checkout.txt | 30 --
>  1 file changed, 16 insertions(+), 14 deletions(-)

Thanks.  I find this one easier to read indeed.

[...]
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -13,7 +13,8 @@ SYNOPSIS
[...]
> @@ -101,6 +95,14 @@ specific side of the merge can be checked out of the 
> index by
>  using `--ours` or `--theirs`.  With `-m`, changes made to the working tree
>  file can be discarded to re-create the original conflicted merge result.
>  
> +'git checkout' (-p|--patch) [] [--] [...]::
> + This is similar to the "check out paths to the working tree
> + from either the index or from a tree-ish" mode described
> + above, but lets you use the interactive interface to show
> + the "diff" output and choose which hunks to use in the
> + result.  See below for the descriptoin of `--patch` option.

nit: s/descriptoin/description/

With that tweak,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: v2.15.0-rc1 test failure

2017-10-11 Thread Jonathan Nieder
Hi,

Adam Dinwoodie wrote:

> t0021.15 has PERL as a requirement, and I see semi-regular failures from
> Git tests that are Perl-based in one way or another (git-svn tests are
> the most common problems).  I've not spotted t0021 failing in that way,
> but it sounds like the same class of problem.
>
> I dig into these failures when I see them, mostly by running the script
> a few hundred times until I get the failure again, and they've always
> been Perl itself segfaulting.  That points to the problem being in
> Cygwin's Perl package rather than Git, and it's very unlikely to be
> anything that's got worse in v2.15.0.

That reminds me of https://bugs.debian.org/868738, which I tracked down
to perl's "die" helper using errno to determine the exit status instead
of deterministically using 128.  I wasn't able to track it down further
than that.

t/t0021/rot13-filter.pl doesn't have any similar suspect constructs, but
thought I should mention it anyway.

Thanks,
Jonathan


Re: Unable to use --patch with git add

2017-10-11 Thread Jonathan Nieder
Hi Ayush,

Ayush Goel wrote:

> Thank you for your mail. It works :)
>
> For future reference, is there a page for known issues of git?

We usually try not to have known issues that would require such a
warning for long.  And when we fail, reminders like yours are very
welcome, though a search through the mailing list archive for an
existing thread to reply to instead of starting a new one is always
welcome.

Sorry for the trouble.

Sincerely,
Jonathan


Re: git repack leaks disk space on ENOSPC

2017-10-11 Thread Jonathan Nieder
Hi Andreas,

Andreas Krey wrote:

> I observed (again) an annoying behavior of 'git repack':

Do you have context for this 'again'?  E.g. was this discussed
previously on-list?

> When the new pack file cannot be fully written because
> the disk gets full beforehand, the tmp_pack file isn't
> deleted, meaning the disk stays full:
>
>   $ df -h .; git repack -ad; df -h .; ls -lart .git/objects/pack/tmp*; rm 
> .git/objects/pack/tmp*; df -h .
>   FilesystemSize  Used Avail Use% Mounted on
>   /dev/mapper/vg02-localworkspaces  250G  245G  5.1G  98% /workspaces/calvin
>   Counting objects: 4715349, done.
>   Delta compression using up to 8 threads.
>   Compressing objects: 100% (978051/978051), done.
>   fatal: sha1 file '.git/objects/pack/tmp_pack_xB7DMt' write error: No space 
> left on device
>   FilesystemSize  Used Avail Use% Mounted on
>   /dev/mapper/vg02-localworkspaces  250G  250G   20K 100% /workspaces/calvin
>   -r--r--r-- 1 andrkrey users 5438435328 Oct 11 17:03 
> .git/objects/pack/tmp_pack_xB7DMt
>   rm: remove write-protected regular file 
> '.git/objects/pack/tmp_pack_xB7DMt'? y
>   FilesystemSize  Used Avail Use% Mounted on
>   /dev/mapper/vg02-localworkspaces  250G  245G  5.1G  98% /workspaces/calvin
>
> git version 2.15.0.rc0

I can imagine this behavior of retaining tmp_pack being useful for
debugging in some circumstances, but I agree with you that it is
certainly not a good default.

Chasing this down, I find:

  pack-write.c::create_tmp_packfile chooses the filename
  builtin/pack-objects.c::write_pack_file writes to it and the .bitmap, calling
  pack-write.c::finish_tmp_packfile to rename it into place

Nothing tries to install an atexit handler to do anything special to it
on exit.

The natural thing, I'd expect, would be for pack-write to use the
tempfile API (see tempfile.h) to create and finish the file.  That way,
we'd get such atexit handlers for free.  If we want a way to keep temp
files for debugging on abnormal exit, we could set that up separately as
a generic feature of the tempfile API (e.g. an envvar
GIT_KEEP_TEMPFILES_ON_FAILURE), making that an orthogonal topic.

Does using create_tempfile there seem like a good path forward to you?
Would you be interested in working on it (either writing a patch with
such a fix or a test in t/ to make sure it keeps working)?

Thanks,
Jonathan


Re: [PATCH] doc: emphasize stash "--keep-index" stashes staged content

2017-10-11 Thread Jonathan Nieder
Hi,

Robert P. J. Day wrote:

> It's not immediately obvious from the man page that the "--keep-index"
> option still adds the staged content to the stash, so make that
> abundantly clear.
>
> Signed-off-by: Robert P. J. Day 
> ---
>
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 00f95fee1..037144037 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -68,8 +68,8 @@ entries and working tree files are then rolled back to the 
> state in
>  HEAD only for these files, too, leaving files that do not match the
>  pathspec intact.
>  +
> -If the `--keep-index` option is used, all changes already added to the
> -index are left intact.
> +If the `--keep-index` option is used, all changes already staged in the
> +index are left intact in the index, while still being added to the stash.

Aside from Junio's note about "in the index" vs "in the working tree":

The "Testing partial commits" item in the EXAMPLES section explains
what --keep-index is useful for.  I wonder if some allusion to that
would make the explanation in the OPTIONS section easier to
understand.

Something that I end up still curious about when reading this
description is what will happen when I "git stash pop".  Will it apply
only the changes that were stashed away and removed from the working
tree, or will it apply the changes that were kept in the index, too?
If the latter, why?  Is there some way I can turn that behavior off?

E.g. in the "Testing partial commits" example, it seems like the
natural behavior for "git stash pop" would be just restore the changes
that were removed from the working tree.  That would also match an
assumption of save/push and pop being symmetrical ('inverse
operations').

Is this related to "git stash pop --index"?  I notice that the
EXAMPLES section doesn't give any examples of that option.

Thanks,
Jonathan


Re: [PATCH 2/2] color: discourage use of ui.color=always

2017-10-11 Thread Jonathan Nieder
Junio C Hamano wrote:

> Warn when we read such a configuration from a file, and nudge the
> users to spell them 'auto' instead.
>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/config.txt | 2 +-
>  color.c  | 7 +++
>  2 files changed, 8 insertions(+), 1 deletion(-)

This warning only kicks in when `always` is being silently downgraded
to `auto`.  It makes sense to me.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/2] color: downgrade "always" to "auto" only for on-disk configuration

2017-10-11 Thread Jonathan Nieder
Junio C Hamano wrote:

[...]
> --- a/color.c
> +++ b/color.c
> @@ -307,8 +307,21 @@ int git_config_colorbool(const char *var, const char 
> *value)
>   if (value) {
>   if (!strcasecmp(value, "never"))
>   return 0;
> - if (!strcasecmp(value, "always"))
> - return var ? GIT_COLOR_AUTO : 1;
> + if (!strcasecmp(value, "always")) {
> + /*
> +  * Command-line options always respect "always".
> +  * Likewise for "-c" config on the command-line.
> +  */
> + if (!var ||
> + current_config_scope() == CONFIG_SCOPE_CMDLINE)
> + return 1;
> +
> + /*
> +  * Otherwise, we're looking at on-disk config;
> +  * downgrade to auto.
> +  */
> + return GIT_COLOR_AUTO;
> + }

Yes, this looks good to me.

Should we document this special case treatment of color.* in -c
somewhere?  E.g.

Signed-off-by: Jonathan Nieder 

diff --git i/Documentation/config.txt w/Documentation/config.txt
index 13ce76d48b..d7bd6b169c 100644
--- i/Documentation/config.txt
+++ w/Documentation/config.txt
@@ -1067,11 +1067,15 @@ clean.requireForce::
-i or -n.   Defaults to true.
 
 color.branch::
-   A boolean to enable/disable color in the output of
-   linkgit:git-branch[1]. May be set to `false` (or `never`) to
-   disable color entirely, `auto` (or `true` or `always`) in which
-   case colors are used only when the output is to a terminal.  If
-   unset, then the value of `color.ui` is used (`auto` by default).
+   When to use color in the output of linkgit:git-branch[1].
+   May be set to `never` (or `false`) to disable color entirely,
+   or `auto` (or `true`) in which case colors are used only when
+   the output is to a terminal.  If unset, then the value of
+   `color.ui` is used (`auto` by default).
++
+The value `always` is a historical synonym for `auto`, except when
+passed on the command line using `-c`, where it is a historical synonym
+for `--color=always`.
 
 color.branch.::
Use customized color for branch coloration. `` is one of
@@ -1084,10 +1088,13 @@ color.diff::
Whether to use ANSI escape sequences to add color to patches.
If this is set to `true` or `auto`, linkgit:git-diff[1],
linkgit:git-log[1], and linkgit:git-show[1] will use color
-   when output is to the terminal. The value `always` is a
-   historical synonym for `auto`.  If unset, then the value of
+   when output is to the terminal. If unset, then the value of
`color.ui` is used (`auto` by default).
 +
+The value `always` is a historical synonym for `auto`, except when
+passed on the command line using `-c`, where it is a historical
+synonym for `--color=always`.
++
 This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
@@ -1118,10 +1125,14 @@ color.decorate.::
branches, remote-tracking branches, tags, stash and HEAD, respectively.
 
 color.grep::
-   When set to `always`, always highlight matches.  When `false` (or
+   When to highlight matches using color. When `false` (or
`never`), never.  When set to `true` or `auto`, use color only
when the output is written to the terminal.  If unset, then the
value of `color.ui` is used (`auto` by default).
++
+The value `always` is a historical synonym for `auto`, except when
+passed on the command line using `-c`, where it is a historical synonym
+for `--color=always`.
 
 color.grep.::
Use customized color for grep colorization.  `` specifies which
@@ -1153,9 +1164,11 @@ color.interactive::
When set to `true` or `auto`, use colors for interactive prompts
and displays (such as those used by "git-add --interactive" and
"git-clean --interactive") when the output is to the terminal.
-   When false (or `never`), never show colors. The value `always`
-   is a historical synonym for `auto`.  If unset, then the value of
-   `color.ui` is used (`auto` by default).
+   When false (or `never`), never show colors.
++
+The value `always` is a historical synonym for `auto`, except when
+passed on the command line using `-c`, where it means to use color
+regardless of whether output is to the terminal.
 
 color.interactive.::
Use customized color for 'git add --interactive' and 'git clean
@@ -1168,18 +1181,24 @@ color.pager::
use (default is true).
 
 color.showBranch::
-   A boolean to enable/disable color in the ou

Re: [PATCH 1/2] color: downgrade "always" to "auto" only for on-disk configuration

2017-10-11 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Should we document this special case treatment of color.* in -c
>> somewhere?  E.g.
>
> Perhaps, although I'd save that until we actually add the new option
> to "git" potty, which hasn't happened yet, if I were thinking about
> adding some text like that.  Also I'd call that --default-color=always
> or something like that, to avoid having to answer: what are the
> differences between these two --color thing in the following?
>
> git --color=foo cmd --color=bar

I agree that the color.status text in the example doc is unfortunate.
But the surprising thing I found when writing that doc is that
color.status ("git status", "git commit --dry-run") and
color.interactive are the only items that needed it (aside from
color.ui that needed it for those two).  All the other commands that
use color already accept

git cmd --color=bar

color.interactive applies to multiple commands (e.g. "git clean"), so
it would take a little more chasing down to make them all use
OPT__COLOR.

Heading off to sleep, can look more tomorrow.

I don't think we can get around documenting this -c special case
behavior, though.

diff --git i/builtin/commit.c w/builtin/commit.c
index d75b3805ea..fc5b7cd538 100644
--- i/builtin/commit.c
+++ w/builtin/commit.c
@@ -1345,6 +1345,7 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
struct object_id oid;
static struct option builtin_status_options[] = {
OPT__VERBOSE(&verbose, N_("be verbose")),
+   OPT__COLOR(&s.use_color, N_("use color")),
OPT_SET_INT('s', "short", &status_format,
N_("show status concisely"), STATUS_FORMAT_SHORT),
OPT_BOOL('b', "branch", &s.show_branch,
@@ -1595,6 +1596,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
static struct option builtin_commit_options[] = {
OPT__QUIET(&quiet, N_("suppress summary after successful 
commit")),
OPT__VERBOSE(&verbose, N_("show diff in commit message 
template")),
+   OPT__COLOR(&s.use_color, N_("use color")),
 
OPT_GROUP(N_("Commit message options")),
OPT_FILENAME('F', "file", &logfile, N_("read message from 
file")),


Re: [PATCH] check-ref-format: require a repository for --branch

2017-10-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Oct 16, 2017 at 07:45:46PM +0900, Junio C Hamano wrote:

>> Here is to illustrate what I mean in a patch form.  It resurrects
>> the gentle setup, and uses a purely textual format check when we are
>> outside the repository, while bypassing the @{magic} interpolation
>> codepath that requires us to be in a repository.  When we are in a
>> repository, we operate the same way as before.
>
> I like the state this puts us in, but there's one catch: we're
> completely changing the meaning of "check-ref-format --branch", aren't
> we?
>
> It is going from "this is how you resolve @{-1}" to "this is how you
> check the validity of a potential branch name". Do we need to pick a
> different name, and/or have a deprecation period?

Sorry to take so long on picking this up.  I'll try to make an
alternate patch today.

For what it's worth, I don't agree with this repurposing of
"check-ref-format --branch" at all.  The old command already existed.
No one asked for the new command.  At most, we could get rid of the
old command after a deprecation period.  I don't understand at all why
it's worth the confusion of changing its meaning.

Thanks,
Jonathan


Re: [PATCH] check-ref-format: require a repository for --branch

2017-10-16 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:

> Things like @{-1} would not make any sense when the command is run
> outside a repository, and the documentation is quite clear that it
> is the primary reason why we added "--branch" option to the command,
> i.e.
>
> With the `--branch` option, it expands the ``previous branch syntax''
> `@{-n}`.  For example, `@{-1}` is a way to refer the last branch you
> were on.  This option should be used by porcelains to accept this
> syntax anywhere a branch name is expected, so they can act as if you
> typed the branch name.
>
> So I am tempted to take this patch to make sure that we won't gain
> more people who abuse the command outside a repository.

That seems very sensible on its face.  My only worry is that a script
that can be run both inside and outside a repository and does

branch=$(git check-ref-format --branch "$user_supplied_branch_arg")

currently works with user_supplied_branch_arg='master' and would stop
working.  If we have reason to believe that no such scripts exist,
then this would be a good way to go, but I don't believe we can count
on that.

And in that spirit, I think the patch you replied with aims to go in
the right direction, by providing the core functionality when in a
repository while avoiding breaking such a script outside of one
(though I do not understand it fully yet).

> Having said that, there may still be a use case where a Porcelain
> script wants a way to see if a $name it has is appropriate as a
> branch name before it has a repository

This seems like a different goal than "git check-ref-format --branch"
was originally designed to fulfill (even though it fits well with the
check-ref-format name and coincides with --branch behavior when in a
repository).  I think it's fine for us not to fulfill it.

>(e.g. a wrapper to "git
> clone" that wants to verify the name it is going to give to the "-b"
> option), and a check desired in such a context is different from
> (and is stricter than) feeding refs/heads/$name to the same command
> without the "--branch" option.

Can you say more about this example?  E.g. why is this hypothetical
wrapper unable to rely on "git clone -b"'s own error handling?

> So I think the right endgame in the longer term is:
>
>  - Find (or add if it doesn't exist) a way to recommend to Porcelain
>scripts to use to expand an end-user generated string, and to map
>it to a branch name (it may be "rev-parse --symbolic-full-name
>$name"; I dunno).

--symbolic-full-name seems like a good fit.  Do you remember why
check-ref-format was introduced instead?  Was it just a matter of
implementation simplicity, since --symbolic-full-name can handle a
broader class of revision specifications like --remotes?  The commit
message to v1.6.3-rc0~29^2~4 (give Porcelain a way to grok branch
shorthand, 2009-03-21) is appropriately apologetic but doesn't give
more clues.

>  - Keep check-ref-format as (or revert it to be) a tool to "check".
>This would involve split strbuf_check_branch_ref() into two:

Without an example of where this tool would be used, if we consider
"check-ref-format --branch" to be a mistake then I'd rather deprecate
it with a goal of removing it completely.

Ok, time to look in more detail.

Thanks for your thoughtfulness,
Jonathan


Re: [PATCH 1/2] color: downgrade "always" to "auto" only for on-disk configuration

2017-10-16 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Jeff King  writes:
>> On Sat, Oct 14, 2017 at 12:01:46PM +0900, Junio C Hamano wrote:

>>> True.  Let's see what others think.  I know Jonathan is running
>>> the fork at $work with "downgrade always to auto" patches, and while
>>> I think both approaches would probably work well in practice, I have
>>> preference for this "harder but right" approach, so I'd want to see
>>> different views discussed on the list before we decide.
>>
>> After pondering over it, I have a slight preference for that, too. But
>> I'm also happy to hear more input.
>
> OK, so it seems we both have slight preference for the "peel back"
> approach.  Adding Jonathan to Cc:

Which approach is "harder but right" / "peel back"?

I agree with the goal of making color.ui=always a synonym for auto in
file-based config.  Peff found some problems with the warning patch
(scripted commands produce too many warnings), which are not an issue
for $dayjob but may be for upstream, so I see the value of holding off
on the warning for now.

I'm also fine with "revert the proximate cause of the latest
complaints" as a stepping stone toward making color.ui=always a
synonym for auto in file-based config in a later release.

Another issue is diff-files paying attention to this configuration.
If I'm reading Documentation/config.txt correctly, that was simply a
bug.  diff-files and diff-index are never supposed to use color,
regardless of configuration.

I'm fine with "revert the proximate cause of the latest complaints" as
a stepping stone toward fixing that, too. :)

Sorry I don't have more detailed advice.  I was planning to look more
closely at how these features evolved over time and haven't had enough
time for it yet.

Jonathan


[PATCH 0/3] Re: [PATCH] check-ref-format: require a repository for --branch

2017-10-17 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:

> Subject: [PATCH] check-ref-format: --branch cannot grok @{-1} outside a 
> repository

How about this?  It is written to be more conservative than the patch
I am replying to, but except for the commit message, it should be
pretty much equivalent.

[...]
> --- a/builtin/check-ref-format.c
> +++ b/builtin/check-ref-format.c
> @@ -38,13 +38,22 @@ static char *collapse_slashes(const char *refname)
>  
>  static int check_ref_format_branch(const char *arg)
>  {
> + int nongit, malformed;
>   struct strbuf sb = STRBUF_INIT;
> - int nongit;
> + const char *name = arg;
>  
>   setup_git_directory_gently(&nongit);
> - if (strbuf_check_branch_ref(&sb, arg))
> +
> + if (!nongit)
> + malformed = (strbuf_check_branch_ref(&sb, arg) ||
> +  !skip_prefix(sb.buf, "refs/heads/", &name));
> + else
> + malformed = check_branch_ref_format(arg);

Handles the nongit case in strbuf_check_branch_ref instead of
introducing a new check_branch_ref_format helper.

[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -1444,6 +1444,20 @@ extern int parse_oid_hex(const char *hex, struct 
> object_id *oid, const char **en
>  #define INTERPRET_BRANCH_HEAD (1<<2)
>  extern int interpret_branch_name(const char *str, int len, struct strbuf *,
>unsigned allowed);
> +
> +/*
> + * NEEDSWORK: declare strbuf_branchname() and strbuf_check_branch_ref()
> + * here, not in strbuf.h
> + */

As a result, it doesn't touch headers.  I agree that these functions
don't belong in strbuf.h (sorry for not updating the headers at the
same time I moved their implementations) but suspect e.g. branch.h,
revision.h, or some new header like revision-syntax.h would be a
better place.

[...]
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -568,6 +568,12 @@ static inline void strbuf_complete_line(struct strbuf 
> *sb)
>   strbuf_complete(sb, '\n');
>  }
>  
> +/*
> + * NEEDSWORK: the following two functions should not be in this file;
> + * these are about refnames, and should be declared next to
> + * interpret_branch_name() in cache.h
> + */

Didn't touch headers.

[...]
> --- a/t/t1402-check-ref-format.sh
> +++ b/t/t1402-check-ref-format.sh
> @@ -161,6 +161,18 @@ test_expect_success 'check-ref-format --branch from 
> subdir' '
>   test "$refname" = "$sha1"
>  '
>  
> +test_expect_success 'check-ref-format --branch @{-1} from non-repo' '
> + test_must_fail nongit git check-ref-format --branch @{-1}
> +'

Swapped test_must_fail and nongit to match existing tests.

Junio C Hamano (3):
  check-ref-format --branch: do not expand @{...} outside repository
  check-ref-format --branch: strip refs/heads/ using skip_prefix
  check-ref-format doc: --branch validates and expands 

 Documentation/git-check-ref-format.txt |  9 -
 builtin/check-ref-format.c |  6 --
 sha1_name.c|  5 -
 t/t1402-check-ref-format.sh| 16 
 4 files changed, 32 insertions(+), 4 deletions(-)


[PATCH 1/3] check-ref-format --branch: do not expand @{...} outside repository

2017-10-17 Thread Jonathan Nieder
From: Junio C Hamano 

Running "git check-ref-format --branch @{-1}" from outside any
repository produces

$ git check-ref-format --branch @{-1}
BUG: environment.c:182: git environment hasn't been setup

This is because the expansion of @{-1} must come from the HEAD reflog,
which involves opening the repository.  @{u} and @{push} (which are
more unusual because they typically would not expand to a local
branch) trigger the same assertion.

This has been broken since day one.  Before v2.13.0-rc0~48^2
(setup_git_env: avoid blind fall-back to ".git", 2016-10-02), the
breakage was more subtle: Git would read reflogs from ".git" within
the current directory even if it was not a valid repository.  Usually
that is harmless because Git is not being run from the root directory
of an invalid repository, but in edge cases such accesses can be
confusing or harmful.  Since v2.13.0, the problem is easier to
diagnose because Git aborts with a BUG message.

Erroring out is the right behavior: when asked to interpret a branch
name like "@{-1}", there is no reasonable answer in this context.
But we should print a message saying so instead of an assertion failure.

We do not forbid "check-ref-format --branch" from outside a repository
altogether because it is ok for a script to pre-process branch
arguments without @{...} in such a context.  For example, with
pre-2.13 Git, a script that does

branch='master'; # default value
parse_options
branch=$(git check-ref-format --branch "$branch")

to normalize an optional branch name provided by the user would work
both inside a repository (where the user could provide '@{-1}') and
outside (where '@{-1}' should not be accepted).

So disable the "expand @{...}" half of the feature when run outside a
repository, but keep the check of the syntax of a proposed branch
name. This way, when run from outside a repository, "git
check-ref-format --branch @{-1}" will gracefully fail:

$ git check-ref-format --branch @{-1}
fatal: '@{-1}' is not a valid branch name

and "git check-ref-format --branch master" will succeed as before:

$ git check-ref-format --branch master
master

restoring the usual pre-2.13 behavior.

[jn: split out from a larger patch; moved conditional to
 strbuf_check_branch_ref instead of its caller; fleshed out commit
 message; some style tweaks in tests]

Reported-by: Marko Kungla 
Helped-by: Jeff King 
Signed-off-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
 sha1_name.c |  5 -
 t/t1402-check-ref-format.sh | 16 
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index c7c5ab376c..603e667faa 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1331,7 +1331,10 @@ void strbuf_branchname(struct strbuf *sb, const char 
*name, unsigned allowed)
 
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
-   strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
+   if (startup_info->have_repository)
+   strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
+   else
+   strbuf_addstr(sb, name);
if (name[0] == '-')
return -1;
strbuf_splice(sb, 0, 0, "refs/heads/", 11);
diff --git a/t/t1402-check-ref-format.sh b/t/t1402-check-ref-format.sh
index 0790edf60d..98e4a8613b 100755
--- a/t/t1402-check-ref-format.sh
+++ b/t/t1402-check-ref-format.sh
@@ -144,6 +144,11 @@ test_expect_success "check-ref-format --branch @{-1}" '
refname2=$(git check-ref-format --branch @{-2}) &&
test "$refname2" = master'
 
+test_expect_success 'check-ref-format --branch -naster' '
+   test_must_fail git check-ref-format --branch -naster >actual &&
+   test_must_be_empty actual
+'
+
 test_expect_success 'check-ref-format --branch from subdir' '
mkdir subdir &&
 
@@ -161,6 +166,17 @@ test_expect_success 'check-ref-format --branch from 
subdir' '
test "$refname" = "$sha1"
 '
 
+test_expect_success 'check-ref-format --branch @{-1} from non-repo' '
+   nongit test_must_fail git check-ref-format --branch @{-1} >actual &&
+   test_must_be_empty actual
+'
+
+test_expect_success 'check-ref-format --branch master from non-repo' '
+   echo master >expect &&
+   nongit git check-ref-format --branch master >actual &&
+   test_cmp expect actual
+'
+
 valid_ref_normalized() {
prereq=
case $1 in
-- 
2.15.0.rc1.287.g2b38de12cc



[PATCH 2/3] check-ref-format --branch: strip refs/heads/ using skip_prefix

2017-10-17 Thread Jonathan Nieder
From: Junio C Hamano 

The expansion returned from strbuf_check_branch_ref always starts with
"refs/heads/" by construction, but there is nothing about its name or
advertised API making that obvious.  This command is used to process
human-supplied input from the command line and is usually not the
inner loop, so we can spare some cycles to be more defensive.  Instead
of hard-coding the offset strlen("refs/heads/") to skip, verify that
the expansion actually starts with refs/heads/.

[jn: split out from a larger patch, added explanation]

Signed-off-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
 builtin/check-ref-format.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 6c40ff110b..bc67d3f0a8 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -39,12 +39,14 @@ static char *collapse_slashes(const char *refname)
 static int check_ref_format_branch(const char *arg)
 {
struct strbuf sb = STRBUF_INIT;
+   const char *name;
int nongit;
 
setup_git_directory_gently(&nongit);
-   if (strbuf_check_branch_ref(&sb, arg))
+   if (strbuf_check_branch_ref(&sb, arg) ||
+   !skip_prefix(sb.buf, "refs/heads/", &name))
die("'%s' is not a valid branch name", arg);
-   printf("%s\n", sb.buf + 11);
+   printf("%s\n", name);
strbuf_release(&sb);
return 0;
 }
-- 
2.15.0.rc1.287.g2b38de12cc



[PATCH 3/3] check-ref-format doc: --branch validates and expands

2017-10-17 Thread Jonathan Nieder
From: Junio C Hamano 

"git check-ref-format --branch $name" feature was originally
introduced (and was advertised) as a way for scripts to take any
end-user supplied string (like "master", "@{-1}" etc.) and see if it
is usable when Git expects to see a branch name, and also obtain the
concrete branch name that the at-mark magic expands to.

Emphasize that "see if it is usable" role in the description and
clarify that the @{...} expansion only occurs when run from within a
repository.

[jn: split out from a larger patch]

Helped-by: Jeff King 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-check-ref-format.txt | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-check-ref-format.txt 
b/Documentation/git-check-ref-format.txt
index 92777cef25..cf0a0b7df2 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -77,7 +77,14 @@ reference name expressions (see linkgit:gitrevisions[7]):
 
 . at-open-brace `@{` is used as a notation to access a reflog entry.
 
-With the `--branch` option, it expands the ``previous branch syntax''
+With the `--branch` option, the command takes a name and checks if
+it can be used as a valid branch name (e.g. when creating a new
+branch).  The rule `git check-ref-format --branch $name` implements
+may be stricter than what `git check-ref-format refs/heads/$name`
+says (e.g. a dash may appear at the beginning of a ref component,
+but it is explicitly forbidden at the beginning of a branch name).
+When run with `--branch` option in a repository, the input is first
+expanded for the ``previous branch syntax''
 `@{-n}`.  For example, `@{-1}` is a way to refer the last branch you
 were on.  This option should be used by porcelains to accept this
 syntax anywhere a branch name is expected, so they can act as if you
-- 
2.15.0.rc1.287.g2b38de12cc



Re: [PATCH 0/3] Re: [PATCH] check-ref-format: require a repository for --branch

2017-10-17 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Handles the nongit case in strbuf_check_branch_ref instead of
>> introducing a new check_branch_ref_format helper.
>
> I view that as a regression, actually.  Don't we want a function
> that does not require a strbuf when asking a simple question: "I
> have a string, and I want to see if that is a valid name"?

*shrug* I found the change easier to read, and it also sidesteps the
which-header question.  It also ensures that other
strbuf_check_branch_ref callers are safe without having to audit them.

But feel free to tweak that back if you like, or I can tomorrow.

Jonathan


<    8   9   10   11   12   13   14   15   16   17   >