Re: [PATCH 2/2] tests: fix diff order arguments in test_cmp

2017-10-06 Thread Jonathan Nieder
Stefan Beller wrote:

> Fix the argument order for test_cmp. When given the expected
> result first the diff shows the actual output with '+' and the
> expectation with '-', which is the convention for our tests.
>
> Signed-off-by: Stefan Beller 
> ---

Yes, this should make the output from failing tests easier to take in
at a glance.

Reviewed-by: Jonathan Nieder 

How did you find these?  E.g. is there a grep pattern that reviewers
can use to repeat your results?


Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp

2017-10-06 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Fri, Oct 06, 2017 at 12:00:05PM -0700, Stefan Beller wrote:

>> The `test_must_fail` should only be used to indicate a git command is
>> failing. `test_cmp` is not a git command, such that it doesn't need the
>> special treatment of `test_must_fail` (which e.g. includes checking for
>> segfault)
>
> Hmph. "test_must_fail test_cmp" is a weird thing for somebody to write.
> And your patch is obviously an improvement, but I have to wonder if some
> of these make any sense.

Just for the record: I agree with all the above, and my Reviewed-by
still stands.

Thanks for looking closer.  I wonder if there's a systematic way to
avoid this kind of weak test that can bitrot and still pass too easily
--- e.g. should we have a test_must_differ command with an explanation
of why it should usually be avoided?  Would a lint check that bans
this kind of construct completely be going too far?

Jonathan


Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp

2017-10-06 Thread Jonathan Nieder
Stefan Beller wrote:

> The `test_must_fail` should only be used to indicate a git command is
> failing. `test_cmp` is not a git command, such that it doesn't need the
> special treatment of `test_must_fail` (which e.g. includes checking for
> segfault).
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t3504-cherry-pick-rerere.sh | 2 +-
>  t/t5512-ls-remote.sh  | 2 +-
>  t/t5612-clone-refspec.sh  | 2 +-
>  t/t7508-status.sh | 2 +-
>  t/t9164-git-svn-dcommit-concurrent.sh | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)

Thanks.  I agree that this is more readable, and it matches the advice
in t/README:

   On the other hand, don't use test_must_fail for running regular
   platform commands; just use '! cmd'.  We are not in the business
   of verifying that the world given to us sanely works.

Reviewed-by: Jonathan Nieder 

I wonder if it would be useful to have a nod to that advice in the
docstring in t/test-lib-functions.sh, too.


Re: is there a truly compelling rationale for .git/info/exclude?

2017-10-06 Thread Jonathan Nieder
Hi,

Robert P. J. Day wrote:
> On Fri, 6 Oct 2017, Junio C Hamano wrote:

>> Don't waste time by seeking a "compelling" reason.  A mere "this is
>> the most expedite way to gain convenience" back when something was
>> introduced could be an answer, and it is way too late to complain
>> about such a choice anyway.
>
>   perfectly respectable answer ... it tells me that, between
> .gitignore files and core.excludesFile, there's not much left for
> .git/info/exclude to do, except in weird circumstances.

I use .git/info/exclude in what I don't consider to be weird
circumstances.

But I am not motivated to say more than that without knowing what my
answer is going to be used for.  E.g. is there a part of the
gitignore(5) man page where such an explanation would make it less
confusing and more useful?

Thanks,
Jonathan


Re: Git config multiple values

2017-10-06 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Fri, Oct 06, 2017 at 01:10:17PM +0200, aleksander.baranowski wrote:

>> It's just an opinion, but this behaviour is no consistent for me.
>>
>> If it's not the bug it's a feature just let me know.
>
> It's a feature, though I agree that git-config is rather baroque. We're
> mostly stuck with it for reasons of backwards compatibility, though.

This feels like a dodge.  Can we make a list of what is baroque here,
with an eye to fixing it?  E.g. if we introduce a new --set option,
then what should its semantics be, to be more intuitive?

Thanks,
Jonathan


Re: [PATCH v2] api-argv-array.txt: Remove broken link to string-list API

2017-10-05 Thread Jonathan Nieder
Todd Zullinger wrote:

> In 4f665f2cf3 (string-list.h: move documentation from Documentation/api/
> into header, 2017-09-26) the string-list API documentation was moved to
> string-list.h.  The argv-array API documentation may follow a similar
> course in the future.  Until then, prevent the broken link from making
> it to the end-user documentation.
>
> Signed-off-by: Todd Zullinger 
> ---
>  Documentation/technical/api-argv-array.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

Thanks for catching it.  Do you use a broken link detection tool to
detect this kind of issue automatically?

[...]
> --- a/Documentation/technical/api-argv-array.txt
> +++ b/Documentation/technical/api-argv-array.txt
> @@ -8,7 +8,7 @@ always NULL-terminated at the element pointed to by 
> `argv[argc]`. This
>  makes the result suitable for passing to functions expecting to receive
>  argv from main(), or the link:api-run-command.html[run-command API].
>  
> -The link:api-string-list.html[string-list API] is similar, but cannot be
> +The string-list API (documented in string-list.h) is similar, but cannot be
>  used for these purposes; instead of storing a straight string pointer,
>  it contains an item structure with a `util` field that is not compatible
>  with the traditional argv interface.


Re: [RFC] Reporting index properties

2017-10-05 Thread Jonathan Nieder
Alex Vandiver wrote:

> As part of gathering some automated performance statistics of git, it
> would be useful to be able to extract some vital properties of the
> index.  While `git update-index` has ways to _set_ the index version,
> and enable/disable various extensions, AFAICT there is no method by
> which one can ask for reporting about the current state of them --
> short of re-implementing all of the index-parsing logic external to
> Git, which is not terribly appealing.

Aside: git-update-index(1) says

Version 4 is relatively young (first released in 1.8.0 in
October 2012).

My first reaction is to wonder if it is time to introduce a config
option to use index format version 4 by default, since after 5 years
it is not relatively young any more.

My second reaction is to notice that the index.version configuration
already exists.  Probably git-update-index(1) ought to point to it.

JGit still doesn't support index format 4, so 4 is still not a good
default for `index.version` for a user that hasn't explicitly
configured it, but the moment JGit gains that support I think it will
be. :)

> I hesitate to propose adding a flag to `git update-index` which reads
> but does no updating, as that seems to make a misnomer of the
> command.  But this also doesn't seem worthy of a new top-level command.

The existing scripting command that inspects the index is
"git ls-files".

It doesn't go into this kind of low-level detail about the index
format, though.  In the same spirit, I don't think we have an existing
command in this spirit for analyzing packfiles, either.

So I agree with Junio that this would be a good debugging aid to put
in t/helper/ for now, and then once we've had experience with it we'd
end up in a better position to come up with a stable, public-facing
interface, if one is needed.

Thanks and hope that helps,
Jonathan


Re: couple questions about git "logical variables" and "git var"

2017-10-05 Thread Jonathan Nieder
Jeff King wrote:
> On Thu, Oct 05, 2017 at 05:11:04AM -0400, rpj...@crashcourse.ca wrote:

>>  - GIT_AUTHOR_IDENT
>>  - GIT_COMMITTER_IDENT
>>  - GIT_EDITOR
>>  - GIT_PAGER
>>
>> first question -- what is it about precisely those four variables that makes
>> them "logical" variables in git parlance? just those four? no others?
>
> It was introduced in the very early days as a way for scripts to get
> access to "standard" values that would be computed the same way as the C
> portions of Git.  But it hasn't generally been kept up to date with new
> possible variables.
>
> It also only tells half the story. You have to know not just what's in
> $GIT_EDITOR, but you have to know the right way to evaluate it. There's
> a git_editor helper in git-sh-setup, but other scripting languages are
> on their own.

I am not sure I understand the complaint here.  git-var(1) says:

GIT_EDITOR
   Text editor for use by Git commands. The value is meant to be
   interpreted by the shell when it is used. Examples: [...]

Are you saying that the output of the command should quote that
manpage, so as to tell the rest of the story?

>   We'd probably have done better to introduce a "git editor"
> command which can be run from any language.

I remember that we discussed this at the time but don't remember why
it didn't happen.  It seems like a good idea.

[...]
>> p.s. yes, i realize this command is deprecated in favour of "git config -l",
>> but as long as it's available, it should work as described in the man page.
>
> Yes, though I think fixing the manpage is the right way to make them
> consistent.

Agreed as well.  rday, care to take a stab at wording?

Thanks,
Jonathan


Re: [PATCH 0/6] Teach Status options around showing ignored files

2017-10-05 Thread Jonathan Nieder
Hi,

jameson.mille...@gmail.com wrote:

> This patch series is the second part of [1], which was split into 2
> parts. The first part, added an optimization in the directory listing
> logic to not scan the contents of ignored directories and was merged
> to master with commit 5aaa7fd3. This patch series includes the second
> part to expose additional arguments to the --ignored option on the
> status command.

Thanks.

> This patch series teaches the status command more options to control
> which ignored files are reported independently of the which untracked
[...]
> Our application (Visual Studio) has a specific set of requirements
> about how it wants untracked / ignored files reported by git status.
[...]
> The reason for controlling these behaviors separately is that there
> can be a significant performance impact to scanning the contents of
[]
> As a more concrete example, on Windows, Visual Studio creates a bin/
> and obj/ directory inside of the project where it writes all .obj and
[...]

I see this information is also in patch 1/6.  That's a very good
thing, since that makes performance numbers involved more concrete
about which patch brings them about and it becomes part of permanent
history that way --- thanks.

But it took me a while to notice, and before then, I was trying to
read through the cover letter to get an overview of which patches I am
supposed to look at.  For next time, could the cover letter say
something like "See patches 1 and 2 for more details about the
motivation" instead of repeating the commit message content?  That
would save reviewers some time.

Thanks,
Jonathan


Re: [PATCH] .mailmap: normalize name for René Scharfe

2017-10-05 Thread Jonathan Nieder
René Scharfe wrote:

> Reported-by: Johannes Schindelin 
> Reported-by: Stefan Beller 
> Signed-off-by: Rene Scharfe 
> ---
>  .mailmap | 1 +
>  1 file changed, 1 insertion(+)

A quick "git shortlog -nse" run confirms that this works.

Reviewed-by: Jonathan Nieder 
Thanks.

> diff --git a/.mailmap b/.mailmap
> index ab85e0d16d..224db83887 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -194,6 +194,7 @@ Philippe Bruhat 
>  Ralf Thielow  
>  Ramsay Jones  
>  René Scharfe  
> +René Scharfe  Rene Scharfe
>  Richard Hansen  
>  Richard Hansen  
>  Robert Fitzsimons 


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:

> From: Taylor Blau 
> Date: Mon, 2 Oct 2017 09:10:34 -0700
> Subject: [PATCH] ref-filter.c: pass empty-string as NULL to atom parsers
>
> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
>
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
>
> Let's fix this by declaring that atom parser implementations must
> not care about distinguishing between the empty string "%(refname:)"
> and no sub-arguments "%(refname)".  Current code aborts, either with
> "unrecognised arg" (e.g. "refname:") or "does not take args"
> (e.g. "body:") as an error message.
>
> Signed-off-by: Taylor Blau 
> Reviewed-by: Jeff King 
> Reviewed-by: Jonathan Nieder 
> Signed-off-by: Junio C Hamano 
> ---
>  ref-filter.c| 10 +++++-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

Thanks for taking care of it.  This is indeed still
Reviewed-by: Jonathan Nieder 


Re: Line ending normalization doesn't work as expected

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>>  git checkout --renormalize .
>>  git status; # Show files that will be normalized
>>  git commit; # Commit the result
>>
>> What do you think?  Would you be interested in writing a patch for it?
>> ("No" is as always an acceptable answer.)
>
> I actually think what is being requested is the opposite, i.e. "the
> object registered in the index have wrong line endings, and the
> safe-crlf is getting in the way to prevent me from correcting by
> hashing the working tree contents again to register contents with
> corrected line endings, even with 'git add .'".
>
> So I would understand if your suggestion were for
>
>   git checkin --renormalize .
>
> but not "git checkout".  And it probably is more familiar to lay
> people if we spelled that as "git add --renormalize ." ;-)

Good catch.  You understood correctly --- "git add --renormalize" is
the feature that I think is being hinted at here.

Thanks,
Jonathan


Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-04 Thread Jonathan Nieder
Junio C Hamano wrote:

> From: Jeff King 
> Date: Tue, 3 Oct 2017 19:30:40 -0400
> Subject: [PATCH] path.c: fix uninitialized memory access
> 
> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
>
> ==4423== Conditional jump or move depends on uninitialised value(s)
> ==4423==at 0x242FA9: cleanup_path (path.c:35)
[...]
> ==4423==  Uninitialised value was created by a heap allocation
[...]
> ==4423==by 0x29A30B: strbuf_grow (strbuf.c:66)
> ==4423==by 0x29A30B: strbuf_vaddf (strbuf.c:277)
> ==4423==by 0x242F9F: mkpath (path.c:454)
[...]
> Avoid this by using skip_prefix(), which knows not to go beyond the
> end of the string.
>
> Reported-by: Thomas Gummerer 
> Signed-off-by: Jeff King 
> Reviewed-by: Jonathan Nieder 

This is indeed
Reviewed-by: Jonathan Nieder 

Thanks.


Re: disable interactive prompting

2017-10-04 Thread Jonathan Nieder
Hi Ernesto,

Ernesto Alfonso wrote:

> Waiting for git-push synchronously slows me down, so I have a bash
> alias/function to do this in the background. But when my origin is https, I
> get an undesired interactive prompt. I've tried to disable by
> redirecting stdin:
>
> git push ${REMOTE} ${BRANCH} &>/dev/null 
> but I still get an interactive prompt.
>
> Is there a way to either
>
> 1. disable interactive prompting
> 2. programmatically determine whether a git command (or at least a git
> push) would interactively prompt

You left out an important detail: what does the interactive prompt in
question say?

The general question is also interesting, but seeing the particular
prompt would make it easy to look into the specific case at the same
time.

Thanks,
Jonathan


Re: Line ending normalization doesn't work as expected

2017-10-04 Thread Jonathan Nieder
Hi Robert,

Robert Dailey wrote:

> You guys are obviously worlds ahead of me on the internals of things,
> but from my perspective I like to avoid the "plumbing" commands as
> much as I can.

I suspect what we are dancing around is the need for some command like

git checkout --renormalize .

which would shorten the sequence to

git checkout --renormalize .
git status; # Show files that will be normalized
git commit; # Commit the result

What do you think?  Would you be interested in writing a patch for it?
("No" is as always an acceptable answer.)

Thanks,
Jonathan


[PATCH v2] strbuf doc: reuse after strbuf_release is fine

2017-10-03 Thread Jonathan Nieder
strbuf_release leaves the strbuf in a valid, initialized state, so
there is no need to call strbuf_init after it.

Moreover, this is not likely to change in the future: strbuf_release
leaving the strbuf in a valid state has been easy to maintain and has
been very helpful for Git's robustness and simplicity (e.g.,
preventing use-after-free vulnerabilities).

Document the semantics so the next generation of Git developers can
become familiar with them without reading the implementation.  It is
still not advisable to call strbuf_release too often because it is
wasteful, so add a note pointing to strbuf_reset for that.

The same semantics apply to strbuf_detach.  Add a similar note to its
docstring to make that clear.

Improved-by: Jeff King 
Signed-off-by: Jonathan Nieder 
---
Jeff King wrote:

> I think it's actually OK to use the string buffer after this function.
> It's just an empty string.
>
> Perhaps we should be more explicit: this releases any resources and
> resets to a pristine, empty state. I suspect strbuf_detach() probably
> should make the same claim.

Like this?

Thanks,
Jonathan

 strbuf.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index 7496cb8ec5..249df86711 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -82,8 +82,12 @@ extern char strbuf_slopbuf[];
 extern void strbuf_init(struct strbuf *, size_t);
 
 /**
- * Release a string buffer and the memory it used. You should not use the
- * string buffer after using this function, unless you initialize it again.
+ * Release a string buffer and the memory it used. After this call, the
+ * strbuf points to an empty string that does not need to be free()ed, as
+ * if it had been set to `STRBUF_INIT` and never modified.
+ *
+ * To clear a strbuf in preparation for further use without the overhead
+ * of free()ing and malloc()ing again, use strbuf_reset() instead.
  */
 extern void strbuf_release(struct strbuf *);
 
@@ -91,6 +95,9 @@ extern void strbuf_release(struct strbuf *);
  * Detach the string from the strbuf and returns it; you now own the
  * storage the string occupies and it is your responsibility from then on
  * to release it with `free(3)` when you are done with it.
+ *
+ * The strbuf that previously held the string is reset to `STRBUF_INIT` so
+ * it can be reused after calling this function.
  */
 extern char *strbuf_detach(struct strbuf *, size_t *);
 
-- 
2.14.2.920.gcf0c67979c



Re: Git for Windows: mingw.c's strange usage of creation time as ctime?

2017-10-03 Thread Jonathan Nieder
+git-for-windows
Hi Marc,

Marc Strapetz wrote:

> compat/mingw.c assigns the Windows file creation time [1] to Git's
> ctime fields at line 591 and at line 705:
>
> buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
>
> ftCreationTime > ftLastWriteTime is actually possible after copying
> a file, so it makes sense to include this timestamp somehow in the
> Index, but I think it would be better to use the maximum of
> ftLastWriteTime and ftCreationTime here which is less confusing and
> closer to Unix's ctime:
>
> buf->st_ctime = max(buf->st_mtime,
> filetime_to_time_t(&(fdata.ftCreationTime));

Can you say more about the practical effect?  Is this causing a bug in
practice, is it a bug waiting to happen, or is it making the code
difficult to understand without any ill effect expected at run time?
(Any of those three is a valuable kind of feedback to offer --- I'm
just trying to figure out which one you mean.)

By the way, do you have core.trustctime set to true or false?

Thanks,
Jonathan

> -Marc
>
> [1] 
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365739(v=vs.85).aspx


Re: [PATCH 0/3] fixes for running the test suite with --valgrind

2017-10-03 Thread Jonathan Nieder
Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

What version of zlib do you use?  I've heard some good things about
v1.2.11 improving matters, though I haven't checked yet.

Thanks,
Jonathan


Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-03 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote:

>> In other words, an alternative fix would be
>> 
>>  if (*path == '.' && path[1] == '/') {
>>  ...
>>  }
>> 
>> which would not require passing in 'len' or switching to index-based
>> arithmetic.  I think I prefer it.  What do you think?
>
> Yes, I think that approach is much nicer. I think you could even use
> skip_prefix. Unfortunately you have to play a few games with const-ness,
> but I think the resulting signature for cleanup_path() is an
> improvement:

Ooh!

For what it's worth, if you add a commit message with Thomas's
Reported-by then this lgtm.

Thanks,
Jonathan

> diff --git a/path.c b/path.c
> index 00ec04e7a5..2e09a7bce0 100644
> --- a/path.c
> +++ b/path.c
> @@ -34,11 +34,10 @@ static struct strbuf *get_pathname(void)
>   return sb;
>  }
>  
> -static char *cleanup_path(char *path)
> +static const char *cleanup_path(const char *path)
>  {
>   /* Clean it up */
> - if (!memcmp(path, "./", 2)) {
> - path += 2;
> + if (skip_prefix(path, "./", &path)) {
>   while (*path == '/')
>   path++;
>   }
> @@ -47,7 +46,7 @@ static char *cleanup_path(char *path)
>  
>  static void strbuf_cleanup_path(struct strbuf *sb)
>  {
> - char *path = cleanup_path(sb->buf);
> + const char *path = cleanup_path(sb->buf);
>   if (path > sb->buf)
>   strbuf_remove(sb, 0, path - sb->buf);
>  }
> @@ -64,7 +63,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...)
>   strlcpy(buf, bad_path, n);
>   return buf;
>   }
> - return cleanup_path(buf);
> + return (char *)cleanup_path(buf);
>  }
>  
>  static int dir_prefix(const char *buf, const char *dir)


Re: [PATCH] test-stringlist: avoid buffer underrun when sorting nothing

2017-10-03 Thread Jonathan Nieder
René Scharfe wrote:

> Check if the strbuf containing data to sort is empty before attempting
> to trim a trailing newline character.
>
> Signed-off-by: Rene Scharfe 
> ---
>  t/helper/test-string-list.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing it.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 2/3] http-push: fix construction of hex value from path

2017-10-03 Thread Jonathan Nieder
Hi,

Thomas Gummerer wrote:

> The get_oid_hex_from_objpath takes care of creating a oid from a
> pathname.  It does this by memcpy'ing the first two bytes of the path to
> the "hex" string, then skipping the '/', and then copying the rest of the
> path to the "hex" string.  Currently it fails to increase the pointer to
> the hex string, so the second memcpy invocation just mashes over what
> was copied in the first one, and leaves the last two bytes in the string
> uninitialized.

Wow.  The fix is obviously correct.

> This breaks valgrind in t5540, although the test passes without
> valgrind:
[...]
> Signed-off-by: Thomas Gummerer 
> ---
>  http-push.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Would it be straightforward to add a correctness test for this?  It
seems like this code path didn't work at all and no one noticed.

This is the code path in http-push.c which says

 /*
  * NEEDSWORK: remote_ls() ignores info/refs on the remote side.  But it
  * should _only_ heed the information from that file, instead of trying to
  * determine the refs from the remote file system (badly: it does not even
  * know about packed-refs).
  */
 static void remote_ls(const char *path, int flags,

I think the problem is that when it fails, we end up thinking that
there are *fewer* objects than are actually present remotely so the
only ill effect is pushing too much.  So this should be observable in
server logs (i.e. it is testable) but it's not a catastrophic failure
which means it's harder to test than it would be otherwise.

Moreover, this is in the webdav-based "dumb http" push code path,
which I do not trust much at all.  I wonder if we could retire it
completely (or at least provide an option to turn it off).

> diff --git a/http-push.c b/http-push.c
> index e4c9b065ce..e9a01ec4da 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -1018,7 +1018,7 @@ static int get_oid_hex_from_objpath(const char *path, 
> struct object_id *oid)
>   memcpy(hex, path, 2);
>   path += 2;
>   path++; /* skip '/' */
> - memcpy(hex, path, GIT_SHA1_HEXSZ - 2);
> + memcpy(hex + 2, path, GIT_SHA1_HEXSZ - 2);
>  
>   return get_oid_hex(hex, oid);

Thanks,
Jonathan


Re: [PATCH 1/3] path.c: fix uninitialized memory access

2017-10-03 Thread Jonathan Nieder
Hi,

Thomas Gummerer wrote:

> In cleanup_path we're passing in a char array, run a memcmp on it, and
> run through it without ever checking if something is in the array in the
> first place.  This can lead us to access uninitialized memory, for
> example in t5541-http-push-smart.sh test 7, when run under valgrind:
[...]
> Avoid this by checking passing in the length of the string in the char
> array, and checking that we never run over it.
>
> Signed-off-by: Thomas Gummerer 
> ---
>  path.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

When I first read the above, I thought it was going to be about a
NUL-terminated string that was missing a NUL.  But in fact, the issue
is that strlen(path) can be < 2.

In other words, an alternative fix would be

if (*path == '.' && path[1] == '/') {
...
}

which would not require passing in 'len' or switching to index-based
arithmetic.  I think I prefer it.  What do you think?

Thanks and hope that helps,
Jonathan

diff --git i/path.c w/path.c
index b533ec938d..3a1fbee1e0 100644
--- i/path.c
+++ w/path.c
@@ -37,7 +37,7 @@ static struct strbuf *get_pathname(void)
 static char *cleanup_path(char *path)
 {
/* Clean it up */
-   if (!memcmp(path, "./", 2)) {
+   if (*path == '.' && path[1] == '/') {
path += 2;
while (*path == '/')
path++;


Re: [PATCH] branch: reset instead of release a strbuf

2017-10-03 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Our documentation advises to not re-use a strbuf, after strbuf_release
> has been called on it. Use the proper reset instead.
>
> Reviewed-by: Jonathan Nieder 

This is indeed
Reviewed-by: Jonathan Nieder 

Thank you.

> Signed-off-by: Stefan Beller 
> ---
>  builtin/branch.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Here's a patch to address the surprising strbuf.h advice.

-- >8 --
Subject: strbuf: do not encourage init-after-release

strbuf_release already leaves the strbuf in a valid, initialized
state, so there is not need to call strbuf_init after it.

Moreover, this is not likely to change in the future: strbuf_release
leaving the strbuf in a valid state has been easy to maintain and has
been very helpful for Git's robustness and simplicity (e.g.,
preventing use-after-free vulnerabilities).

It is still not advisable to call strbuf_release until done using a
strbuf because it is wasteful, so keep that part of the advice.

Reported-by: Stefan Beller 
Signed-off-by: Jonathan Nieder 
---
 strbuf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.h b/strbuf.h
index 7496cb8ec5..6e175c3694 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -83,7 +83,7 @@ extern void strbuf_init(struct strbuf *, size_t);
 
 /**
  * Release a string buffer and the memory it used. You should not use the
- * string buffer after using this function, unless you initialize it again.
+ * string buffer after using this function.
  */
 extern void strbuf_release(struct strbuf *);
 
-- 
2.14.2.920.gcf0c67979c



Re: [bug] git version 2.4.12 color.ui = always breaks git add -p

2017-10-03 Thread Jonathan Nieder
Hi Christian,

Christian Rebischke wrote:

> I played around with my gitconfig and I think I found a bug while doing
> so. I set the following lines in my gitconfig:
>
> [color]
> ui = always
>
> When I try to use `git add -p ` I don't see the 'Stage
> this hunk'-dialog anymore. First I thought it's some other configuration
> but now I can definitly say that this configuration breaks `git add -p`
> interactive mode.

Do the patches at
https://public-inbox.org/git/20171003133713.ccxv6clrmuuhh...@sigill.intra.peff.net/
help?

Thanks and hope that helps,
Jonathan


Re: [PATCH] branch: reset instead of release a strbuf

2017-10-03 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Our documentation advises to not re-use a strbuf, after strbuf_release
> has been called on it. Use the proper reset instead.

I'm super surprised at this documentation.  strbuf_release maintains
the invariant that a strbuf is always usable (i.e., that we do not have
to fear use-after-free problems).

On second thought, though, strbuf_release is a more expensive operation
than strbuf_reset --- constantly free()-ing and re-malloc()ing is
unnecessary churn in malloc data structures.  So maybe that is the
motivation here?

> Signed-off-by: Stefan Beller 
> ---
>
> Maybe one of the #leftoverbits is to remove the re-init call in release
> and see what breaks? (And then fixing up more of such cases as presented
> in this patch)

As mentioned above: please no.  That would be problematic both for
ease of development and for the risk of security bugs.

>  builtin/branch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index b998e16d0c..9758012390 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -217,7 +217,7 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   if (!head_rev)
>   die(_("Couldn't look up commit object for HEAD"));
>   }
> - for (i = 0; i < argc; i++, strbuf_release(&bname)) {
> + for (i = 0; i < argc; i++, strbuf_reset(&bname)) {
>   char *target = NULL;
>   int flags = 0;

Should there be a strbuf_release (or UNLEAK if you are very very sure)
call at the end of the function to replace this?

With that change (but not without it),
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v3 10/10] ssh: introduce a 'simple' ssh variant

2017-10-03 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> When using the 'ssh' transport, the '-o' option is used to specify an
> environment variable which should be set on the remote end.  This allows
> git to send additional information when contacting the server,
> requesting the use of a different protocol version via the
> 'GIT_PROTOCOL' environment variable like so: "-o SendEnv=GIT_PROTOCOL"
>
> Unfortunately not all ssh variants support the sending of environment
> variables to the remote end.  To account for this, only use the '-o'
> option for ssh variants which are OpenSSH compliant.  This is done by
> checking that the basename of the ssh command is 'ssh' or the ssh
> variant is overridden to be 'ssh' (via the ssh.variant config).

This also affects -p (port), right?

What happens if I specify a ssh://host:port/path URL and the SSH
implementation is of 'simple' type?

> Previously if an ssh command's basename wasn't 'plink' or

Git's commit messages use the present tense to describe the current
state of the code, so this is "Currently". :)

> 'tortoiseplink' git assumed that the command was an OpenSSH variant.
> Since user configured ssh commands may not be OpenSSH compliant, tighten
> this constraint and assume a variant of 'simple' if the basename of the
> command doesn't match the variants known to git.  The new ssh variant
> 'simple' will only have the host and command to execute ([username@]host
> command) passed as parameters to the ssh command.
>
> Update the Documentation to better reflect the command-line options sent
> to ssh commands based on their variant.
>
> Reported-by: Jeffrey Yasskin 
> Signed-off-by: Brandon Williams 

Thanks for working on this.

For background, the GIT_SSH implementation that motivated this is
https://github.com/travis-ci/dpl/blob/6c3fddfda1f2a85944c56b068bac0a77c049/lib/dpl/provider.rb#L215,
which does not support -p or -4/-6, either.

> ---
>  Documentation/config.txt |  27 ++--
>  Documentation/git.txt|   9 ++--
>  connect.c| 107 
> ++-
>  t/t5601-clone.sh |   9 ++--
>  t/t5700-protocol-v1.sh   |   2 +
>  5 files changed, 95 insertions(+), 59 deletions(-)
[...]
> --- a/connect.c
> +++ b/connect.c
> @@ -776,37 +776,44 @@ static const char *get_ssh_command(void)
[...]
> +static enum ssh_variant determine_ssh_variant(const char *ssh_command,
> +   int is_cmdline)
[...]
> - if (!strcasecmp(variant, "plink") ||
> - !strcasecmp(variant, "plink.exe"))
> - *port_option = 'P';
> + if (!strcasecmp(variant, "ssh"))
> + ssh_variant = VARIANT_SSH;

Could this handle ssh.exe, too?

[...]
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh

Can this get tests for the new defaulting behavior?  E.g.

 - default is "simple"
 - how "simple" treats an ssh://host:port/path URL
 - how "simple" treats ipv4/ipv6 switching
 - ssh defaults to "ssh"
 - if GIT_SSH=ssh, can set ssh.variant to "simple" to force the "simple"
   mode

One other worry: this (intentionally) changes the behavior of a
previously-working GIT_SSH=ssh-wrapper that wants to support
OpenSSH-style options but does not declare ssh.variant=ssh.  When
discovering this change, what should the author of such an ssh-wrapper
do?

They could instruct their users to set ssh.variant or GIT_SSH_VARIANT
to "ssh", but then they are at the mercy of future additional options
supported by OpenSSH we may want to start using in the future (e.g.,
maybe we will start passing "--" to separate options from the
hostname).  So this is not a futureproof option for them.

They could take the new default behavior or instruct their users to
set ssh.variant or GIT_SSH_VARIANT to "simple", but then they lose
support for handling alternate ports, ipv4/ipv6, and specifying -o
SendEnv to propagate GIT_PROTOCOL or other envvars.  They can handle
GIT_PROTOCOL propagation manually, but losing port support seems like
a heavy cost.

They could send a patch to define yet another variant that is
forward-compatible, for example using an interface similar to what
git-credential(1) uses.  Then they can set GIT_SSH to their
OpenSSH-style helper and GIT_FANCY_NEW_SSH to their more modern
helper, so that old Git versions could use GIT_SSH and new Git
versions could use GIT_FANCY_NEW_SSH.  This might be their best
option.  It feels odd to say that their only good way forward is to
send a patch, but on the other hand someone with such an itch is
likely to be in the best position to define an appropriate interface.

They could send a documentation patch to make more promises about the
commandline used in OpenSSH mode: e.g. setting a rule in advance about
which options can take an argument so that they can properly parse an
OpenSSH command line in a future-compatible way.

Or they could send a patch to allow passing the port in "simple"
mode, for example using an environment variable.

Am I missing another option?  What advice d

Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Ann T Ropea wrote:

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: Ann T Ropea 
> Acked-by: Jonathan Nieder 
> ---
> v2: rename patch, correct Signed-off-by line, add Acked-by line
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for the quick turnaround.

Sincerely,
Jonathan


Security of .git/config and .git/hooks

2017-10-02 Thread Jonathan Nieder
Hi,

This topic has been mentioned on this mailing list before but I had
trouble finding a relevant reference.  Links welcome.

Suppose that I add the following to .git/config in a repository on a
shared computer:

[pager]
log = rm -fr /
fsck = rm -fr /

("rm -fr /" is of course a placeholder here.)

I then tell a sysadmin that git commands are producing strange output
and I am having trouble understanding what is going on.  They may run
"git fsck" or "git log"; in either case, the output is passed to the
pager I configured, allowing me to run an arbitrary command using the
sysadmin's credentials.

You might say that this is the sysadmin's fault, that they should have
read through .git/config before running any Git commands.  But I don't
find it so easy to blame them.

A few related cases that might not seem so dated:

 1. I put my repository in a zip file and ask a Git expert to help me
recover data from it, or

 2. My repository is in a shared directory on NFS.  Unless the
administrator setting that up is very careful, it is likely that
the least privileged user with write access to .git/config or
.git/hooks/ may be someone that I don't want to be able to run
arbitrary commands on behalf of the most privileged user working
in that repository.

A similar case to compare to is Linux's "perf" tool, which used to
respect a .perfconfig file from the current working directory.
Fortunately, nowadays "perf" only respects ~/.perfconfig and
/etc/perfconfig.

Proposed fix: because of case (1), I would like a way to tell Git to
stop trusting any files in .git.  That is:

 1. Introduce a (configurable) list of "safe" configuration items that
can be set in .git/config and don't respect any others.

 2. But what if I want to set a different pager per-repository?
I think we could do this using configuration "profiles".
My ~/.config/git/profiles/ directory would contain git-style
config files for repositories to include.  Repositories could
then contain

[include]
path = ~/.config/git/profiles/fancy-log-pager

to make use of those settings.  The facility (1) would
special-case this directory to allow it to set "unsafe" settings
since files there are assumed not to be under the control of an
attacker.

 3. Likewise for hooks: my ~/.config/git/hooks/ directory would
contain hooks for repositories to make use of.  Repositories could
symlink to hook files from there to make use of them.

That would allow the configured hooks in ~/.config/git/hooks/ to
be easy to find and to upgrade in one place.

To help users migrate, when a hook is present and executable in
.git/hooks/, Git would print instructions for moving it to
~/.config/git/hooks/ and replacing it with a symlink after
inspecting it.

For backward compatibility, this facility would be controlled by a
global configuration setting.  If that setting is not enabled, then
the current, less safe behavior would remain.

One downside of (3) is its reliance on symlinks.  Some alternatives:

 3b. Use core.hooksPath configuration instead.  Rely on (2).
 3c. Introduce new hook.* configuration to be used instead of hook
 scripts.  Rely on (2).

Thoughts?
Jonathan


Re: [PATCH] PR msg: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Hi,

Thanks for working to improve Git!

Bedhanger wrote:

> Subject: [PATCH] PR msg: capitalise "Git" to make it a proper noun

nit: What is a PR msg?  Looking with "git log git-request-pull.sh",
I see that previous patches called the subsystem request-pull, so
this could be

request-pull: capitalise "Git" to make it a proper noun

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: bedhanger 

Please use your full name in the Signed-off-by line.  See
Documentation/SubmittingPatches section "(5) Certify your work" for
why we ask for this.

> ---
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

The patch itself looks good, and I like the change it makes to the
email generated by git request-pull.  Looking forward to seeing what
other changes you come up with in the future.

Thanks and hope that helps,
Jonathan


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Jonathan Nieder
Hi,

Toni Uebernickel wrote:

> I updated to git version v2.14.2 on macOS using homebrew.
>
> Since then `git add --patch` and `git stash save --patch` are not
> working anymore. It's just printing the complete diff without ever
> stopping to ask for actions. This results in an unusable state, as
> the whole command option is rendered useless.

Would a patch like the following help?

I am worried that other scripts using diff-files would need the same
kind of patch.  So it seems worthwhile to look for alternatives.

An alternative would be to partially roll back v2.14.2~61^2~4 (color:
check color.ui in git_default_config, 2017-07-13) by making it not
take effect for plumbing commands (i.e., by adding a boolean to
"struct startup_info" to indicate whether a command is low-level
plumbing).  That would make the behavior of Git harder to explain so I
don't particularly like it.  Plus it defeats the point of the patch.

Yet another alternative would be to treat color.ui=always as a
deprecated synonym for color.ui=auto.  I think that's my preferred
fix.

What do you think?

Thanks again for reporting,
Jonathan

diff --git i/git-add--interactive.perl w/git-add--interactive.perl
index 28b325d754..4ea69538c7 100755
--- i/git-add--interactive.perl
+++ w/git-add--interactive.perl
@@ -101,49 +101,49 @@ sub apply_patch_for_stash;
 
 my %patch_modes = (
'stage' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'file-only',
IS_REVERSE => 0,
},
'stash' => {
-   DIFF => 'diff-index -p HEAD',
+   DIFF => 'diff-index --no-color -p HEAD',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => undef,
IS_REVERSE => 0,
},
'reset_head' => {
-   DIFF => 'diff-index -p --cached',
+   DIFF => 'diff-index --no-color -p --cached',
APPLY => sub { apply_patch 'apply -R --cached', @_; },
APPLY_CHECK => 'apply -R --cached',
FILTER => 'index-only',
IS_REVERSE => 1,
},
'reset_nothead' => {
-   DIFF => 'diff-index -R -p --cached',
+   DIFF => 'diff-index --no-color -R -p --cached',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'index-only',
IS_REVERSE => 0,
},
'checkout_index' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply -R', @_; },
APPLY_CHECK => 'apply -R',
FILTER => 'file-only',
IS_REVERSE => 1,
},
'checkout_head' => {
-   DIFF => 'diff-index -p',
+   DIFF => 'diff-index --no-color -p',
APPLY => sub { apply_patch_for_checkout_commit '-R', @_ },
APPLY_CHECK => 'apply -R',
FILTER => undef,
IS_REVERSE => 1,
},
'checkout_nothead' => {
-   DIFF => 'diff-index -R -p',
+   DIFF => 'diff-index --no-color -R -p',
APPLY => sub { apply_patch_for_checkout_commit '', @_ },
APPLY_CHECK => 'apply',
FILTER => undef,


Re: hooks are ignored if there are not marked as executable

2017-10-02 Thread Jonathan Nieder
Hi Damien,

Damien wrote:

> If you do `echo my_script > .git/hooks/pre-commit` and then `git commit`,
> The hook is just gonna be ignored.
> But if you do `chmod +x .git/hooks/pre-commit`, then it's executed.

This is intentional.

> I think ignoring a hook is misleading and not newbie friendly, an error
> message to signal an incorrectly configured hook could be better.

Adding a warning sounds like a nice change.  Care to propose a patch?

In the early days, sample hooks were installed without .sample in
their filename and could be enabled by "chmod +x"-ing them.  Because
of this, long-time users of Git may find themselves experiencing this
warning more often than they'd like.  That could be acceptable if the
warning shows a command to run to prevent the warning from showing up
again, though (see advice.c for some examples of how that can be
done).

The main code path to look at is run-command.c::find_hook.

"git grep -e 'rev-parse --git-path hooks' -- . ':!contrib'" finds a
few other code paths you may also want to look at.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Jonathan Nieder
Hi,

Taylor Blau wrote:

> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
>
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
>
> Let's fix this by assuming that atom parser implementations don't care
> about distinguishing between the empty string "%(refname:)" and no
> sub-arguments "%(refname)".
>
> Signed-off-by: Taylor Blau 
> ---
>  ref-filter.c| 10 +-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

The above does a nice job of explaining

 - what this change is going to do
 - how it's good for the internal code structure / maintainability

What it doesn't tell me about is why the user-facing effect won't
cause problems.  Is there no atom where %(atom:) was previously
accepted and did something meaningful that this may break?

Looking at the manpage and code, I don't see any, so for what it's
worth, this is

Reviewed-by: Jonathan Nieder 

but for next time, please remember to discuss regression risk in
the commit message, too.

Thanks,
Jonathan


Re: git submodule add fails when using both --branch and --depth

2017-10-02 Thread Jonathan Nieder
Hi,

Thadeus Fleming wrote:

> I'm running git 2.14.2 on Ubuntu 16.04.
>
> Compare the behavior of
>
>> git clone --branch pu --depth 1 https://github.com/git/git git-pu
>
> which clones only the latest commit of the pu branch and

Yes.

>> mkdir tmp && cd tmp && git init
>> git submodule add --branch pu --depth 1 https://github.com/git/git \
>  git-pu
>
> which gives the error
>
> fatal: 'origin/pu' is not a commit and a branch 'pu' cannot be created from it
> Unable to checkout submodule 'git-pu'

As a side note (you are using "git submodule add --depth", not "git
submodule update --depth"), I suspect that "submodule update --depth"
may not always do what people expect.

With add --depth, I agree with your expectation and after your fix
everything should work fine.  But with update --depth, consider the
following sequence of steps:

 1. I create a repository "super" with submodule "sub" and publish
both.

 2. I make a couple commits to "sub" and a commit to "super" making
use of those changes and want to publish them.

 3. I use "git push --recurse-submodules" to publish my commits to
"sub" and "super":

a. First it pushes to "sub".

b. Then it pushes to "super".

Between steps 3(a) and 3(b), a person can still "git clone
--recurse-submodules" the "super" repository.  The repository "super"
does not have my change yet and "sub" does, but that is not a problem,
since commands like "git checkout --recurse-submodules" and "git
submodule update" know to check out the commit *before* my change in
the submodule.

But between steps 3(a) and 3(b), "git submodule update --depth=1"
would not work.  It would fetch the submodule with depth 1 and then
try to check out a commit that is deeper in history.

So I think there's more thinking needed there.

That's all a tangent to your report about add --depth, though.

Thanks,
Jonathan


Re: What means "git config bla ~/"?

2017-10-02 Thread Jonathan Nieder
Hi,

rpj...@crashcourse.ca wrote:

> i'm sure i'm about to embarrass myself but, in "man git-config",
> OPTIONS, one reads:
>
>   --path
>
> git-config will expand leading ~ to the value of $HOME, and ~user
> to the   home directory for the specified user. This option has no
> effect when setting the value (but you can use git config bla ~/
> from the command line to let your shell do the expansion).
>
> what's with that "git config bla ~/"? is this some config keyword
> or something?

No need to be embarrased.  Here "bla" is a placeholder.  That is,
for example, I can run

git config --global include.path ~/.config/git/more-config

or

git config --global include.path $HOME/.config/git/more-config

to cause

[include]
path = /home/me/.config/git/more-config

to be added to my global configuration.  The expansion of ~ or $HOME
is performed by my shell, not Git.  For comparison, if I had run

git config --global include.path '~/.config/git/more-config'

then that would cause

[include]
path = ~/.config/git/more-config

to be added to my global configuration, but it would still have the
same effect at run time, since Git is also able to expand ~ to my home
directory.

The wording comes from

commit 1349484e341a3ec2ba02a86c8fbd97ea9dc8c756
Author: Matthieu Moy 
Date:   Wed Dec 30 17:51:53 2009 +0100

builtin-config: add --path option doing ~ and ~user expansion.

I agree with you that it is less clear than it could be.  Ideas for
clarifying it?

Thanks,
Jonathan


Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Stephan Beyer wrote:
> On 09/29/2017 08:40 PM, Jonathan Nieder wrote:

>> Going forward, is there an easy way to preview the effect of this kind
>> of change (e.g., to run "make style" on the entire codebase so as to be
>> able to compare the result with two different versions of
>> .clang-format)?
>
> I just ran clang-format before and after the patch and pushed to github.
> The resulting diff is quite big:
>
> https://github.com/sbeyer/git/commit/3d1186c4cf4dd7e40b97453af5fc1170f6868ccd

Thanks.  The first change I see there is

 -char *strbuf_realpath(struct strbuf *resolved, const char *path, int 
die_on_error)
 +char *
 +strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error)

I understand why the line is broken, but the choice of line break is
wrong.  Seems like the penalty for putting return type on its own line
quite high enough.

My Reviewed-by still stands, though.  It gets "make style" to signal
long lines that should be broken, which is an improvement.

> PS: There should be a comment at the beginning of the .clang-format file
> that says what version it is tested with (on my machine it worked with
> 5.0 but not with 4.0) and there should also probably a remark that the
> clang-format-based style should only be understood as a hint or guidance
> and that most of the Git codebase does not conform it.

Sounds good to me.  Care to send it as a patch? :)

Thanks,
Jonathan


Re: [PATCH] clang-format: adjust line break penalties

2017-09-29 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin 
> ---
>  .clang-format | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Well executed and well explained. Thank you.

Reviewed-by: Jonathan Nieder 

Going forward, is there an easy way to preview the effect of this kind
of change (e.g., to run "make style" on the entire codebase so as to be
able to compare the result with two different versions of
.clang-format)?

Thanks,
Jonathan


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-09-29 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> This document describes what a transition to a new hash function for
>> Git would look like.  Add it to Documentation/technical/ as the plan
>> of record so that future changes can be recorded as patches.
>>
>> Also-by: Brandon Williams 
>> Also-by: Jonathan Tan 
>> Also-by: Stefan Beller 
>> Signed-off-by: Jonathan Nieder 
>> ---
>
> Shoudln't these all be s-o-b: (with a note immediately before that
> to say all four contributed equally or something)?

I don't want to get lost in the weeds in the question of how to
represent such a collaborative effort in git's metadata.

You're right that I should collect their sign-offs!  Your approach of
using text instead of machine-readable data for common authorship also
seems okay.

In any event, this is indeed

Signed-off-by: Brandon Williams 
Signed-off-by: Jonathan Tan 
Signed-off-by: Stefan Beller 

(I just checked :)).

>> +Background
>> +--
>> +At its core, the Git version control system is a content addressable
>> +filesystem. It uses the SHA-1 hash function to name content. For
>> +example, files, directories, and revisions are referred to by hash
>> +values unlike in other traditional version control systems where files
>> +or versions are referred to via sequential numbers. The use of a hash
>
> Traditional systems refer to files via numbers???  Perhaps "where
> versions of files are referred to via sequential numbers" or
> something?

Good point.  The wording you suggested will work well.

>> +function to address its content delivers a few advantages:
>> +
>> +* Integrity checking is easy. Bit flips, for example, are easily
>> +  detected, as the hash of corrupted content does not match its name.
>> +* Lookup of objects is fast.
>
> * There is no ambiguity what the object's name should be, given its
>   content.
>
> * Deduping the same content copied across versions and paths is
>   automatic.

:)  Yep, these are nice too, especially that second one.

It also is how we make diff-ing fast.

>> +SHA-1 still possesses the other properties such as fast object lookup
>> +and safe error checking, but other hash functions are equally suitable
>> +that are believed to be cryptographically secure.
>
> s/secure/more &/, perhaps?

We were looking for a phrase meaning that it should be a cryptographic
hash function in good standing, which SHA-1 is at least approaching
not being.

"more secure" should work fine.  Let's go with that.

>> +Goals
>> +-
>> +...
>> +   c. Users can use SHA-1 and NewHash identifiers for objects
>> +  interchangeably (see "Object names on the command line", below).
>
> Mental note.  This needs to extend to the "index X..Y" lines in the
> patch output, which is used by "apply -3" and "am -3".

Will add a note about this to "Object names on the command line".  Stefan
had already pointed out that that section should really be renamed to
something like "Object names in input and output".

>> +2. Allow a complete transition away from SHA-1.
>> +   a. Local metadata for SHA-1 compatibility can be removed from a
>> +  repository if compatibility with SHA-1 is no longer needed.
>
> I like the emphasis on "Local" here.  Metadata for compatiblity that
> is embedded in the objects obviously cannot be removed.
>
> From that point of view, one of the goals ought to be "make sure
> that as much SHA-1 compatibility metadata as possible is local and
> outside the object".  This goal may not be able to say more than "as
> much as possible", as signed objects that came from SHA-1 world
> needs to carry the compatibility metadata somewhere somehow.
>
> Or perhaps we could.  There is nothing that says a signed tag
> created in the SHA-1 world must have the PGP/SHA-1 signature in the
> NewHash payload---it could be split off of the object data and
> stored in a local metadata cache, to be used only when we need to
> convert it back to the SHA-1 world.

That would break round-tripping and would mean that multiple SHA-1
objects could have the same NewHash name.  In other words, from
my point of view there is something that says that such data must
be preserved.

Another way to put it: even after removing all SHA-1 compatibility
metadata, one nice feature of this design is that it can be recovered
if I change my mind, from data in the NewHash based repository alone.

[...]
>> +Non-Goals
>> +-
>> ...
>> +6. Skip fetching some submodules of a project into a NewHash
>> +   repository. (This also depends on NewHash support in G

Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path

2017-09-28 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> This has been broken for a while, but better late than never to
>> address it.
>
> I am not sure if this is broken in the first place.  We do want to
> make sure that the scripted porcelains will source the shell helper
> library from matching Git release.  The proposed patch goes directly
> against that and I do not see how it could be an improvement.

It used to be that git allowed setting a colon-separated list of paths
in GIT_EXEC_PATH.  (Very long ago, I relied on that feature.)  If we
can restore that functionality without too much cost, then I think
it's worthwhile.

But the cost in this patch for that is pretty high.  And

$ git log -S'$(git --exec-path)/'

tells me that colon-separated paths in GIT_EXEC_PATH did not work for
some use cases as far back as 2006.  Since 2008 the documentation has
encouraged using "git --exec-path" in a way that does not work with
colon-separated paths, too.  I hadn't realized it was so long.  Given
that it hasn't been supported for more than ten years, I was wrong to
read this as a bug report --- instead, it's a feature request.

In that context, this cost is likely not worth paying.

I wonder if there's another way to achieve this patch's goal.  E.g.
what if there were a way to specify some paths git could search for
custom commands, separate from "git --exec-path"?

Putting the custom commands on the $PATH seems nicer unless you're
trying to override the implementation of an existing git command.
And we already discourage overriding the implementation of an existing
git command (as it's open source, you can patch them instead), so...

Another possible motivation (the one that led me to use this mechnism
long ago) is avoiding providing the dashed form git-$cmd in $PATH.  I
think time has shown that having git-$cmd in $PATH is not too painful.

Thanks and hope that helps,
Jonathan


Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path

2017-09-28 Thread Jonathan Nieder
Hi,

Dridi Boukelmoune wrote:

> For end users making use of a custom exec path many commands will simply
> fail. Adding git's exec path to the PATH also allows overriding git-sh-*
> scripts, not just adding commands. One can then patch a script without
> tainting their system installation of git for example.
>
> Signed-off-by: Dridi Boukelmoune 

This has been broken for a while, but better late than never to
address it.

[...]
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -151,6 +151,28 @@ For shell scripts specifically (not exhaustive):
> interface translatable. See "Marking strings for translation" in
> po/README.
>  
> + - When sourcing a "git-sh-*" file using "git --exec-path" make sure
> +   to only use its base name.
> +
> + (incorrect)
> + . "$(git --exec-path)/git-sh-setup"
> +
> + (correct)
> + . git-sh-setup

I wonder if we can make this so intuitive that it doesn't need
mentioning in CodingGuidelines.  What if the test harness
t/test-lib.sh were to set a GIT_EXEC_PATH with multiple directories in
it?  That way, authors of new commands would not have to be careful to
look out for this issue proactively because the testsuite would catch
it.

[...]
> +++ b/contrib/convert-grafts-to-replace-refs.sh
> @@ -5,7 +5,8 @@
>  
>  GRAFTS_FILE="${GIT_DIR:-.git}/info/grafts"
>  
> -. $(git --exec-path)/git-sh-setup
> +PATH="$(git --exec-path):$PATH"
> +. git-sh-setup

I wish there were a simple way to do this that doesn't involve
polluting $PATH with the dashed commands.  E.g. we can do something
like

OIFS=$IFS
IFS=:
set -f
for d in $(git --exec-path)
do
if test -e "$d/git-sh-setup"
then
. "$d/git-sh-setup"
break
fi
done
set +f
IFS=$OIFS

but that is not very simple.  Something like

old_PATH=$PATH
PATH=$(git --exec-path):$PATH
. git-sh-setup
PATH=$old_PATH

looks like it could work, but it would undo the work git-sh-setup
does to set a sane $PATH on platforms like Solaris.

> --- a/contrib/rerere-train.sh
> +++ b/contrib/rerere-train.sh
> @@ -7,7 +7,8 @@ USAGE="$me rev-list-args"
>  
>  SUBDIRECTORY_OK=Yes
>  OPTIONS_SPEC=
> -. "$(git --exec-path)/git-sh-setup"
> +PATH="$(git --exec-path):$PATH"
> +. git-sh-setup

This makes me similarly unhappy about PATH pollution, but it may be
that there's nothing to be done about that.

[...]
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -32,7 +32,6 @@ squashmerge subtree changes as a single commit
>  "
>  eval "$(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit 
> $?)"
>  
> -PATH=$PATH:$(git --exec-path)
>  . git-sh-setup

This looks like a change that could be separated into a separate
patch, both because it is to contrib/subtree (which is maintained
separately) and because it is not necessary for the goal described in
this patch's commit message.  I do like this change, since it improves
consistency with other commands.

> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -44,7 +44,7 @@ git_broken_path_fix () {
>  # @@BROKEN_PATH_FIX@@
>  
>  # Source git-sh-i18n for gettext support.
> -. "$(git --exec-path)/git-sh-i18n"
> +. git-sh-i18n

Do git-mergetool--lib.txt, git-parse-remote.txt, git-sh-i18n.txt,
and git-sh-setup.txt in Documentation/ need the same treatment?

Summary: I like the goal of this patch but I am nervous about the
side-effect it introduces of PATH pollution.  Is there a way around
that?  If there isn't, then this needs a few tweaks and then it should
be ready to go.

Thanks and hope that helps,
Jonathan


Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Andreas Heiduk wrote:

>>> +1, Thanks for spotting.
>>
>> Thanks for looking it over.  Can we add your Reviewed-by?  (See [1]
>> for what this means.)
>
> I would just do "Acked-by: Andreas" after seeing such an obvious
> admission of guilt & appreciation for fixing in the exchange.

Oh!  I had missed that context.

Your instinct would have been right (and is born out by Andreas's
reply to me).

I was just fishing, but in this context there was no reason to fish
for more than the Ack that was already there.

> Would we rather want to make it more formal like how Linux folks do
> the Reviewed-by: thing?

Separate from this example: yes, I think adopting Linux's Reviewed-by
convention would be a good thing.  When I see a positive reply to a
patch, I often wonder whether an ack or a fuller reviewed-by is
intended, and Linux's way of formalizing that appeals to me.

I'll try sending a patch to add it to SubmittingPatches tomorrow
morning (Stefan had also been hinting recently about this being
something worth trying).

Thank you,
Jonathan


Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Hi,

Andreas Heiduk wrote:

> +1, Thanks for spotting.

Thanks for looking it over.  Can we add your Reviewed-by?  (See [1]
for what this means.)

> I did a quick
>
>   grep -r " ` "
>
> which came up with with another relevant place:
>
> Documentation/git-format-patch.txt:   `--subject-prefix` option) has ` v` 
> appended to it.  E.g.
>
> But here the space IS relevant but asciidoc does not pick up
> the formatting. Perhaps that one could read like this:
>
>   `--subject-prefix` option) has `v` appended to it.  E.g.

Interesting.  This comes from

  commit 4aad08e061df699b49e24c4d34698d734473fb66
  Author: Junio C Hamano 
  Date:   Wed Jan 2 14:16:07 2013 -0800

  format-patch: document and test --reroll-count

In some output formats, the text with backticks surrounding it is
shown in a different background color, which makes something like
`{space}v` tempting (with appropriate definition of {space} in
the attributes section of asciidoc.conf).  But that feels way too
subtle.

How about something like

has a space and `v` appended to it

?

Thanks,
Jonathan

[1] 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/9cd6681cb1169e815c41af0265165dd1b872f228/Documentation/process/submitting-patches.rst#563


Re: [PATCH] doc: correct command formatting

2017-09-28 Thread Jonathan Nieder
Adam Dinwoodie wrote:

> Leaving spaces around the `-delimeters for commands means asciidoc fails
> to parse them as the start of a literal string.  Remove an extraneous
> space that is causing a literal to not be formatted as such.
>
> Signed-off-by: Adam Dinwoodie 
> ---
>  Documentation/git.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good catch.

Reviewed-by: Jonathan Nieder 


[PATCH v4] technical doc: add a design doc for hash function transition

2017-09-27 Thread Jonathan Nieder
This document describes what a transition to a new hash function for
Git would look like.  Add it to Documentation/technical/ as the plan
of record so that future changes can be recorded as patches.

Also-by: Brandon Williams 
Also-by: Jonathan Tan 
Also-by: Stefan Beller 
Signed-off-by: Jonathan Nieder 
---
On Thu, Mar 09, 2017 at 11:14 AM, Shawn Pearce wrote:
> On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder  wrote:

>> Thanks for the kind words on what had quite a few flaws still.  Here's
>> a new draft.  I think the next version will be a patch against
>> Documentation/technical/.
>
> FWIW, I like this approach.

Okay, here goes.

Instead of sharding the loose object translation tables by first byte,
we went for a single table.  It simplifies the design and we need to
keep the number of loose objects under control anyway.

We also included a description of the transition plan and tried to
include a summary of what has been agreed upon so far about the choice
of hash function.

Thanks to Junio for reviving the discussion and in particular to Dscho
for pushing this forward and making the missing pieces clearer.

Thoughts of all kinds welcome, as always.

 Documentation/Makefile |   1 +
 .../technical/hash-function-transition.txt | 797 +
 2 files changed, 798 insertions(+)
 create mode 100644 Documentation/technical/hash-function-transition.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2415e0d657..471bb29725 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -67,6 +67,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt 
technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
 TECH_DOCS += technical/pack-format
diff --git a/Documentation/technical/hash-function-transition.txt 
b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 00..417ba491d0
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,797 @@
+Git hash function transition
+
+
+Objective
+-
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+--
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+  methods have a short reliable string that can be used to reliably
+  address stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. https://shattered.io demonstrated a practical SHA-1 hash
+collision. As a result, SHA-1 cannot be considered cryptographically
+secure any more. This impacts the communication of hash values because
+we cannot trust that a given hash value represents the known good
+version of content that the speaker intended.
+
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+
+Goals
+-
+Where NewHash is a strong 256-bit hash function to replace SHA-1 (see
+"Selection of a New Hash", below):
+
+1. The transition to NewHash can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A NewHash repository can communicate with SHA-1 Git servers
+  (push/fetch).
+   c. Users can use SHA-1 and NewHash identifiers for objects
+  interchangeably (see "Object names on the command line", below).
+   d. New signed objects make use of a stronger hash function than
+  SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+  repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+-
+1. Add NewHash support to Git protocol. This is valuable and the
+   logic

Re: [PATCH] diff: correct newline in summary for renamed files

2017-09-27 Thread Jonathan Nieder
Stefan Beller wrote:

> In 146fdb0dfe (diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY,
> 2017-06-29), the conversion from direct printing to the symbol emission
> dropped the new line character for renamed, copied and rewritten files.
>
> Add the emission of a newline, add a test for this case.
>
> Reported-by: Linus Torvalds 
> Helped-by: Jeff King 
> Signed-off-by: Stefan Beller 
> ---
>  diff.c|  1 +
>  t/t4013-diff-various.sh   | 12 
>  t/t4013/diff.diff-tree_--stat_initial_mode|  4 
>  t/t4013/diff.diff-tree_--summary_initial_mode |  3 +++
>  t/t4013/diff.diff-tree_initial_mode   |  3 +++
>  t/t4013/diff.log_--decorate=full_--all|  6 ++
>  t/t4013/diff.log_--decorate_--all |  6 ++
>  7 files changed, 35 insertions(+)
>  create mode 100644 t/t4013/diff.diff-tree_--stat_initial_mode
>  create mode 100644 t/t4013/diff.diff-tree_--summary_initial_mode
>  create mode 100644 t/t4013/diff.diff-tree_initial_mode

Reviewed-by: Jonathan Nieder 

Thanks.


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-09-27 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Jeff King wrote:

>> There aren't a lot of changes to the script between v2.13.2 and v2.14.2.
>> The most plausible culprit is d5addcf522 (add--interactive: handle EOF
>> in prompt_yesno, 2017-06-21), but I'm scratching my head over how that
>> could cause what you're seeing.
>>
>> Are you able to build Git from source and bisect the problem? It would
>> help to know which commit introduced the problem.
>
> How about this change?
>
>  commit 136c8c8b8fa39f1315713248473dececf20f8fe7
>  Author: Jeff King 
>  Date:   Thu Jul 13 11:07:03 2017 -0400
>
>  color: check color.ui in git_default_config()

Uh, I think I was thinking of another thread when I wrote this.  Sorry
for the nonsense.

> Toni, what is the output of "git config -l"?

I'm still curious about this.

Thanks,
Jonathan


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-09-27 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Wed, Sep 27, 2017 at 07:28:49PM +0200, Toni Uebernickel wrote:

>> The previous version was v2.13.2.
>> I switched back to this version, and it works perfectly fine; without any 
>> changes to my system.
>
> Thanks for confirming.
>
> There aren't a lot of changes to the script between v2.13.2 and v2.14.2.
> The most plausible culprit is d5addcf522 (add--interactive: handle EOF
> in prompt_yesno, 2017-06-21), but I'm scratching my head over how that
> could cause what you're seeing.
>
> Are you able to build Git from source and bisect the problem? It would
> help to know which commit introduced the problem.

How about this change?

 commit 136c8c8b8fa39f1315713248473dececf20f8fe7
 Author: Jeff King 
 Date:   Thu Jul 13 11:07:03 2017 -0400

 color: check color.ui in git_default_config()

Toni, what is the output of "git config -l"?

Thanks,
Jonathan


Re: RFC v3: Another proposed hash function transition plan

2017-09-26 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Sorry, you are asking cryptography experts to spend their time on the Git
> mailing list. I tried to get them to speak out on the Git mailing list.
> They respectfully declined.
>
> I can't fault them, they have real jobs to do, and none of their managers
> would be happy for them to educate the Git mailing list on matters of
> cryptography, not after what happened in 2005.

Fortunately we have had a few public comments from crypto specialists:

https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
https://public-inbox.org/git/CAL9PXLxMHG1nP5_GQaK_WSJTNKs=_qbaL6V5v2GzVG=9vu2...@mail.gmail.com/
https://public-inbox.org/git/59bfb95d.1030...@st.com/
https://public-inbox.org/git/59c149a3.6080...@st.com/

[...]
> Let's be realistic. Git is pretty important to us, but it is not important
> enough to sway, say, Intel into announcing hardware support for SHA3.

Yes, I agree with this.  (Adoption by Git could lead to adoption by
some other projects, leading to more work on high quality software
implementations in projects like OpenSSL, but I am not convinced that
that would be a good thing for the world anyway.  There are downsides
to a proliferation of too many crypto primitives.  This is the basic
argument described in more detail at [1].)

[...]
> On Tue, 26 Sep 2017, Jason Cooper wrote:

>> For my use cases, as a user of git, I have a plan to maintain provable
>> integrity of existing objects stored in git under sha1 while migrating
>> away from sha1.  The same plan works for migrating away from SHA2 or
>> SHA3 when the time comes.
>
> Please do not make the mistake of taking your use case to be a template
> for everybody's use case.

That said, I'm curious at what plan you are alluding to.  Is it
something that could benefit others on the list?

Thanks,
Jonathan

[1] https://www.imperialviolet.org/2017/05/31/skipsha3.html


Re: [PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> From: Jonathan Nieder 

I go by jrnie...@gmail.com upstream. :)

> This is "RFC v3: Another proposed hash function transition plan" from
> the git mailing list.
>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Jonathan Tan 
> Signed-off-by: Brandon Williams 
> Signed-off-by: Stefan Beller 

I hadn't signed-off on this version, but it's not a big deal.

[...]
> ---
>
>  This takes the original Google Doc[1] and adds it to our history,
>  such that the discussion can be on on list and in the commit messages.
>
>  * replaced SHA3-256 with NEWHASH, sha3 with newhash
>  * added section 'Implementation plan'
>  * added section 'Future work'
>  * added section 'Agreed-upon criteria for selecting NewHash'

Thanks for sending this out.  I had let it stall too long.

As a tiny nit, I think NewHash is easier to read than NEWHASH.  Not a
big deal.  More importantly, we need some text describing it and
saying it's a placeholder.

The implementation plan included here is out of date.  It comes from
an email where I was answering a question about what people can do to
make progress, before this design had been agreed on.  In the context
of this design there are other steps we'd want to describe (having to
do with implementing the translation table, etc).

I also planned to add a description of the translation table based on
what was discussed previously in this thread.

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> What do you think of patch 7 in light of this? If the short-read case
> gives us a sane errno, should we even bother trying to consistently
> handle its error separately?

I like the read_exactly_or_die variant because it makes callers more
concise, but on the other hand a caller handling the error can write a
more meaningful error message with the right amount of context.

So I think you're right that it's better to drop patch 7.

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote:

>> #ifndef EUNDERFLOW
>> # ifdef ENODATA
>> #  define EUNDERFLOW ENODATA
>> # else
>> #  define EUNDERFLOW ESPIPE
>> # endif
>> #endif
>
> Right, I think our mails just crossed but I'm leaning in this direction.
> I'd prefer to call it SHORT_READ_ERRNO or something, though. Your
> "#ifndef EUNDERFLOW" had me thinking that this was something that a real
> platform might have, but AFAICT you just made it up.

Agreed.  Two of the risks of replying too quickly. :)

Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> I definitely would prefer to avoid EIO, or anything that an actual
> read() might return.
>
> What do you think of ENODATA? The glibc text for it is pretty
> appropriate. If it's not available everywhere, we'd have to fallback to
> something else (EIO? 0?). I don't know how esoteric it is. The likely
> candidate to be lacking it is Windows.

ENODATA with a fallback to ESPIPE sounds fine to me.

read() would never produce ESPIPE because it doesn't seek.

So that would be:

/* errno value to use for early EOF */
#ifndef ENODATA
#define ENODATA ESPIPE
#endif

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jonathan Nieder wrote:
> On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:

>> ENODATA is not too bad. On my glibc system it yields "No data available"
>> from strerror(), which is at least comprehensible.
>>
>> We're still left with the question of whether it is defined everywhere
>> (and what to fallback to when it isn't).
>
> So,
>
> #ifndef EUNDERFLOW
> #ifdef ENODATA
> #define ENODATA EUNDERFLOW
> #else
> #define ENODATA ESPIPE
> #endif
> #endif
>
> ?

Uh, pretend I said

#ifndef EUNDERFLOW
# ifdef ENODATA
#  define EUNDERFLOW ENODATA
# else
#  define EUNDERFLOW ESPIPE
# endif
#endif

Sorry for the nonsense.
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote:
> On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote:

>> After reading this discussion from the sideline, maybe
>>
>>   ENODATA No message is available on the STREAM head read
>> queue (POSIX.1)
>>
>> is not too bad after all. Or
>>
>>   ESPIPE  Invalid seek (POSIX.1)
>>
>> Technically we do not seek, but one could imagine we'd seek there
>> to read?
>
> ENODATA is not too bad. On my glibc system it yields "No data available"
> from strerror(), which is at least comprehensible.
>
> We're still left with the question of whether it is defined everywhere
> (and what to fallback to when it isn't).

So,

#ifndef EUNDERFLOW
#ifdef ENODATA
#define ENODATA EUNDERFLOW
#else
#define ENODATA ESPIPE
#endif
#endif

?  Windows has ESPIPE, and I'm not sure what other non-POSIX platform we
need to worry about.

Thanks,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote:

>> All that really matters is the strerror() that the user would see.
>
> Agreed, though I didn't think those were necessarily standardized.

The standard allows them to vary for the sake of internationalization,
but they are de facto constants (probably because of application authors
like us abusing them ;-)).

[...]
>> Of course, it's even
>> better if we fix the callers and don't try to wedge this into errno.
>
> Yes. This patch is just a stop-gap. Perhaps we should abandon it
> entirely. But I couldn't find a way to fix all the callers. If you have
> a function that just returns "-1" when it sees a read_in_full() error,
> how does _its_ caller tell the difference?
>
> I guess the answer is that the interface of the sub-function calling
> read_in_full() needs to change.

Yes. :/

>> If you are okay with the too-long/too-short confusion in EOVERFLOW, then
>> there is EMSGSIZE:
>>
>>  Message too long
>
> I don't really like that one either.
>
> I actually prefer "0" of the all of the options discussed. At least it
> is immediately clear that it is not a syscall error.

I have a basic aversion to ": Success" in error messages.  Whenever I
see such an error message, I stop trusting the program that produced it
not to be seriously buggy.  Maybe I'm the only one?

If no actual errno works, we could make a custom strerror-like function
with our own custom errors (making them negative to avoid clashing with
standard errno values), but this feels like overkill.

In the same spirit of misleadingly confusing too-long and too-short,
there is also ERANGE ("Result too large"), which doesn't work here.
There's also EPROTO "Protocol error", but that's about protocols, not
file formats.  More vague and therefor maybe more applicable is EBADMSG
"Bad message".  There's also ENOMSG "No message of the desired type".

If the goal is to support debugging, an error like EPIPE "Broken pipe"
or ESPIPE "Invalid seek" would be likely to lead me in the right
direction (wrong file size), even though it is misleading about how
the error surfaced.

We could also avoid trying to be cute and use something generic like
EIO "Input/output error".

Jonathan


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
>
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  t/t7406-submodule-update.sh | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..d718cb00e7 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> +
> + write_script must_not_run.sh <<-EOF &&
> + >$TEST_DIRECTORY/bad
> + EOF
> +
> + git -C super config -f .gitmodules submodule.submodule.update 
> "!$TEST_DIRECTORY/must_not_run.sh" &&

Long line, but I don't think I care.  I wish there were a tool like
"make style" to format shell scripts.

> + git -C super commit -a -m "add command to .gitmodules file" &&
> + git -C super/submodule reset --hard $submodulesha1^ &&
> + git -C super submodule update submodule &&
> + test_path_is_missing bad
> +'

Per offline discussion, you tested that this fails when you use
.git/config instead of .gitmodules, so there aren't any subtle typos
here. :)

Reviewed-by: Jonathan Nieder 

Thanks for writing it.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>   [EOVERFLOW]
> The file is a regular file, nbyte is greater than 0, the starting
> position is before the end-of-file, and the starting position is
> greater than or equal to the offset maximum established in the open
> file description associated with fildes.
>
> That's not _exactly_ what's going on here, but it's pretty close. And is
> what you would get if you implemented read_exactly() in terms of
> something like pread().

All that really matters is the strerror() that the user would see.

For EOVERFLOW, that is

Value too large to be stored in data type

For EILSEQ, it is

Illegal byte sequence

Neither is perfect, but the latter seems like a better fit for the kind
of file format errors we're describing here.  Of course, it's even
better if we fix the callers and don't try to wedge this into errno.

If you are okay with the too-long/too-short confusion in EOVERFLOW, then
there is EMSGSIZE:

Message too long

Hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote:
>> Jef King wrote:

>>>   errno = 0;
>>>   read_in_full(fd, buf, sizeof(buf));
>>>   if (errno)
>>> die_errno("oops");
[...]
>>>  in general we frown on it for calls like
>>> read().
>>
>> Correct.  Actually more than "frown on": except for with the few calls
>> like strtoul that are advertised to work this way, POSIX does not make
>> the guarantee the above code would rely on, at all.
>>
>> So it's not just frowned upon: it's so unportable that the standard
>> calls it out as something that won't work.
>
> Is it unportable? Certainly read() is free reset errno to zero on
> success. But is it allowed to set it to another random value?
>
> I think we're getting pretty academic here, but I'm curious if you have
> a good reference.

Yes, it is allowed to set it to another random value.  It's a common
thing for implementations to do when they hit a recoverable error,
which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an
implementation calling strtoul and getting EINVAL or ERANGE).

glibc only recently started trying to avoid this kind of behavior,
because even though the standard allows it, users hate it.

POSIX.1-2008 XSI 2.3 "Error Numbers" says

Some functions provide the error number in a variable accessed
through the symbol errno, defined by including the 
header.  The value of errno should only be examined when it is
indicated to be valid by a function's return value.

See http://austingroupbugs.net/view.php?id=374 for an example where
someone wanted to add a new guarantee of that kind to a function.

Hope that helps,
Jonathan


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> By as-is I refer to origin/pu.

Ah, okay.  I'm not in that habit because pu frequently loses topics
--- it's mostly useful as a tool to (1) distribute topic branches and
(2) check whether they are compatible with each other.

[...]
> On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Nieder  wrote:

>> Maybe we should work on first wordsmithing the doc and then figuring
>> out where it goes?  The current state of the doc with (?) three
>> different texts,
>
> such that each different text highlights each locations purpose.
>
>> all wrong,
>
> Care to spell out this bold claim?

In the state of "master", "man git-config" tells me that

[submodule ""]
update = ...

sets the "default update procedure for a submodule".  It suggests that
I read about "git submodule update" in git-submodule(1) for more
details.  This is problematic because

 1. It doesn't tell me when I should and should not set it.
 2. I would expect "git checkout --recurse-submodules" to respect the
default update procedure.
 3. It doesn't tell me what commands like "git fetch
--recurse-submodules" will do.
 4. It doesn't point me to better alternatives, when this
configuration doesn't fit my use case.

"man git-submodule" doesn't have a section on submodule..update.
It has references scattered throughout the manpage.  It only tells me
how the setting affects the "git submodule update" command and doesn't
address the problems (1)-(4).

"man gitmodules" also refers to this setting as the "default update
procedure for the named submodule".  It doesn't answer the questions
(1)-(4) either.

Thanks and hope that helps,
Jonathan


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
>>  ssize_t loaded = xread(fd, p, count);
>>  if (loaded < 0)
>>  return -1;
>> -if (loaded == 0)
>> +if (loaded == 0) {
>> +errno = EILSEQ;
>>  return total;
>> +}
>
> If I do this:
>
>   errno = 0;
>   read_in_full(fd, buf, sizeof(buf));
>   if (errno)
>   die_errno("oops");
>
> then we'll claim an error, even though there was none (remember that
> it's only an error for _some_ callers to not read the whole length).
>
> This may be sufficiently odd that we don't need to care about it. There
> are some calls (like strtoul) which require this kind of clear-and-check
> strategy with errno, but in general we frown on it for calls like
> read().

Correct.  Actually more than "frown on": except for with the few calls
like strtoul that are advertised to work this way, POSIX does not make
the guarantee the above code would rely on, at all.

So it's not just frowned upon: it's so unportable that the standard
calls it out as something that won't work.

> We could also introduce a better helper like this:
>
>   int read_exactly(int fd, void *buf, size_t count)
>   {
>   ssize_t ret = read_in_full(fd, buf, count);
>   if (ret < 0)
>   return -1;
>   if (ret != count) {
>   errno = EILSEQ;
>   return -1;
>   }
>   return 0;
>   }
>
> Then we know that touching errno always coincides with an error return.
> And it's shorter to check "< 0" compared to "!= count" in the caller.
> But of course a caller which wants to distinguish the two cases for its
> error messages then has to look at errno:

That sounds nice, but it doesn't solve the original problem of callers
using read_in_full that way.

Thanks,
Jonathan


Re: git ls-tree -d doesn't work if the specified path is the repository root?

2017-09-25 Thread Jonathan Nieder
Hi,

Roy Wellington wrote:

> When I run `git ls-tree -d HEAD -- subdir` from the root of my
> repository, where `subdir` is a subdirectory in that root, I get the
> treehash of that subdirectory. This is what I expect.
>
> However, if I merely replace `subdir` with `.` (the root of the
> repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns
> the treehashes of the /children/ of the root, instead of the root
> itself, contrary to the documented behavior of -d.

Can you be more specific?  What documentation led you to this
expectation?

I don't have a strong opinion about this behavior, but in any case if
the documentation and command disagree about the correct behavior then
one of them needs to be fixed. :)

> Is there some reason for this? This behavior seems like a bug to me:
> it means that prior to calling ls-tree I need to check if the
> referenced path happens to be the root, and if so, find some other
> means (rev-parse?) of converting it to a treehash.

Does

git rev-parse HEAD:subdir

work better for what you're trying to accomplish?  To get the root of
the repository, you can use

git rev-parse HEAD:

Thanks and hope that helps,
Jonathan


Re: [PATCH] Documentation: consolidate submodule..update

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:
> On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams  wrote:
>> On 09/25, Stefan Beller wrote:

>>> Have one place to explain the effects of setting submodule..update
>>> instead of two.
>>>
>>> Signed-off-by: Stefan Beller 
>>> ---
> I disagree.  Actually, I think the git-config(1) blurb could just
> point here, and that the text here ought to be clear about what
> commands it affects and how an end user would use it.

 I tend to agree with the consolidation.
>>>
>>> Something like this?
>>
>> I like the consolidation, its easier to keep up to date when its only in
>> one place.
>
> After thinking about it further, I'd like to withdraw this patch
> for now.
>
> The reason is that this would also forward point to
> git-submodule.txt, such that we'd still have 2 places
> in which it is explained. The current situation with explaining
> it in 3 places is not too bad as each place hints at their specific
> circumstance:
> git-submodule.txt explains the values to set itself.
> gitmodules.txt explains what the possible fields in that file are,
>   and regarding this field it points out it is ignored in the init call.
> config.txt: actually describe the effects of the setting.
>
> I think we can keep it as-is for now.

Sorry, I got lost.  Which state is as-is?

As a user, how do I find out what submodule.*.update is going to do
and which commands will respect it?

Maybe we should work on first wordsmithing the doc and then figuring
out where it goes?  The current state of the doc with (?) three
different texts, all wrong, doesn't seem like a good state to
preserve.

Thanks,
Jonathan


Re: [PATCH 7/7] add xread_in_full() helper

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> Many callers of read_in_full() expect to see exactly "len"
> bytes, and die if that isn't the case.

micronit: Can this be named read_in_full_or_die?

Otherwise it's too easy to mistake for a function like xread, which
has different semantics.

I realize that xmalloc, xmemdupz, etc use a different convention.
That's yet another reason to be explicit, IMHO.

Thanks,
Jonathan


Re: [PATCH 6/7] worktree: check the result of read_in_full()

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> We try to read "len" bytes into a buffer and just assume
> that it happened correctly. In practice this should usually
> be the case, since we just stat'd the file to get the
> length.  But we could be fooled by transient errors or by
> other processes racily truncating the file.
>
> Let's be more careful. There's a slim chance this could
> catch a real error, but it also prevents people and tools
> from getting worried while reading the code.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/worktree.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 2f4a4ef9cd..87b3d70b0b 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf 
> *reason)
>   }
>   len = xsize_t(st.st_size);
>   path = xmallocz(len);
> - read_in_full(fd, path, len);
> + if (read_in_full(fd, path, len) != len) {
> + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did 
> not match stat (%s)"),
> + id, strerror(errno));

I'm a little confused.  The 'if' condition checks for a read error but
the message says something about 'stat'.

If we're trying to double-check the 'stat' result, shouldn't we read
all the way to EOF in case the file got longer?

Puzzled,
Jonathan


Re: [PATCH 5/7] worktree: use xsize_t to access file size

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> To read the "gitdir" file into memory, we stat the file and
> allocate a buffer. But we store the size in an "int", which
> may be truncated. We should use a size_t and xsize_t(),
> which will detect truncation.
>
> An overflow is unlikely for a "gitdir" file, but it's a good
> practice to model.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/worktree.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> Suggested-by: Jonathan Nieder 
> Signed-off-by: Jeff King 
> ---
>  builtin/get-tar-commit-id.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> In an ideal world, callers would always distinguish between
> these cases and give a useful message for each. But as an
> easy way to make our imperfect world better, let's reset
> errno to a known value. The best we can do is "0", which
> will yield something like:
>
>   unable to read: Success
>
> That's not great, but at least it's deterministic and makes
> it clear that we didn't see an error from read().

Yuck.  Can we set errno to something more specific instead?

read(2) also doesn't promise not to clobber errno on success.

How about something like the following?

-- >8 --
Subject: read_in_full: set errno for partial reads

Many callers of read_in_full() complain when we do not read their full
byte-count. But a check like:

  if (read_in_full(fd, buf, len) != len)
return error_errno("unable to read");

conflates two problem conditions:

  1. A real error from read().
  2. There were fewer than "len" bytes available.

In the first case, showing the user strerror(errno) is useful. In the
second, we may see a random errno that was set by some previous system
call.

In an ideal world, callers would always distinguish between these
cases and give a useful message for each. But as an easy way to make
our imperfect world better, let's set errno in the second case to a
known value. There is no standard "Unexpected end of file" errno
value, so instead use EILSEQ, which yields a message like

  unable to read: Illegal byte sequence

That's not great, but at least it's deterministic and makes it
possible to reverse-engineer what went wrong.

Change-Id: I48138052f71b7c6cf35c2d00a10dc8febaca4f10
Signed-off-by: Jonathan Nieder 
---
 wrapper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index 61aba0b5c1..1842a99b87 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
ssize_t loaded = xread(fd, p, count);
if (loaded < 0)
return -1;
-   if (loaded == 0)
+   if (loaded == 0) {
+   errno = EILSEQ;
return total;
+   }
count -= loaded;
p += loaded;
total += loaded;
-- 
2.14.1.821.g8fa685d3b7



Re: [PATCH 2/7] notes-merge: drop dead zero-write code

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:

> We call write_in_full() with a size that we know is greater
> than zero. The return value can never be zero, then, since
> write_in_full() converts such a failed write() into ENOSPC
> and returns -1.  We can just drop this branch of the error
> handling entirely.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Jeff King 
> ---
>  notes-merge.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for tying up this loose end.


Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check

2017-09-25 Thread Jonathan Nieder
Jeff King wrote:
> Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
> len" pattern, 2017-09-13) converted this callsite from:
>
>   write_in_full(...) != 1
>
> to
>
>   write_in_full(...) < 0
>
> But during the conflict resolution in c50424a6f0 (Merge
> branch 'jk/write-in-full-fix', 2017-09-25), this morphed
> into
>
>   write_in_full(...) < 1
>
> This behaves as we want, but we prefer to avoid modeling the
> "less than length" error-check which can be subtly buggy, as
> shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
> len) < len" pattern, 2017-09-13).
>
> Signed-off-by: Jeff King 
> ---
>  refs/files-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Good eyes.  Thanks.
Reviewed-by: Jonathan Nieder 

By the way, the description from that merge commit

Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

does not look accurate to me.  At least the "Many codepaths" part.
All of those changes except for three should be no-ops.  The scariest
one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c,
which is about overflowing a "long" on Windows instead of error
handling.

Thanks,
Jonathan


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-25 Thread Jonathan Nieder
Stefan Beller wrote:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
> 
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
> 
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  t/t7406-submodule-update.sh | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..780af4e6f5 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> +
> + git -C super config -f .gitmodules submodule.submodule.update "!false 
> || echo >bad" &&

What does the '!false || echo >bad' do?

Ideally we want this test to be super robust: e.g. if it runs the
command but from a different directory, we still want the test to fail,
and if it runs the command but using exec instead of a shell, we still
want the test to fail.

Maybe write_script would help with this.  E.g. would something like

test_when_finished ... &&
write_script must_not_run.sh <<-EOF &&
>$TEST_DIRECTORY/bad
EOF

git -C super config -f .gitmodules submodule.submodule.update \
"!$TEST_DIRECTORY/must_not_run.sh" &&
...

work?

Thanks,
Jonathan


Re: [PATCH v2 1/1] Win32: simplify loading of DLL functions

2017-09-25 Thread Jonathan Nieder
Johannes Schindelin wrote:

> Dynamic loading of DLL functions is duplicated in several places in Git
> for Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(, , ,
> ..) macro to be used at the beginning of a
> code block, and the INIT_PROC_ADDR() macro to call before
> using the declared function. The return value of the INIT_PROC_ADDR()
> call has to be checked; If it is NULL, the function was not found in the
> specified DLL.
>
> Example:
>
> DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
>   LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> if (!INIT_PROC_ADDR(CreateHardLinkW))
> return error("Could not find CreateHardLinkW() function";
>
>   if (!CreateHardLinkW(source, target, NULL))
>   return error("could not create hardlink from %S to %S",
>source, target);
>   return 0;
>
> Signed-off-by: Karsten Blees 
> Signed-off-by: Johannes Schindelin 
> Reviewed-by: Jonathan Nieder 

Yep, this is indeed

Reviewed-by: Jonathan Nieder 

> ---
>  compat/win32/lazyload.h | 57 
> +
>  1 file changed, 57 insertions(+)
>  create mode 100644 compat/win32/lazyload.h

Thanks,
Jonathan


Re: [PATCHv2] Documentation/config: clarify the meaning of submodule..update

2017-09-22 Thread Jonathan Nieder
Stefan Beller wrote:

> Reported-by: Lars Schneider 
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/config.txt | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> Jonathan writes:

>> You'll want to update Documentation/gitmodules.txt, too.
>
> No. /grumpycat
>
> It should already be fine, as I read it as if it is only relevant to 
> "git submodule update" there, already:
>
>   submodule..update::
>   Defines the default update procedure for the named submodule,
>   i.e. how the submodule is updated by "git submodule update"
>   command in the superproject. This is only used by `git
>   submodule init` to initialize the configuration variable of
>   the same name. Allowed values here are 'checkout', 'rebase',
>   'merge' or 'none'. See description of 'update' command in
>   linkgit:git-submodule[1] for their meaning. Note that the
>   '!command' form is intentionally ignored here for security
>   reasons.

I disagree.  Actually, I think the git-config(1) blurb could just
point here, and that the text here ought to be clear about what
commands it affects and how an end user would use it.


Re: [PATCH] Documentation/config: clarify the meaning of submodule..update

2017-09-22 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> With more commands (that potentially change a submodule) paying attention
> to submodules as well as the recent discussion[1] on submodule..update,
> let's spell out that submodule..update is strictly to be used
> for configuring the "submodule update" command and not to be obeyed
> by other commands.

Good idea, thank you.

You'll want to update Documentation/gitmodules.txt, too.

I think this can go further: it should say explicitly that commands
like "git checkout --recurse-submodules" do not pay attention to this
option.

> These other commands usually have a strict meaning of what they should
> do (i.e. checkout, reset, rebase, merge) as well as have their name
> overlapping with the modes possible for submodule..update.
>
> [1] 
> https://public-inbox.org/git/4283f0b0-bc1c-4ed1-8126-7e512d844...@gmail.com/

Can you summarize what this discussion concluded with so the reader
does not have to look far to understand it?

> Signed-off-by: Stefan Beller 

Reported-by: Lars Schneider 

> ---
>  Documentation/config.txt | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index dc4e3f58a2..b0ded777fe 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -3085,10 +3085,9 @@ submodule..url::
>   See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details.
>  
>  submodule..update::
> - The default update procedure for a submodule. This variable
> - is populated by `git submodule init` from the
> - linkgit:gitmodules[5] file. See description of 'update'
> - command in linkgit:git-submodule[1].
> + The method how a submodule is updated via 'git submodule update'.
> + It is populated by `git submodule init` from the linkgit:gitmodules[5]
> + file. See description of 'update' command in linkgit:git-submodule[1].
>  

Wording nits: s/The method how/The method by which/; s/via/by/

More importantly, can this be more explicit about how it is meant to
be used?  E.g. to say

 1. This only affects "git submodule update" and doesn't affect
commands like "git checkout --recurse-submodules".

 2. It exists for historical reasons; settings like submodule.active
and pull.rebase are more likely to be what someone is looking for.

Thanks and hope that helps,
Jonathan


Re: [PATCH 1/4] cat-file: handle NULL object_context.path

2017-09-21 Thread Jonathan Nieder
Jeff King wrote:

> Commit dc944b65f1 (get_sha1_with_context: dynamically
> allocate oc->path, 2017-05-19) changed the rules that
> callers must follow for seeing if we parsed a path in the
> object name. The rules switched from "check if the oc.path
> buffer is empty" to "check if the oc.path pointer is NULL".
> But that commit forgot to update some sites in
> cat_one_file(), meaning we might dereference a NULL pointer.
>
> You can see this by making a path-aware request like
> --textconv without specifying --path, and giving an object
> name that doesn't have a path in it. Like:
>
>   git cat-file --textconv HEAD
>
> which will reliably segfault.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/cat-file.c  | 4 ++--
>  t/t8010-cat-file-filters.sh | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)

Yikes.  Commit dc944b65f1 even touched this function, so we reviewers
have no excuse for not having found it.

Technically this changes the behavior of cat-file --path='', but I
don't think that matters.

Do other GET_SHA1_RECORD_PATH callers need similar treatment?

* builtin/grep.c appears to do the right thing (it stores NULL in
  list, so it passes NULL to grep_object, which calls grep_oid, which
  calls grep_source_init, which stores NULL for the grep machinery
  that is able to cope with a NULL).

* builtin/log.c is correctly updated as part of the patch.

Those are the only other callers.  So we're safe. *phew*

Reviewed-by: Jonathan Nieder 


Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison

2017-09-21 Thread Jonathan Nieder
larsxschnei...@gmail.com wrote:

> 09f5e97 ("travis-ci: skip a branch build if equal tag is present",
> 2017-09-17) introduced the "skip_branch_tip_with_tag" function with
> a broken string comparison. Fix it!
>
> Reported-by: SZEDER Gábor 
> Signed-off-by: Lars Schneider 
> ---

Thanks for the fix.

09f5e97 appears to be for the ls/travis-scriptify branch, which is
already part of "next" (if it weren't, I'd suggest just squashing your
patch into that commit).

> --- a/ci/lib-travisci.sh
> +++ b/ci/lib-travisci.sh
> @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
>   # of a tag.
> 
>   if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
> - $TAG != $TRAVIS_BRANCH
> + [ "$TAG" != "$TRAVIS_BRANCH" ]

Git style is to use 'test' instead of '[' for this.  See
https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/
for more on that subject.

Could you squash in the following?

Thanks,
Jonathan

diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh
index c3b46f4a7d..b3ed0a0dda 100755
--- i/ci/lib-travisci.sh
+++ w/ci/lib-travisci.sh
@@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
# of a tag.
 
if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
-   [ "$TAG" != "$TRAVIS_BRANCH" ]
+   test "$TAG" != "$TRAVIS_BRANCH"
then
echo "Tip of $TRAVIS_BRANCH is exactly at $TAG"
exit 0


Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array

2017-09-20 Thread Jonathan Nieder
Jeff King wrote:
> On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
>>>
>>> We assemble an array of strings in a custom struct,
>>> NULL-terminate the result, and then pass it to
>>> parse_pathspec().
>>>
>>> But then we never free the array or the individual strings
>>> (nor can we do the latter, as they are heap-allocated when
>>> they come from stdin but not when they come from the
>>> passed-in argv).
[...]
>> Except... is the idea that this allows the strings from stdin to be
>> freed sooner, as soon as they have been parsed into a "struct
>> pathspec"?
>
> Well, no...the idea is that this is a function which leaks a bunch of
> memory, and we shouldn't have to think hard about how often its leak can
> be triggered or how severe it is. We should just fix it.

I forgot to say: thanks for writing such a pleasant patch.

Reviewed-by: Jonathan Nieder 

[...]
> I certainly agree that the pathspec interface could use better
> documentation. Patches welcome? :)

Here's such a patch.

-- 8< --
Subject: pathspec doc: parse_pathspec does not maintain references to args

The command line arguments passed to main() are valid for the life of
a program, but the same is not true for all other argv-style arrays
(e.g.  when a caller creates an argv_array).  Clarify that
parse_pathspec does not rely on the argv passed to it to remain valid.

This makes it easier to tell that callers like "git rev-list --stdin"
are safe and ensures that that is more likely to remain true as the
implementation of parse_pathspec evolves.

Signed-off-by: Jonathan Nieder 
---
 pathspec.c | 4 
 pathspec.h | 7 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index e2a23ebc96..cdefdc7cc0 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -526,10 +526,6 @@ static void NORETURN unsupported_magic(const char *pattern,
pattern, sb.buf);
 }
 
-/*
- * Given command line arguments and a prefix, convert the input to
- * pathspec. die() if any magic in magic_mask is used.
- */
 void parse_pathspec(struct pathspec *pathspec,
unsigned magic_mask, unsigned flags,
const char *prefix, const char **argv)
diff --git a/pathspec.h b/pathspec.h
index 60e6500401..6420d1080a 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -70,6 +70,13 @@ struct pathspec {
  */
 #define PATHSPEC_LITERAL_PATH (1<<6)
 
+/*
+ * Given command line arguments and a prefix, convert the input to
+ * pathspec. die() if any magic in magic_mask is used.
+ *
+ * Any arguments used are copied. It is safe for the caller to modify
+ * or free 'prefix' and 'args' after calling this function.
+ */
 extern void parse_pathspec(struct pathspec *pathspec,
   unsigned magic_mask,
   unsigned flags,
-- 
2.14.1.821.g8fa685d3b7



Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array

2017-09-20 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> But mostly I am fundamentally against using UNLEAK() in a case like
> this, because it does not match either of the properties which justified
> adding UNLEAK() in the first place:
>
>   1. We are about to exit the program, so the "leak" is only caused by
>  the memory going out of scope at that exit.
>
>  By contrast, the revision machinery may be called many times in the
>  same program.
>
>   2. The memory remains useful until around the time of program exit.
>
>  This most certainly does not, as it would not even be reachable.
[...]
> On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote:

>> In other words, proposed changes:
>>
>>  1. Could the commit message describe what effect this would have on
>> maximum heap usage, if any?  (In qualitative terms is fine, though
>> actual numbers would be even better if it's easy to get them.)
>> That would make it easier to justify not using UNLEAK.
>
> What wording are you looking for? It was a leak, and now it's gone.  The
> size of the leak depends on how much you feed to --stdin. IMHO using
> UNLEAK is totally inappropriate for this case, and doesn't even seem
> like an alternative worth rejecting.
>
>>  2. Can parse_pathspec get a comment in pathspec.h saying that it
>> defensively copies anything it needs from args so the caller is
>> free to modify or free it?  That way, it should be more obvious
>> to people in the future modifying parse_pathspec() that callers
>> may rely on that.  (The current API comment describes argv as
>> "command line arguments", which I fear would send the opposite
>> message to implementors.)
>
> I certainly agree that the pathspec interface could use better
> documentation. Patches welcome? :)

I think I failed at communicating here.  That is not what I meant at
all.

The context is that Git (especially older parts of it) suffers from a
pretty severe lack of API documentation.  For newcomers that is
especially obvious --- long-time git contributors, on the other hand,
may get used to patterns and common interfaces and may have trouble
seeing just how bad the lack of clearly communicated API contracts is.

There is a bit of a "broken window" problem, too: authors of one-off
patches may reasonably assume from existing code that this is just the
way it is and, lacking examples of how to document an API, add more
underdocumented API contracts.

The patch I am replying to tightens the contract for parse_pathspec().
I am not asking for comprehensive documentation of that function ---
that would be clearly off-topic for the patch.  Instead, I am saying
that we should document what we are newly requiring of the function in
the patch.  That way, the documented contract becomes clearer over
time as people document what they are relying on.  I think of that as
generally a good practice.

In other words, it was not at all obvious that "(2) The memory remains
useful until around the time of program exit" did not hold.  To a
casual reader it instead looks like (2) does hold, and there's no
documentation short of delving into the implementation of
parse_pathspec() to convince a reader otherwise.  The documentation
that is present leads to the opposite conclusion.

The assertion (1) that this allocation is going to happen multiple
times in a program isn't true either.  As you noted, we only read
stdin once.  But that doesn't matter as long as (2) doesn't hold.

TBH saying that I should write the one-sentence doc patch feels like
a cop-out.  Just like it is not sustainable for those reviewers that
are interested in good test coverage to be the only ones who write
tests, I think we cannot avoid treating documentation of API contracts
as a shared responsibility.

Thanks and hope that clarifies,
Jonathan


Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array

2017-09-20 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
>
> We assemble an array of strings in a custom struct,
> NULL-terminate the result, and then pass it to
> parse_pathspec().
>
> But then we never free the array or the individual strings
> (nor can we do the latter, as they are heap-allocated when
> they come from stdin but not when they come from the
> passed-in argv).

To be devil's advocate for a moment: why don't we use UNLEAK on the
paths passed in from stdin?

It's true that there can be an unbounded number of them, but they all
coexist in memory anyway.  They are not generated dynamically on the
fly.  Being able to free them doesn't have any obvious advantage over
using exit().

Except... is the idea that this allows the strings from stdin to be
freed sooner, as soon as they have been parsed into a "struct
pathspec"?

That sounds appealing.  The only risk would be if "struct pathspec"
holds on to a pointer to one of these strings.

Fortunately parse_pathspec() is careful to strdup any strings it
borrows (though it is not documented to do so).

In other words, proposed changes:

 1. Could the commit message describe what effect this would have on
maximum heap usage, if any?  (In qualitative terms is fine, though
actual numbers would be even better if it's easy to get them.)
That would make it easier to justify not using UNLEAK.

 2. Can parse_pathspec get a comment in pathspec.h saying that it
defensively copies anything it needs from args so the caller is
free to modify or free it?  That way, it should be more obvious
to people in the future modifying parse_pathspec() that callers
may rely on that.  (The current API comment describes argv as
"command line arguments", which I fear would send the opposite
message to implementors.)

> Let's swap this out for an argv_array. It does the same
> thing with fewer lines of code, and it's safe to call
> argv_array_clear() at the end to avoid a memory leak.
>
> Reported-by: Martin Ågren 
> Signed-off-by: Jeff King 
> ---
>  revision.c | 39 +++
>  1 file changed, 11 insertions(+), 28 deletions(-)

The code looks good.

Thanks,
Jonathan


Re: [PATCH] commit: fix memory leak in `reduce_heads()`

2017-09-20 Thread Jonathan Nieder
Martin Ågren wrote:

> We don't free the temporary scratch space we use with
> `remove_redundant()`. Free it similar to how we do it in
> `get_merge_bases_many_0()`.
>
> Signed-off-by: Martin Ågren 
> ---
>  commit.c | 1 +
>  1 file changed, 1 insertion(+)

Good catch.  Thanks.

Reviewed-by: Jonathan Nieder 


Re: [PATCH] Win32: simplify loading of DLL functions

2017-09-20 Thread Jonathan Nieder
Johannes Schindelin wrote:
> On Tue, 19 Sep 2017, Jonathan Nieder wrote:

>> Could this example go near the top of the header instead?  That way,
>> it's easier for people reading the header to see how to use it.
>
> Funny, I am *so* used to examples being at the very end, from tutorials to
> man pages.
>
> If my experience is any indication, I would rather keep this order.

Sorry for the lack of clarity.  I meant "near the top of the header
*file*".

[...]
>> Are any of the Git for Windows users something that could go upstream
>> along with this patch?  That would help illustrate what a good caller
>> looks like, which should help with reviewing future patches that use
>> this code.
>
> I do not currently have the time to do that, that's why I did not
> accompany the patch by any user.
>
> However, having said that, Ben's patch series will make for an *excellent*
> user, fulfilling your wish.

Ok.  I think what you are saying is "go ahead --- anyone is welcome to
grab some patches from git for windows and upstream them", which is a
perfectly reasonable answer.

[...]
>> Reviewed-by: Jonathan Nieder 
>
> Okay, I'll add that for v2. Will wait a couple of days in case more stuff
> crops up.

FWIW nothing I noticed came to the level of requiring a v2 imho.  If any
of the ideas I mentioned seems good, they can go in patches on top.

The patch is in Junio's jch branch and I wouldn't be surprised if it
hits "next" soon.

Thanks again,
Jonathan


Re: [PATCH 1.5/8] connect: die when a capability line comes after a ref

2017-09-20 Thread Jonathan Nieder
Brandon Williams wrote:
> On 09/20, Jeff King wrote:

>> (For that matter, could we just be checking whether *list is NULL?)
>
> True, that would probably be the better way to do this.

Nice idea, thank you.

That doesn't capture a few other cases of pkts that aren't supposed to
come before the capabilities^{} line:

 * shallow
 * .have
 * capabilities^{}
 * invalid refnames

Perhaps it should check all of those:

if ((shallow_points && shallow_points->nr) ||
(extra_have && extra_have->nr) ||
got_dummy_ref_with_capabilities_declaration ||
got_invalid_ref ||
*list)

What happens when another type of pkt gets introduced?  This feels
pretty error-prone.  The underlying problem is that we are emulating a
state machine that is not a simple for loop using a simple for loop,
by piling up variables that keep track of the current state.  That
suggests one of the following approaches:

 A. Replace saw_response with an enum describing the state.
Immediately after reading the first packet, update the state to
EXPECTING_FIRST_REF.  Immediately after reading the first ref,
update the state to EXPECTING_SHALLOW.

 B. Use instruction flow to encode the state machine.  Have separate
loops for processing refs and shallow lines.

By the way, there are some other ways the current code is less strict
than described in pack-protocol.txt:

 - allowing an empty list-of-refs.  (This is deliberate ---
   pack-protocol.txt's lack of documentation of this case is a bug.)

 - allowing multiple capability-lists

 - allowing capabilities^{} combined with other refs

 - allowing refs, shallow, and .have to be interleaved

Tightening those would likely be good for the ecosystem (so that
buggy servers get noticed quickly), but that's a separate topic from
this change.

[...]
> I wasn't sure either, which is why I added the comment to prod
> discussion.  I agree that is is orthogonal to this series so I'll most
> likely drop it, as it doesn't help with the protocol transition
> discussion.

I'd be happy to write a separate patch adding the NEEDSWORK comment
(or even a patch doing what the NEEDSWORK comment suggests) to avoid
derailing this one. :)

Thanks,
Jonathan


Re: [PATCH] Add t/helper/test-write-cache to .gitignore

2017-09-20 Thread Jonathan Nieder
Brandon Williams wrote:
> On 08/28, Jonathan Tan wrote:

>> This new binary was introduced in commit 3921a0b ("perf: add test for
>> writing the index", 2017-08-21), but a .gitignore entry was not added
>> for it. Add that entry.
>>
>> Signed-off-by: Jonathan Tan 
>
> Looks good to me

Reviewed-by: Jonathan Nieder 
as well.

I wonder if we can automate this a little.  Some thoughts:

 A. We could include a test that all the binaries named in
TEST_PROGRAMS_NEED_X are also named in t/helper/.gitignore.  That
way tests would fail if this ever falls out of date again.

 B. Even better would be if we could have /t/helper/test-* in
.gitignore.  E.g. if we rename test-*.c to *.c (e.g.
t/helper/ctype.c instead of t/helper/test-ctype.c), then the test
helper source file names would be easier to type and maintaining
this .gitignore would become a problem of the past.

What do you think?
Jonathan


Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test

2017-09-20 Thread Jonathan Nieder
Hi,

Ben Peart wrote:
> On 9/19/2017 4:43 PM, Jonathan Nieder wrote:

>> This feels to me like the wrong fix.  Wouldn't it be better for the
>> test not to depend on the precise object ids?  See the "Tips for
>> Writing Tests" section in t/README:
>
> I completely agree that a better fix would be to rewrite the test to
> not hard code the SHA values.  I'm sure this will come to bite us
> again as we discuss the migration to a different SHA algorithm.

nit: the kind of change I'm proposing does not entail a full rewrite. :)

The SHA migration aspect is true, but that's actually the least of my
worries.  I intend to introduce a SHA1 test prereq that crazy tests
which want to depend on the hash function can declare a dependency on.

My actual worry is that tests hard-coding object ids are (1) hard to
understand, as illustrated by my having no clue what these particular
object ids refer to and (2) very brittle, since an object id changes
whenever a timestamp or any of the history leading to an object
changes.  They create a trap for anyone wanting to change the test
later.  They are basically change detector tests, which is generally
accepted to be a bad practice.

> That said, I think fixing this correctly is outside the scope of
> this patch series.  It has been written this way since it was
> created back in 2014 (and patched in 2015 to hard code the V4 index
> SHA).

Fair enough.

> If desired, this patch can simply be dropped from the series
> entirely as I doubt anyone other than me will attempt to run it with
> the fsmonitor extension turned on.

*shrug*

My motivations in the context of the review were:

 * now that we noticed the problem, we have an opportunity to fix it!
   (i.e. a fix would not have to be part of this series and would not
   necessarily have to be written by you)

 * if we include this non-fix, the commit message really needs to say
   something about it.  Otherwise people are likely to cargo-cult it
   in other contexts and make the problem worse.

Thanks,
Jonathan


Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list

2017-09-20 Thread Jonathan Nieder
Andreas Schwab wrote:
> On Sep 19 2017, Jonathan Nieder  wrote:

>> B. #define for_each_string_list_item(item, list) \
>>  if (list->items) \
>>  for (item = ...; ...; ... )
>>
>>This breaks a caller like
>>  if (foo)
>>  for_each_string_list_item(item, list)
>>  ...
>>  else
>>  ...
>>
>>making it a non-starter.
>
> That can be fixed with a dangling else.

I believe the fix you're referring to is option C, from the same email
you are replying to.  If not, please correct me.

Thanks,
Jonathan


[PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list

2017-09-19 Thread Jonathan Nieder
From: Michael Haggerty 

If you pass a newly initialized or newly cleared `string_list` to
`for_each_string_list_item()`, then the latter does

for (
item = (list)->items; /* NULL */
item < (list)->items + (list)->nr; /* NULL + 0 */
++item)

Even though this probably works almost everywhere, it is undefined
behavior, and it could plausibly cause highly-optimizing compilers to
misbehave.  C99 section 6.5.6 paragraph 8 explains:

If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.

and (6.3.2.3.3) a null pointer does not point to anything.

Guard the loop with a NULL check to make the intent crystal clear to
even the most pedantic compiler.  A suitably clever compiler could let
the NULL check only run in the first iteration, but regardless, this
overhead is likely to be dwarfed by the work to be done on each item.

This problem was noticed by Coverity.

[jn: using a NULL check instead of a placeholder empty list;
 fleshed out the commit message based on mailing list discussion]

Signed-off-by: Michael Haggerty 
Signed-off-by: Jonathan Nieder 
---
 string-list.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> ...  But a quick test with gcc 4.8.4
>> -O2 finds that at least this compiler does not contain such an
>> optimization.  The overhead Michael Haggerty mentioned is real.
>
> Still, I have a feeling that users of string_list wouldn't care
> the overhead of single pointer NULL-ness check.
>
>  - apply.c collects conflicted paths and reports them with fprintf().
>
>  - builtin/clean.c uses the function to walk the list of paths to be
>removed, and either does a human interaction (for "-i" codepath)
>or goes to the filesystem to remove things.
>
>  - builtin/config.c uses it in get_urlmatch() in preparation for
>doing network-y things.
>
>  - builtin/describe.c walks the list of exclude and include patterns
>to run wildmatch on the candidate reference name to filter it out.
>
>  ...
>
> In all of these examples, what happens for each item in the loop
> seems to me far heavier than the overhead this macro adds.

Yes, agreed.  As a small tweak,

   #define for_each_string_list_item(item, list) \
for (item = ...; item && ...; ...)

produces nicer assembly than

   #define for_each_string_list_item(item, list) \
for (item = ...; list->items && ...; ...)

(By the way, the potential optimization I described isn't valid: we
know that when item == NULL and list->items == NULL, list->nr is
always zero, but the compiler has no way to know that.  So it can't
eliminate the NULL test.  For comparison, a suitably smart compiler
should be able to eliminate a 'list->nr != 0 &&' guard if 'list'
doesn't escape in the loop body.)

Recapping the other proposed fixes:

A. Make it an invariant of string_list that items is never NULL and
   update string_list_init et al to use an empty array.  This is
   pretty painless until you notice some other structs that embed
   string_list without using STRING_LIST_INIT.  Updating all those
   would be too painful.

B. #define for_each_string_list_item(item, list) \
if (list->items) \
for (item = ...; ...; ... )

   This breaks a caller like
if (foo)
for_each_string_list_item(item, list)
...
else
...

   making it a non-starter.

C. As Gábor suggested,
   #define for_each_string_list_item(item, list) \
if (!list->items) \
; /* nothing to do */ \
else \
for (item = ...; ...; ...)

   This handles the caller from (B) correctly.  But it produces
   compiler warnings for a caller like

if (foo)
for_each_string_list_item(item, list)
...

   There is only one instance of that construct in git today.  It
   looks nicer anyway with braces, so this approach would also be
   promising.

D. Eliminate for_each_string_list_item and let callers just do

unsigned int i;
for (i = 0; i < list->nr; i++) {
struct string_list_item *item = list->items[i];
...
}

   Having to declare item is unnecessarily verbose, decreasing the
   appeal of this option.  I think I like it anyway, but I wasn't able
   to convince coccinelle to do it.

E. Use subtraction instead of addition:
   #define for_each_string_list_item(item, list) \
for (item = ...; \
 (item == list->items ? 0 : item - list->items) < nr; \
 item++)

   I ex

Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Jonathan Nieder
Hi,

Kaartic Sivaraam wrote:
> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:
>> Jonathan Nieder wrote:

>>> Does the following alternate fix work?  I think I prefer it because
>>> it doesn't require introducing a new global. [...]
>>>   #define for_each_string_list_item(item,list) \
>>> -   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>>> +   for (item = (list)->items; \
>>> +(list)->items && item < (list)->items + (list)->nr; \
>>> +++item)
>>
>> This is the possibility that I was referring to as "add[ing] overhead to
>> each iteration of the loop". I'd rather not add an extra test-and-branch
>> to every iteration of a loop in which `list->items` is *not* NULL, which
>> your solution appears to do. Or are compilers routinely able to optimize
>> the check out?
>
> I t seems at least 'gcc' is able to optimize this out even with a -O1
> and 'clang' optimizes this out with a -O2. Taking a sneak peek at
> the 'Makefile' shows that our default is -O2.
>
> For a proof, see https://godbolt.org/g/CPt73L

>From that link:

for ( ;valid_int && *valid_int < 10; (*valid_int)++) {
printf("Valid instance");
}

Both gcc and clang are able to optimize out the 'valid_int &&' because
it is dereferenced on the RHS of the &&.

For comparison, 'item < (list)->items + (list)->nr' does not
dereference (list)->items.  So that optimization doesn't apply here.

A smart compiler could be able to take advantage of there being no
object pointed to by a null pointer, which means

item < (list)->items + (list)->nr

is always false when (list)->items is NULL, which in turn makes a
'(list)->items &&' test redundant.  But a quick test with gcc 4.8.4
-O2 finds that at least this compiler does not contain such an
optimization.  The overhead Michael Haggerty mentioned is real.

Thanks and hope that helps,
Jonathan


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Kaartic Sivaraam  writes:
>> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:
>>> Jonathan Nieder wrote:

>>>> Does the following alternate fix work?  I think I prefer it because
>>>> it doesn't require introducing a new global. [...]
>>>>   #define for_each_string_list_item(item,list) \
>>>> -  for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>>>> +  for (item = (list)->items; \
>>>> +   (list)->items && item < (list)->items + (list)->nr; \
>>>> +   ++item)
>>>
>>> This is the possibility that I was referring to as "add[ing] overhead to
>>> each iteration of the loop". I'd rather not add an extra test-and-branch
>>> to every iteration of a loop in which `list->items` is *not* NULL, which
>>> your solution appears to do. Or are compilers routinely able to optimize
>>> the check out?
>>
>> It seems at least 'gcc' is able to optimize this out even with a -O1
>> and 'clang' optimizes this out with a -O2. Taking a sneak peek at
>> the 'Makefile' shows that our default is -O2.
>
> But doesn't the versions of gcc and clang currently available do the
> right thing with the current code without this change anyway?  I've
> been operating under the assumption that this is to future-proof the
> code even when the compilers change to use the "NULL+0 is undefined"
> as an excuse to make demons fly out of your nose, so unfortunately I
> do not think it is not so huge a plus to find that the current
> compilers do the right thing to the code with proposed updates.

I think you and Kaartic are talking about different things.  Kaartic
was checking that this wouldn't introduce a performance regression
(thanks!).  You are concerned about whether the C standard and common
practice treat the resulting code as exhibiting undefined behavior.

Fortunately the C standard is pretty clear about this.  The undefined
behavior here is at run time, not compile time.  As you suggested in
an earlier reply, the 'list->items &&' effectively guards the
'list->items + list->nr' to prevent that undefined behavior.

I'll send a patch with a commit message saying so to try to close out
this discussion.

Thanks,
Jonathan


Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test

2017-09-19 Thread Jonathan Nieder
Hi,

Ben Peart wrote:

> The split index test t1700-split-index.sh has hard coded SHA values for
> the index.  Currently it supports index V4 and V3 but assumes there are
> no index extensions loaded.
>
> When manually forcing the fsmonitor extension to be turned on when
> running the test suite, the SHA values no longer match which causes the
> test to fail.
>
> The potential matrix of index extensions and index versions can is quite
> large so instead disable the extension before attempting to run the test.

Thanks for finding and diagnosing this problem.

This feels to me like the wrong fix.  Wouldn't it be better for the
test not to depend on the precise object ids?  See the "Tips for
Writing Tests" section in t/README:

 And
such drastic changes to the core GIT that even changes these
otherwise supposedly stable object IDs should be accompanied by
an update to t-basic.sh.

However, other tests that simply rely on basic parts of the core
GIT working properly should not have that level of intimate
knowledge of the core GIT internals.  If all the test scripts
hardcoded the object IDs like t-basic.sh does, that defeats
the purpose of t-basic.sh, which is to isolate that level of
validation in one place.  Your test also ends up needing
updating when such a change to the internal happens, so do _not_
do it and leave the low level of validation to t-basic.sh.

Worse, t1700-split-index.sh doesn't explain where the object ids it
uses comes from so it is not even obvious to a casual reader like me
how to fix it.

See t/diff-lib.sh for some examples of one way to avoid depending on
the object id computation.  Another way that is often preferable is to
come up with commands to compute the expected hash values, like
$(git rev-parse HEAD^{tree}), and use those instead of hard-coded
values.

Thanks and hope that helps,
Jonathan


Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable

2017-09-19 Thread Jonathan Nieder
Torsten Bögershausen  wrote:

> Some implementations of `echo` support the '-e' option to enable
> backslash interpretation of the following string.
> As an addition, they support '-E' to turn it off.

nit: please wrap the commit message to a consistent line width.

> However, none of these are portable, POSIX doesn't even mention them,
> and many implementations don't support them.
>
> A check for '-n' is already done in check-non-portable-shell.pl,
> extend it to cover '-n', '-e' or '-E-'
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  t/check-non-portable-shell.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

An excellent change.  Thanks for noticing and fixing this.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-19 Thread Jonathan Nieder
Ben Peart wrote:

> Some stats on these same coding style errors in the current bash scripts:
>
> 298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space)
> 140 instances of "if \[ .* \]" (not using the preferred "test")
> 293 instances of "if .*; then"
>
> Wouldn't it be great not to have to write up style feedback for when
> these all get copy/pasted into new scripts?

Agreed.  Care to write patches for these? :)  (I think three patches,
one for each issue, would do the trick.)

Thanks,
Jonathan


Re: [PATCH] doc: camelCase the config variables to improve readability

2017-09-19 Thread Jonathan Nieder
Hi,

Kaartic Sivaraam wrote:

> The config variable used weren't readable as they were in the
> crude form of how git stores/uses it's config variables.

nit: Git's commit messages describe the current behavior of Git in the
present tense.  Something like:

These manpages' references to config variables are hard to read because
they use all-lowercase names, without indicating where each word ends
and begins in the configuration variable names.

Improve readability by using camelCase instead.  Git treats these names
case-insensitively so this does not affect functionality.

> Improve it's readability by replacing them with camelCased versions
> of config variables as it doesn't have any impact on it's usage.

nit: s/it's/its/ (in both places)

> Signed-off-by: Kaartic Sivaraam 
> ---
>  Documentation/git-branch.txt | 4 ++--
>  Documentation/git-tag.txt| 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

This also is just more consistent with the rest of the docs.

FWIW, with or without the commit message tweaks mentioned above,
Reviewed-by: Jonathan Nieder 

Thanks for your attention to detail.


Re: [PATCH] Win32: simplify loading of DLL functions

2017-09-19 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Dynamic loading of DLL functions is duplicated in several places in Git
> for Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(, , ,
> ..) macro to be used at the beginning of a
> code block, and the INIT_PROC_ADDR() macro to call before
> using the declared function. The return value of the INIT_PROC_ADDR()
> call has to be checked; If it is NULL, the function was not found in the
> specified DLL.
>
> Example:
>
> DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
>   LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> if (!INIT_PROC_ADDR(CreateHardLinkW))
> return error("Could not find CreateHardLinkW() function";
>
>   if (!CreateHardLinkW(source, target, NULL))
>   return error("could not create hardlink from %S to %S",
>source, target);
>   return 0;

nit: whitespace is a bit strange here (mixture of tabs and spaces).

Could this example go near the top of the header instead?  That way,
it's easier for people reading the header to see how to use it.

> Signed-off-by: Karsten Blees 
> Signed-off-by: Johannes Schindelin 
> ---

Just curious: what was Karsten's contribution?  (I ask mostly because
I'm interested in kinds of collaboration git metadata is failing to
capture correctly --- e.g. pair programming.)

> So far, there are no users (except in Git for Windows). Ben
> promised to make use of it in his fsmonitor patch series.
>
>  compat/win32/lazyload.h | 44 
>  1 file changed, 44 insertions(+)
>  create mode 100644 compat/win32/lazyload.h

Are any of the Git for Windows users something that could go upstream
along with this patch?  That would help illustrate what a good caller
looks like, which should help with reviewing future patches that use
this code.

> --- /dev/null
> +++ b/compat/win32/lazyload.h
> @@ -0,0 +1,44 @@
[...]
> +/* Declares a function to be loaded dynamically from a DLL. */
> +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
> + static struct proc_addr proc_addr_##function = \
> + { #dll, #function, NULL, 0 }; \
> + static rettype (WINAPI *function)(__VA_ARGS__)
> +
> +/*
> + * Loads a function from a DLL (once-only).
> + * Returns non-NULL function pointer on success.
> + * Returns NULL + errno == ENOSYS on failure.
> + */
> +#define INIT_PROC_ADDR(function) \
> + (function = get_proc_addr(&proc_addr_##function))

Probably worth mentioning in the doc comment that this is not thread
safe, so a caller that wants to lazy-init in a threaded context is
responsible for doing their own locking.

> +
> +static inline void *get_proc_addr(struct proc_addr *proc)
> +{
> + /* only do this once */
> + if (!proc->initialized) {
> + HANDLE hnd;
> + proc->initialized = 1;
> + hnd = LoadLibraryExA(proc->dll, NULL,
> +  LOAD_LIBRARY_SEARCH_SYSTEM32);
> + if (hnd)
> + proc->pfunction = GetProcAddress(hnd, proc->function);
> +     }
> + /* set ENOSYS if DLL or function was not found */
> + if (!proc->pfunction)
> + errno = ENOSYS;
> + return proc->pfunction;
> +}

strerror(ENOSYS) is "Function not implemented".  Cute.

Reviewed-by: Jonathan Nieder 

Thanks,
Jonathan


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Jonathan Nieder
Johannes Schindelin wrote:

> What you need is a tool to aggregate this information, to help working
> with it, to manage the project, and to be updated automatically. And to
> publish this information continuously, without costing you extra effort.
>
> I understand that you started before GitHub existed, and before GitHub was
> an option, the script-generated What's cooking mail was the best you could
> do.

On second reading, I think you're talking about GitHub's code review
("pull request") feature, not a bug tracker.

There's been some active work (some that you've been involved in, I
believe) on getting information from a GitHub pull request to the
mailing list.  One possibility would be to get information to also go
in the other direction, so people have information about what happened
to their patch in one place.

I can also see why you are directing your attention to the maintainer
for this one, since if the entire project adopts the GitHub Pull
Request workflow, then the complexity and other flaws of such a
two-way sync could be avoided.

Unfortunately the maintainer is not the only person you'd need to
convince.  You'd also need to convince patch authors and reviewers to
move to the new workflow.  There are likely some potential
contributors that this would bring in, that would like to get involved
in the Git project but had been deterred by e.g. the mailing list's
aggressive filtering of any email with an HTML part.  Meanwhile it
would also be a large adjustment for existing contributors, so it's
not risk free.

I personally like how Bazel[1] handles this.  They have two ways to
contribute:

 A. People used to GitHub can use a pull request like they are
accustomed to.

 B. People preferring a more structured code review can use Gerrit.

Having only two ways to contribute means that the code review
information is still easy to archive and search, just like our mailing
list archive.

(Actually, an example I like even more is Akaros[2], which also
appears to accept patch contributions by email.)

Adding new ways for a contributor to submit a patch would mean that we
can experiment with new workflows without getting in the way of people
using an existing one.

Thoughts?
Jonathan

[1] https://bazel.build/contributing.html
[2] https://groups.google.com/forum/#!forum/akaros


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> To relate that, you are using a plain text format that is not well defined
> and not structured, and certainly not machine-readable, for information
> that is crucial for project management.
>
> What you need is a tool to aggregate this information, to help working
> with it, to manage the project, and to be updated automatically. And to
> publish this information continuously, without costing you extra effort.
>
> I understand that you started before GitHub existed, and before GitHub was
> an option, the script-generated What's cooking mail was the best you could
> do.

I think you are subtly (but not directly, for some reason?) advocating
moving project management for the Git project to GitHub Issues.

For what it's worth:

 1. I would be happy to see Git adopt a bug tracker.  As we've
discussed on the list before, I suspect the only way that this is
going to happen is if some contributors start using a bug tracker
and keep up with bugs there, without requiring everyone to use it.
That is how the Linux Kernel project started using
bugzilla.kernel.org, for example.

 2. GitHub Issues is one of my least favorite bug trackers, for what
it's worth.  If some sub-project of Git chooses to use it, then
that's great and I won't get in their way.  I'm just providing
this single data point that approximately any other tracker
(Bugzilla, JIRA, debbugs, patchwork) is something I'd be more
likely to use.

 3. This advice might feel hopeless, because if the maintainer is not
involved in the initial pilot, then how does the bug tracker get
notified when a patch has been accepted?  But fortunately this is
a problem other people have solved: e.g. most bug trackers have an
API that can be used to automatically notify the bug when a patch
with a certain subject line appears on a certain branch.

Thanks and hope that helps,
Jonathan


Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing

2017-09-18 Thread Jonathan Nieder
Ramsay Jones wrote:

> Every test in this file, except one, is marked with the PIPE prereq.
> However, that lone test ('set up svn repo'), only performs some setup
> work and checks whether the following test should be executed (by
> setting an additional SVNREPO prerequisite). Since the following test
> also requires the PIPE prerequisite, performing the setup test, when the
> PIPE preequisite is missing, is simply wasted effort. Use the skip-all
> test facility to skip all tests when the PIPE prerequisite is missing.
>
> Signed-off-by: Ramsay Jones 
> ---
>  t/t9010-svn-fe.sh | 55 
> ---
>  1 file changed, 28 insertions(+), 27 deletions(-)

It was always the intention that this test script eventually be able
to run on Windows, but it was not meant to be.  It would need to use a
socket or something for backflow to work on Windows.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite

2017-09-18 Thread Jonathan Nieder
Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 
> ---
>  t/test-lib.sh | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> When we fail to open $GIT_DIR/info/alternates, we silently
> assume there are no alternates. This is the right thing to
> do for ENOENT, but not for other errors.
>
> A hard error is probably overkill here. If we fail to read
> an alternates file then either we'll complete our operation
> anyway, or we'll fail to find some needed object. Either
> way, a warning is good idea. And we already have a helper
> function to handle this pattern; let's just call
> warn_on_fopen_error().

I think I prefer a hard error.  What kind of cases are you imagining
where it would be better to warn?

E.g. for EIO, erroring out so that the user can try again seems better
than hoping that the application will be able to cope with the more
subtle error that comes from discovering some objects are missing.

For EACCES, I can see how it makes sense to warn and move on, but no
other errors like that are occuring to me.

Thoughts?

Thanks,
Jonathan

> Note that technically the errno from strbuf_read_file()
> might be from a read() error, not open(). But since read()
> would never return ENOENT or ENOTDIR, and since it produces
> a generic "unable to access" error, it's suitable for
> handling errors from either.


Re: [PATCH 1/2] read_info_alternates: read contents into strbuf

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> Reported-by: Michael Haggerty 
> Signed-off-by: Jeff King 
> ---
>  sha1_file.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)

Thanks for tracking it down.

Reviewed-by: Jonathan Nieder 

[...]
> +++ b/sha1_file.c
[...]
> @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int 
> len, int sep,
>  
>  static void read_info_alternates(const char * relative_base, int depth)
>  {
> - char *map;
> - size_t mapsz;
> - struct stat st;
>   char *path;
> - int fd;
> + struct strbuf buf = STRBUF_INIT;
>  
>   path = xstrfmt("%s/info/alternates", relative_base);
> - fd = git_open(path);
> - free(path);
> - if (fd < 0)
> - return;
> - if (fstat(fd, &st) || (st.st_size == 0)) {
> - close(fd);
> + if (strbuf_read_file(&buf, path, 1024) < 0) {
> + free(path);
>   return;

strbuf_read_file is careful to release buf on failure, so this doesn't
leak (but it's a bit subtle).

What happened to the !st.st_size case?  Is it eliminated for
simplicity?

The rest looks good.

Thanks,
Jonathan


Re: [PATCH 0/2] fix read past end of array in alternates files

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> This series fixes a regression in v2.11.1 where we might read past the
> end of an mmap'd buffer. It was introduced in cf3c635210,

The above information is super helpful.  Can it go in one of the commit
messages?

>   but I didn't
> base the patch on there, for a few reasons:
>
>   1. There's a trivial conflict when merging up (because of
>  git_open_noatime() becoming just git_open() in the inerim).
>
>   2. The reproduction advice relies on our SANITIZE Makefile knob, which
>  didn't exist back then.
>
>   3. The second patch does not apply there because we don't have
>  warn_on_fopen_errors(). Though admittedly it could be applied
>  separately after merging up; it's just a clean-up on top.

Even this part could go in a commit message, but it's fine for it not
to.

Thanks,
Jonathan


Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository

2017-09-18 Thread Jonathan Nieder
Hi,

Øystein Walle wrote:

> Running `git fetch --unshallow` on a repo that is not in fact shallow
> produces a fatal error message.

Hm, can you say more about the context?  From a certain point of view,
it might make sense for that command to succeed instead: if the repo
is already unshallow, then why should't "fetch --unshallow" complain
instead of declaring victory?

> Add a helper to rev-parse that scripters
> can use to determine whether a repo is shallow or not.
>
> Signed-off-by: Øystein Walle 
> ---
>  Documentation/git-rev-parse.txt |  3 +++
>  builtin/rev-parse.c |  5 +
>  t/t1500-rev-parse.sh| 15 +++
>  3 files changed, 23 insertions(+)

Regardless, this new rev-parse --is-shallow helper looks like a good
feature.

[...]
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const 
> char *prefix)
>   : "false");
>   continue;
>   }
> + if (!strcmp(arg, "--is-shallow-repository")) {
> + printf("%s\n", is_repository_shallow() ? "true"
> + : "false");
> + continue;
> + }

The implementation is straightforward and correct.

[...]
> --- a/t/t1500-rev-parse.sh
> +++ b/t/t1500-rev-parse.sh

Thanks for writing tests. \o/

> @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'git-path shallow repository' '

What does git-path mean here?  I wonder if it's a copy/paste error.
Did you mean something like

 test_expect_success 'rev-parse --is-shallow-repository in shallow repo' '

?

> + test_commit test_commit &&
> + echo true >expect &&
> + git clone --depth 1 --no-local . shallow &&
> + test_when_finished "rm -rf shallow" &&
> + git -C shallow rev-parse --is-shallow-repository >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'git-path notshallow repository' '

Likewise: should this be

 test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' '

?

> + echo false >expect &&
> + git rev-parse --is-shallow-repository >actual &&
> + test_cmp expect actual
> +'
> +
>  test_expect_success 'showing the superproject correctly' '

With the two tweaks mentioned above,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer

2017-09-18 Thread Jonathan Nieder
Hi,

phionah bugosi wrote:

> Just to reecho a previous change requested before in one of the mail
> threads, we currently have two global variables declared:
>
> struct mru packed_git_mru_storage;
> struct mru *packed_git_mru = &packed_git_mru_storage;
>
> We normally use pointers in C to point or refer to the same location
> or space in memory from multiple places. That means in situations
> where we will have the pointer be assigned to in many places in our
> code. But in this case, we do not have any other logic refering or
> assigning to the pointer packed_git_mru. In simple words we actually
> dont need a pointer in this case.
>
> Therefore this patch makes packed_git_mru global a value instead of
> a pointer and makes use of list.h
>
> Signed-off-by: phionah bugosi 
> ---
>  builtin/pack-objects.c |  5 +++--
>  cache.h|  7 ---
>  list.h |  6 ++
>  packfile.c | 12 ++--
>  4 files changed, 15 insertions(+), 15 deletions(-)

*puzzled* This appears to already be in "pu", with a different author.
Did you independently make the same change?  Or are you asking for a
progress report on that change, and just phrasing it in a confusing
way?

Confused,
Jonathan


<    4   5   6   7   8   9   10   11   12   13   >