Re: [PATCH 2/2] tests: fix diff order arguments in test_cmp
Stefan Beller wrote: > Fix the argument order for test_cmp. When given the expected > result first the diff shows the actual output with '+' and the > expectation with '-', which is the convention for our tests. > > Signed-off-by: Stefan Beller > --- Yes, this should make the output from failing tests easier to take in at a glance. Reviewed-by: Jonathan Nieder How did you find these? E.g. is there a grep pattern that reviewers can use to repeat your results?
Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp
Hi, Jeff King wrote: > On Fri, Oct 06, 2017 at 12:00:05PM -0700, Stefan Beller wrote: >> The `test_must_fail` should only be used to indicate a git command is >> failing. `test_cmp` is not a git command, such that it doesn't need the >> special treatment of `test_must_fail` (which e.g. includes checking for >> segfault) > > Hmph. "test_must_fail test_cmp" is a weird thing for somebody to write. > And your patch is obviously an improvement, but I have to wonder if some > of these make any sense. Just for the record: I agree with all the above, and my Reviewed-by still stands. Thanks for looking closer. I wonder if there's a systematic way to avoid this kind of weak test that can bitrot and still pass too easily --- e.g. should we have a test_must_differ command with an explanation of why it should usually be avoided? Would a lint check that bans this kind of construct completely be going too far? Jonathan
Re: [PATCH 1/2] tests: use shell negation instead of test_must_fail for test_cmp
Stefan Beller wrote: > The `test_must_fail` should only be used to indicate a git command is > failing. `test_cmp` is not a git command, such that it doesn't need the > special treatment of `test_must_fail` (which e.g. includes checking for > segfault). > > Signed-off-by: Stefan Beller > --- > t/t3504-cherry-pick-rerere.sh | 2 +- > t/t5512-ls-remote.sh | 2 +- > t/t5612-clone-refspec.sh | 2 +- > t/t7508-status.sh | 2 +- > t/t9164-git-svn-dcommit-concurrent.sh | 2 +- > 5 files changed, 5 insertions(+), 5 deletions(-) Thanks. I agree that this is more readable, and it matches the advice in t/README: On the other hand, don't use test_must_fail for running regular platform commands; just use '! cmd'. We are not in the business of verifying that the world given to us sanely works. Reviewed-by: Jonathan Nieder I wonder if it would be useful to have a nod to that advice in the docstring in t/test-lib-functions.sh, too.
Re: is there a truly compelling rationale for .git/info/exclude?
Hi, Robert P. J. Day wrote: > On Fri, 6 Oct 2017, Junio C Hamano wrote: >> Don't waste time by seeking a "compelling" reason. A mere "this is >> the most expedite way to gain convenience" back when something was >> introduced could be an answer, and it is way too late to complain >> about such a choice anyway. > > perfectly respectable answer ... it tells me that, between > .gitignore files and core.excludesFile, there's not much left for > .git/info/exclude to do, except in weird circumstances. I use .git/info/exclude in what I don't consider to be weird circumstances. But I am not motivated to say more than that without knowing what my answer is going to be used for. E.g. is there a part of the gitignore(5) man page where such an explanation would make it less confusing and more useful? Thanks, Jonathan
Re: Git config multiple values
Hi, Jeff King wrote: > On Fri, Oct 06, 2017 at 01:10:17PM +0200, aleksander.baranowski wrote: >> It's just an opinion, but this behaviour is no consistent for me. >> >> If it's not the bug it's a feature just let me know. > > It's a feature, though I agree that git-config is rather baroque. We're > mostly stuck with it for reasons of backwards compatibility, though. This feels like a dodge. Can we make a list of what is baroque here, with an eye to fixing it? E.g. if we introduce a new --set option, then what should its semantics be, to be more intuitive? Thanks, Jonathan
Re: [PATCH v2] api-argv-array.txt: Remove broken link to string-list API
Todd Zullinger wrote: > In 4f665f2cf3 (string-list.h: move documentation from Documentation/api/ > into header, 2017-09-26) the string-list API documentation was moved to > string-list.h. The argv-array API documentation may follow a similar > course in the future. Until then, prevent the broken link from making > it to the end-user documentation. > > Signed-off-by: Todd Zullinger > --- > Documentation/technical/api-argv-array.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Jonathan Nieder Thanks for catching it. Do you use a broken link detection tool to detect this kind of issue automatically? [...] > --- a/Documentation/technical/api-argv-array.txt > +++ b/Documentation/technical/api-argv-array.txt > @@ -8,7 +8,7 @@ always NULL-terminated at the element pointed to by > `argv[argc]`. This > makes the result suitable for passing to functions expecting to receive > argv from main(), or the link:api-run-command.html[run-command API]. > > -The link:api-string-list.html[string-list API] is similar, but cannot be > +The string-list API (documented in string-list.h) is similar, but cannot be > used for these purposes; instead of storing a straight string pointer, > it contains an item structure with a `util` field that is not compatible > with the traditional argv interface.
Re: [RFC] Reporting index properties
Alex Vandiver wrote: > As part of gathering some automated performance statistics of git, it > would be useful to be able to extract some vital properties of the > index. While `git update-index` has ways to _set_ the index version, > and enable/disable various extensions, AFAICT there is no method by > which one can ask for reporting about the current state of them -- > short of re-implementing all of the index-parsing logic external to > Git, which is not terribly appealing. Aside: git-update-index(1) says Version 4 is relatively young (first released in 1.8.0 in October 2012). My first reaction is to wonder if it is time to introduce a config option to use index format version 4 by default, since after 5 years it is not relatively young any more. My second reaction is to notice that the index.version configuration already exists. Probably git-update-index(1) ought to point to it. JGit still doesn't support index format 4, so 4 is still not a good default for `index.version` for a user that hasn't explicitly configured it, but the moment JGit gains that support I think it will be. :) > I hesitate to propose adding a flag to `git update-index` which reads > but does no updating, as that seems to make a misnomer of the > command. But this also doesn't seem worthy of a new top-level command. The existing scripting command that inspects the index is "git ls-files". It doesn't go into this kind of low-level detail about the index format, though. In the same spirit, I don't think we have an existing command in this spirit for analyzing packfiles, either. So I agree with Junio that this would be a good debugging aid to put in t/helper/ for now, and then once we've had experience with it we'd end up in a better position to come up with a stable, public-facing interface, if one is needed. Thanks and hope that helps, Jonathan
Re: couple questions about git "logical variables" and "git var"
Jeff King wrote: > On Thu, Oct 05, 2017 at 05:11:04AM -0400, rpj...@crashcourse.ca wrote: >> - GIT_AUTHOR_IDENT >> - GIT_COMMITTER_IDENT >> - GIT_EDITOR >> - GIT_PAGER >> >> first question -- what is it about precisely those four variables that makes >> them "logical" variables in git parlance? just those four? no others? > > It was introduced in the very early days as a way for scripts to get > access to "standard" values that would be computed the same way as the C > portions of Git. But it hasn't generally been kept up to date with new > possible variables. > > It also only tells half the story. You have to know not just what's in > $GIT_EDITOR, but you have to know the right way to evaluate it. There's > a git_editor helper in git-sh-setup, but other scripting languages are > on their own. I am not sure I understand the complaint here. git-var(1) says: GIT_EDITOR Text editor for use by Git commands. The value is meant to be interpreted by the shell when it is used. Examples: [...] Are you saying that the output of the command should quote that manpage, so as to tell the rest of the story? > We'd probably have done better to introduce a "git editor" > command which can be run from any language. I remember that we discussed this at the time but don't remember why it didn't happen. It seems like a good idea. [...] >> p.s. yes, i realize this command is deprecated in favour of "git config -l", >> but as long as it's available, it should work as described in the man page. > > Yes, though I think fixing the manpage is the right way to make them > consistent. Agreed as well. rday, care to take a stab at wording? Thanks, Jonathan
Re: [PATCH 0/6] Teach Status options around showing ignored files
Hi, jameson.mille...@gmail.com wrote: > This patch series is the second part of [1], which was split into 2 > parts. The first part, added an optimization in the directory listing > logic to not scan the contents of ignored directories and was merged > to master with commit 5aaa7fd3. This patch series includes the second > part to expose additional arguments to the --ignored option on the > status command. Thanks. > This patch series teaches the status command more options to control > which ignored files are reported independently of the which untracked [...] > Our application (Visual Studio) has a specific set of requirements > about how it wants untracked / ignored files reported by git status. [...] > The reason for controlling these behaviors separately is that there > can be a significant performance impact to scanning the contents of [] > As a more concrete example, on Windows, Visual Studio creates a bin/ > and obj/ directory inside of the project where it writes all .obj and [...] I see this information is also in patch 1/6. That's a very good thing, since that makes performance numbers involved more concrete about which patch brings them about and it becomes part of permanent history that way --- thanks. But it took me a while to notice, and before then, I was trying to read through the cover letter to get an overview of which patches I am supposed to look at. For next time, could the cover letter say something like "See patches 1 and 2 for more details about the motivation" instead of repeating the commit message content? That would save reviewers some time. Thanks, Jonathan
Re: [PATCH] .mailmap: normalize name for René Scharfe
René Scharfe wrote: > Reported-by: Johannes Schindelin > Reported-by: Stefan Beller > Signed-off-by: Rene Scharfe > --- > .mailmap | 1 + > 1 file changed, 1 insertion(+) A quick "git shortlog -nse" run confirms that this works. Reviewed-by: Jonathan Nieder Thanks. > diff --git a/.mailmap b/.mailmap > index ab85e0d16d..224db83887 100644 > --- a/.mailmap > +++ b/.mailmap > @@ -194,6 +194,7 @@ Philippe Bruhat > Ralf Thielow > Ramsay Jones > René Scharfe > +René Scharfe Rene Scharfe > Richard Hansen > Richard Hansen > Robert Fitzsimons
Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers
Junio C Hamano wrote: > From: Taylor Blau > Date: Mon, 2 Oct 2017 09:10:34 -0700 > Subject: [PATCH] ref-filter.c: pass empty-string as NULL to atom parsers > > Peff points out that different atom parsers handle the empty > "sub-argument" list differently. An example of this is the format > "%(refname:)". > > Since callers often use `string_list_split` (which splits the empty > string with any delimiter as a 1-ary string_list containing the empty > string), this makes handling empty sub-argument strings non-ergonomic. > > Let's fix this by declaring that atom parser implementations must > not care about distinguishing between the empty string "%(refname:)" > and no sub-arguments "%(refname)". Current code aborts, either with > "unrecognised arg" (e.g. "refname:") or "does not take args" > (e.g. "body:") as an error message. > > Signed-off-by: Taylor Blau > Reviewed-by: Jeff King > Reviewed-by: Jonathan Nieder > Signed-off-by: Junio C Hamano > --- > ref-filter.c| 10 +++++- > t/t6300-for-each-ref.sh | 1 + > 2 files changed, 10 insertions(+), 1 deletion(-) Thanks for taking care of it. This is indeed still Reviewed-by: Jonathan Nieder
Re: Line ending normalization doesn't work as expected
Junio C Hamano wrote: > Jonathan Nieder writes: >> git checkout --renormalize . >> git status; # Show files that will be normalized >> git commit; # Commit the result >> >> What do you think? Would you be interested in writing a patch for it? >> ("No" is as always an acceptable answer.) > > I actually think what is being requested is the opposite, i.e. "the > object registered in the index have wrong line endings, and the > safe-crlf is getting in the way to prevent me from correcting by > hashing the working tree contents again to register contents with > corrected line endings, even with 'git add .'". > > So I would understand if your suggestion were for > > git checkin --renormalize . > > but not "git checkout". And it probably is more familiar to lay > people if we spelled that as "git add --renormalize ." ;-) Good catch. You understood correctly --- "git add --renormalize" is the feature that I think is being hinted at here. Thanks, Jonathan
Re: [PATCH 1/3] path.c: fix uninitialized memory access
Junio C Hamano wrote: > From: Jeff King > Date: Tue, 3 Oct 2017 19:30:40 -0400 > Subject: [PATCH] path.c: fix uninitialized memory access > > In cleanup_path we're passing in a char array, run a memcmp on it, and > run through it without ever checking if something is in the array in the > first place. This can lead us to access uninitialized memory, for > example in t5541-http-push-smart.sh test 7, when run under valgrind: > > ==4423== Conditional jump or move depends on uninitialised value(s) > ==4423==at 0x242FA9: cleanup_path (path.c:35) [...] > ==4423== Uninitialised value was created by a heap allocation [...] > ==4423==by 0x29A30B: strbuf_grow (strbuf.c:66) > ==4423==by 0x29A30B: strbuf_vaddf (strbuf.c:277) > ==4423==by 0x242F9F: mkpath (path.c:454) [...] > Avoid this by using skip_prefix(), which knows not to go beyond the > end of the string. > > Reported-by: Thomas Gummerer > Signed-off-by: Jeff King > Reviewed-by: Jonathan Nieder This is indeed Reviewed-by: Jonathan Nieder Thanks.
Re: disable interactive prompting
Hi Ernesto, Ernesto Alfonso wrote: > Waiting for git-push synchronously slows me down, so I have a bash > alias/function to do this in the background. But when my origin is https, I > get an undesired interactive prompt. I've tried to disable by > redirecting stdin: > > git push ${REMOTE} ${BRANCH} &>/dev/null > but I still get an interactive prompt. > > Is there a way to either > > 1. disable interactive prompting > 2. programmatically determine whether a git command (or at least a git > push) would interactively prompt You left out an important detail: what does the interactive prompt in question say? The general question is also interesting, but seeing the particular prompt would make it easy to look into the specific case at the same time. Thanks, Jonathan
Re: Line ending normalization doesn't work as expected
Hi Robert, Robert Dailey wrote: > You guys are obviously worlds ahead of me on the internals of things, > but from my perspective I like to avoid the "plumbing" commands as > much as I can. I suspect what we are dancing around is the need for some command like git checkout --renormalize . which would shorten the sequence to git checkout --renormalize . git status; # Show files that will be normalized git commit; # Commit the result What do you think? Would you be interested in writing a patch for it? ("No" is as always an acceptable answer.) Thanks, Jonathan
[PATCH v2] strbuf doc: reuse after strbuf_release is fine
strbuf_release leaves the strbuf in a valid, initialized state, so there is no need to call strbuf_init after it. Moreover, this is not likely to change in the future: strbuf_release leaving the strbuf in a valid state has been easy to maintain and has been very helpful for Git's robustness and simplicity (e.g., preventing use-after-free vulnerabilities). Document the semantics so the next generation of Git developers can become familiar with them without reading the implementation. It is still not advisable to call strbuf_release too often because it is wasteful, so add a note pointing to strbuf_reset for that. The same semantics apply to strbuf_detach. Add a similar note to its docstring to make that clear. Improved-by: Jeff King Signed-off-by: Jonathan Nieder --- Jeff King wrote: > I think it's actually OK to use the string buffer after this function. > It's just an empty string. > > Perhaps we should be more explicit: this releases any resources and > resets to a pristine, empty state. I suspect strbuf_detach() probably > should make the same claim. Like this? Thanks, Jonathan strbuf.h | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/strbuf.h b/strbuf.h index 7496cb8ec5..249df86711 100644 --- a/strbuf.h +++ b/strbuf.h @@ -82,8 +82,12 @@ extern char strbuf_slopbuf[]; extern void strbuf_init(struct strbuf *, size_t); /** - * Release a string buffer and the memory it used. You should not use the - * string buffer after using this function, unless you initialize it again. + * Release a string buffer and the memory it used. After this call, the + * strbuf points to an empty string that does not need to be free()ed, as + * if it had been set to `STRBUF_INIT` and never modified. + * + * To clear a strbuf in preparation for further use without the overhead + * of free()ing and malloc()ing again, use strbuf_reset() instead. */ extern void strbuf_release(struct strbuf *); @@ -91,6 +95,9 @@ extern void strbuf_release(struct strbuf *); * Detach the string from the strbuf and returns it; you now own the * storage the string occupies and it is your responsibility from then on * to release it with `free(3)` when you are done with it. + * + * The strbuf that previously held the string is reset to `STRBUF_INIT` so + * it can be reused after calling this function. */ extern char *strbuf_detach(struct strbuf *, size_t *); -- 2.14.2.920.gcf0c67979c
Re: Git for Windows: mingw.c's strange usage of creation time as ctime?
+git-for-windows Hi Marc, Marc Strapetz wrote: > compat/mingw.c assigns the Windows file creation time [1] to Git's > ctime fields at line 591 and at line 705: > > buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime)); > > ftCreationTime > ftLastWriteTime is actually possible after copying > a file, so it makes sense to include this timestamp somehow in the > Index, but I think it would be better to use the maximum of > ftLastWriteTime and ftCreationTime here which is less confusing and > closer to Unix's ctime: > > buf->st_ctime = max(buf->st_mtime, > filetime_to_time_t(&(fdata.ftCreationTime)); Can you say more about the practical effect? Is this causing a bug in practice, is it a bug waiting to happen, or is it making the code difficult to understand without any ill effect expected at run time? (Any of those three is a valuable kind of feedback to offer --- I'm just trying to figure out which one you mean.) By the way, do you have core.trustctime set to true or false? Thanks, Jonathan > -Marc > > [1] > https://msdn.microsoft.com/en-us/library/windows/desktop/aa365739(v=vs.85).aspx
Re: [PATCH 0/3] fixes for running the test suite with --valgrind
Jeff King wrote: > I think using SANITIZE=memory would catch these, but it needs some > suppressions tuning. The weird "zlib reads uninitialized memory" error > is a problem (valgrind sees this, too, but we have suppressions). What version of zlib do you use? I've heard some good things about v1.2.11 improving matters, though I haven't checked yet. Thanks, Jonathan
Re: [PATCH 1/3] path.c: fix uninitialized memory access
Jeff King wrote: > On Tue, Oct 03, 2017 at 03:45:01PM -0700, Jonathan Nieder wrote: >> In other words, an alternative fix would be >> >> if (*path == '.' && path[1] == '/') { >> ... >> } >> >> which would not require passing in 'len' or switching to index-based >> arithmetic. I think I prefer it. What do you think? > > Yes, I think that approach is much nicer. I think you could even use > skip_prefix. Unfortunately you have to play a few games with const-ness, > but I think the resulting signature for cleanup_path() is an > improvement: Ooh! For what it's worth, if you add a commit message with Thomas's Reported-by then this lgtm. Thanks, Jonathan > diff --git a/path.c b/path.c > index 00ec04e7a5..2e09a7bce0 100644 > --- a/path.c > +++ b/path.c > @@ -34,11 +34,10 @@ static struct strbuf *get_pathname(void) > return sb; > } > > -static char *cleanup_path(char *path) > +static const char *cleanup_path(const char *path) > { > /* Clean it up */ > - if (!memcmp(path, "./", 2)) { > - path += 2; > + if (skip_prefix(path, "./", &path)) { > while (*path == '/') > path++; > } > @@ -47,7 +46,7 @@ static char *cleanup_path(char *path) > > static void strbuf_cleanup_path(struct strbuf *sb) > { > - char *path = cleanup_path(sb->buf); > + const char *path = cleanup_path(sb->buf); > if (path > sb->buf) > strbuf_remove(sb, 0, path - sb->buf); > } > @@ -64,7 +63,7 @@ char *mksnpath(char *buf, size_t n, const char *fmt, ...) > strlcpy(buf, bad_path, n); > return buf; > } > - return cleanup_path(buf); > + return (char *)cleanup_path(buf); > } > > static int dir_prefix(const char *buf, const char *dir)
Re: [PATCH] test-stringlist: avoid buffer underrun when sorting nothing
René Scharfe wrote: > Check if the strbuf containing data to sort is empty before attempting > to trim a trailing newline character. > > Signed-off-by: Rene Scharfe > --- > t/helper/test-string-list.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks for fixing it. Reviewed-by: Jonathan Nieder
Re: [PATCH 2/3] http-push: fix construction of hex value from path
Hi, Thomas Gummerer wrote: > The get_oid_hex_from_objpath takes care of creating a oid from a > pathname. It does this by memcpy'ing the first two bytes of the path to > the "hex" string, then skipping the '/', and then copying the rest of the > path to the "hex" string. Currently it fails to increase the pointer to > the hex string, so the second memcpy invocation just mashes over what > was copied in the first one, and leaves the last two bytes in the string > uninitialized. Wow. The fix is obviously correct. > This breaks valgrind in t5540, although the test passes without > valgrind: [...] > Signed-off-by: Thomas Gummerer > --- > http-push.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Would it be straightforward to add a correctness test for this? It seems like this code path didn't work at all and no one noticed. This is the code path in http-push.c which says /* * NEEDSWORK: remote_ls() ignores info/refs on the remote side. But it * should _only_ heed the information from that file, instead of trying to * determine the refs from the remote file system (badly: it does not even * know about packed-refs). */ static void remote_ls(const char *path, int flags, I think the problem is that when it fails, we end up thinking that there are *fewer* objects than are actually present remotely so the only ill effect is pushing too much. So this should be observable in server logs (i.e. it is testable) but it's not a catastrophic failure which means it's harder to test than it would be otherwise. Moreover, this is in the webdav-based "dumb http" push code path, which I do not trust much at all. I wonder if we could retire it completely (or at least provide an option to turn it off). > diff --git a/http-push.c b/http-push.c > index e4c9b065ce..e9a01ec4da 100644 > --- a/http-push.c > +++ b/http-push.c > @@ -1018,7 +1018,7 @@ static int get_oid_hex_from_objpath(const char *path, > struct object_id *oid) > memcpy(hex, path, 2); > path += 2; > path++; /* skip '/' */ > - memcpy(hex, path, GIT_SHA1_HEXSZ - 2); > + memcpy(hex + 2, path, GIT_SHA1_HEXSZ - 2); > > return get_oid_hex(hex, oid); Thanks, Jonathan
Re: [PATCH 1/3] path.c: fix uninitialized memory access
Hi, Thomas Gummerer wrote: > In cleanup_path we're passing in a char array, run a memcmp on it, and > run through it without ever checking if something is in the array in the > first place. This can lead us to access uninitialized memory, for > example in t5541-http-push-smart.sh test 7, when run under valgrind: [...] > Avoid this by checking passing in the length of the string in the char > array, and checking that we never run over it. > > Signed-off-by: Thomas Gummerer > --- > path.c | 19 ++- > 1 file changed, 10 insertions(+), 9 deletions(-) When I first read the above, I thought it was going to be about a NUL-terminated string that was missing a NUL. But in fact, the issue is that strlen(path) can be < 2. In other words, an alternative fix would be if (*path == '.' && path[1] == '/') { ... } which would not require passing in 'len' or switching to index-based arithmetic. I think I prefer it. What do you think? Thanks and hope that helps, Jonathan diff --git i/path.c w/path.c index b533ec938d..3a1fbee1e0 100644 --- i/path.c +++ w/path.c @@ -37,7 +37,7 @@ static struct strbuf *get_pathname(void) static char *cleanup_path(char *path) { /* Clean it up */ - if (!memcmp(path, "./", 2)) { + if (*path == '.' && path[1] == '/') { path += 2; while (*path == '/') path++;
Re: [PATCH] branch: reset instead of release a strbuf
Hi, Stefan Beller wrote: > Our documentation advises to not re-use a strbuf, after strbuf_release > has been called on it. Use the proper reset instead. > > Reviewed-by: Jonathan Nieder This is indeed Reviewed-by: Jonathan Nieder Thank you. > Signed-off-by: Stefan Beller > --- > builtin/branch.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Here's a patch to address the surprising strbuf.h advice. -- >8 -- Subject: strbuf: do not encourage init-after-release strbuf_release already leaves the strbuf in a valid, initialized state, so there is not need to call strbuf_init after it. Moreover, this is not likely to change in the future: strbuf_release leaving the strbuf in a valid state has been easy to maintain and has been very helpful for Git's robustness and simplicity (e.g., preventing use-after-free vulnerabilities). It is still not advisable to call strbuf_release until done using a strbuf because it is wasteful, so keep that part of the advice. Reported-by: Stefan Beller Signed-off-by: Jonathan Nieder --- strbuf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/strbuf.h b/strbuf.h index 7496cb8ec5..6e175c3694 100644 --- a/strbuf.h +++ b/strbuf.h @@ -83,7 +83,7 @@ extern void strbuf_init(struct strbuf *, size_t); /** * Release a string buffer and the memory it used. You should not use the - * string buffer after using this function, unless you initialize it again. + * string buffer after using this function. */ extern void strbuf_release(struct strbuf *); -- 2.14.2.920.gcf0c67979c
Re: [bug] git version 2.4.12 color.ui = always breaks git add -p
Hi Christian, Christian Rebischke wrote: > I played around with my gitconfig and I think I found a bug while doing > so. I set the following lines in my gitconfig: > > [color] > ui = always > > When I try to use `git add -p ` I don't see the 'Stage > this hunk'-dialog anymore. First I thought it's some other configuration > but now I can definitly say that this configuration breaks `git add -p` > interactive mode. Do the patches at https://public-inbox.org/git/20171003133713.ccxv6clrmuuhh...@sigill.intra.peff.net/ help? Thanks and hope that helps, Jonathan
Re: [PATCH] branch: reset instead of release a strbuf
Hi, Stefan Beller wrote: > Our documentation advises to not re-use a strbuf, after strbuf_release > has been called on it. Use the proper reset instead. I'm super surprised at this documentation. strbuf_release maintains the invariant that a strbuf is always usable (i.e., that we do not have to fear use-after-free problems). On second thought, though, strbuf_release is a more expensive operation than strbuf_reset --- constantly free()-ing and re-malloc()ing is unnecessary churn in malloc data structures. So maybe that is the motivation here? > Signed-off-by: Stefan Beller > --- > > Maybe one of the #leftoverbits is to remove the re-init call in release > and see what breaks? (And then fixing up more of such cases as presented > in this patch) As mentioned above: please no. That would be problematic both for ease of development and for the risk of security bugs. > builtin/branch.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/builtin/branch.c b/builtin/branch.c > index b998e16d0c..9758012390 100644 > --- a/builtin/branch.c > +++ b/builtin/branch.c > @@ -217,7 +217,7 @@ static int delete_branches(int argc, const char **argv, > int force, int kinds, > if (!head_rev) > die(_("Couldn't look up commit object for HEAD")); > } > - for (i = 0; i < argc; i++, strbuf_release(&bname)) { > + for (i = 0; i < argc; i++, strbuf_reset(&bname)) { > char *target = NULL; > int flags = 0; Should there be a strbuf_release (or UNLEAK if you are very very sure) call at the end of the function to replace this? With that change (but not without it), Reviewed-by: Jonathan Nieder Thanks.
Re: [PATCH v3 10/10] ssh: introduce a 'simple' ssh variant
Hi, Brandon Williams wrote: > When using the 'ssh' transport, the '-o' option is used to specify an > environment variable which should be set on the remote end. This allows > git to send additional information when contacting the server, > requesting the use of a different protocol version via the > 'GIT_PROTOCOL' environment variable like so: "-o SendEnv=GIT_PROTOCOL" > > Unfortunately not all ssh variants support the sending of environment > variables to the remote end. To account for this, only use the '-o' > option for ssh variants which are OpenSSH compliant. This is done by > checking that the basename of the ssh command is 'ssh' or the ssh > variant is overridden to be 'ssh' (via the ssh.variant config). This also affects -p (port), right? What happens if I specify a ssh://host:port/path URL and the SSH implementation is of 'simple' type? > Previously if an ssh command's basename wasn't 'plink' or Git's commit messages use the present tense to describe the current state of the code, so this is "Currently". :) > 'tortoiseplink' git assumed that the command was an OpenSSH variant. > Since user configured ssh commands may not be OpenSSH compliant, tighten > this constraint and assume a variant of 'simple' if the basename of the > command doesn't match the variants known to git. The new ssh variant > 'simple' will only have the host and command to execute ([username@]host > command) passed as parameters to the ssh command. > > Update the Documentation to better reflect the command-line options sent > to ssh commands based on their variant. > > Reported-by: Jeffrey Yasskin > Signed-off-by: Brandon Williams Thanks for working on this. For background, the GIT_SSH implementation that motivated this is https://github.com/travis-ci/dpl/blob/6c3fddfda1f2a85944c56b068bac0a77c049/lib/dpl/provider.rb#L215, which does not support -p or -4/-6, either. > --- > Documentation/config.txt | 27 ++-- > Documentation/git.txt| 9 ++-- > connect.c| 107 > ++- > t/t5601-clone.sh | 9 ++-- > t/t5700-protocol-v1.sh | 2 + > 5 files changed, 95 insertions(+), 59 deletions(-) [...] > --- a/connect.c > +++ b/connect.c > @@ -776,37 +776,44 @@ static const char *get_ssh_command(void) [...] > +static enum ssh_variant determine_ssh_variant(const char *ssh_command, > + int is_cmdline) [...] > - if (!strcasecmp(variant, "plink") || > - !strcasecmp(variant, "plink.exe")) > - *port_option = 'P'; > + if (!strcasecmp(variant, "ssh")) > + ssh_variant = VARIANT_SSH; Could this handle ssh.exe, too? [...] > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh Can this get tests for the new defaulting behavior? E.g. - default is "simple" - how "simple" treats an ssh://host:port/path URL - how "simple" treats ipv4/ipv6 switching - ssh defaults to "ssh" - if GIT_SSH=ssh, can set ssh.variant to "simple" to force the "simple" mode One other worry: this (intentionally) changes the behavior of a previously-working GIT_SSH=ssh-wrapper that wants to support OpenSSH-style options but does not declare ssh.variant=ssh. When discovering this change, what should the author of such an ssh-wrapper do? They could instruct their users to set ssh.variant or GIT_SSH_VARIANT to "ssh", but then they are at the mercy of future additional options supported by OpenSSH we may want to start using in the future (e.g., maybe we will start passing "--" to separate options from the hostname). So this is not a futureproof option for them. They could take the new default behavior or instruct their users to set ssh.variant or GIT_SSH_VARIANT to "simple", but then they lose support for handling alternate ports, ipv4/ipv6, and specifying -o SendEnv to propagate GIT_PROTOCOL or other envvars. They can handle GIT_PROTOCOL propagation manually, but losing port support seems like a heavy cost. They could send a patch to define yet another variant that is forward-compatible, for example using an interface similar to what git-credential(1) uses. Then they can set GIT_SSH to their OpenSSH-style helper and GIT_FANCY_NEW_SSH to their more modern helper, so that old Git versions could use GIT_SSH and new Git versions could use GIT_FANCY_NEW_SSH. This might be their best option. It feels odd to say that their only good way forward is to send a patch, but on the other hand someone with such an itch is likely to be in the best position to define an appropriate interface. They could send a documentation patch to make more promises about the commandline used in OpenSSH mode: e.g. setting a rule in advance about which options can take an argument so that they can properly parse an OpenSSH command line in a future-compatible way. Or they could send a patch to allow passing the port in "simple" mode, for example using an environment variable. Am I missing another option? What advice d
Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun
Ann T Ropea wrote: > Of the many ways to spell the three-letter word, the variant "Git" > should be used when referring to a repository in a description; or, in > general, when it is used as a proper noun. > > We thus change the pull-request template message so that it reads > >"...in the Git repository at:" > > Besides, this brings us in line with the documentation, see > Documentation/howto/using-signed-tag-in-pull-request.txt > > Signed-off-by: Ann T Ropea > Acked-by: Jonathan Nieder > --- > v2: rename patch, correct Signed-off-by line, add Acked-by line > git-request-pull.sh | 2 +- > t/t5150-request-pull.sh | 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Jonathan Nieder Thanks for the quick turnaround. Sincerely, Jonathan
Security of .git/config and .git/hooks
Hi, This topic has been mentioned on this mailing list before but I had trouble finding a relevant reference. Links welcome. Suppose that I add the following to .git/config in a repository on a shared computer: [pager] log = rm -fr / fsck = rm -fr / ("rm -fr /" is of course a placeholder here.) I then tell a sysadmin that git commands are producing strange output and I am having trouble understanding what is going on. They may run "git fsck" or "git log"; in either case, the output is passed to the pager I configured, allowing me to run an arbitrary command using the sysadmin's credentials. You might say that this is the sysadmin's fault, that they should have read through .git/config before running any Git commands. But I don't find it so easy to blame them. A few related cases that might not seem so dated: 1. I put my repository in a zip file and ask a Git expert to help me recover data from it, or 2. My repository is in a shared directory on NFS. Unless the administrator setting that up is very careful, it is likely that the least privileged user with write access to .git/config or .git/hooks/ may be someone that I don't want to be able to run arbitrary commands on behalf of the most privileged user working in that repository. A similar case to compare to is Linux's "perf" tool, which used to respect a .perfconfig file from the current working directory. Fortunately, nowadays "perf" only respects ~/.perfconfig and /etc/perfconfig. Proposed fix: because of case (1), I would like a way to tell Git to stop trusting any files in .git. That is: 1. Introduce a (configurable) list of "safe" configuration items that can be set in .git/config and don't respect any others. 2. But what if I want to set a different pager per-repository? I think we could do this using configuration "profiles". My ~/.config/git/profiles/ directory would contain git-style config files for repositories to include. Repositories could then contain [include] path = ~/.config/git/profiles/fancy-log-pager to make use of those settings. The facility (1) would special-case this directory to allow it to set "unsafe" settings since files there are assumed not to be under the control of an attacker. 3. Likewise for hooks: my ~/.config/git/hooks/ directory would contain hooks for repositories to make use of. Repositories could symlink to hook files from there to make use of them. That would allow the configured hooks in ~/.config/git/hooks/ to be easy to find and to upgrade in one place. To help users migrate, when a hook is present and executable in .git/hooks/, Git would print instructions for moving it to ~/.config/git/hooks/ and replacing it with a symlink after inspecting it. For backward compatibility, this facility would be controlled by a global configuration setting. If that setting is not enabled, then the current, less safe behavior would remain. One downside of (3) is its reliance on symlinks. Some alternatives: 3b. Use core.hooksPath configuration instead. Rely on (2). 3c. Introduce new hook.* configuration to be used instead of hook scripts. Rely on (2). Thoughts? Jonathan
Re: [PATCH] PR msg: capitalise "Git" to make it a proper noun
Hi, Thanks for working to improve Git! Bedhanger wrote: > Subject: [PATCH] PR msg: capitalise "Git" to make it a proper noun nit: What is a PR msg? Looking with "git log git-request-pull.sh", I see that previous patches called the subsystem request-pull, so this could be request-pull: capitalise "Git" to make it a proper noun > Of the many ways to spell the three-letter word, the variant "Git" > should be used when referring to a repository in a description; or, in > general, when it is used as a proper noun. > > We thus change the pull-request template message so that it reads > >"...in the Git repository at:" > > Besides, this brings us in line with the documentation, see > Documentation/howto/using-signed-tag-in-pull-request.txt > > Signed-off-by: bedhanger Please use your full name in the Signed-off-by line. See Documentation/SubmittingPatches section "(5) Certify your work" for why we ask for this. > --- > git-request-pull.sh | 2 +- > t/t5150-request-pull.sh | 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-) The patch itself looks good, and I like the change it makes to the email generated by git request-pull. Looking forward to seeing what other changes you come up with in the future. Thanks and hope that helps, Jonathan
Re: Updated to v2.14.2 on macOS; git add --patch broken
Hi, Toni Uebernickel wrote: > I updated to git version v2.14.2 on macOS using homebrew. > > Since then `git add --patch` and `git stash save --patch` are not > working anymore. It's just printing the complete diff without ever > stopping to ask for actions. This results in an unusable state, as > the whole command option is rendered useless. Would a patch like the following help? I am worried that other scripts using diff-files would need the same kind of patch. So it seems worthwhile to look for alternatives. An alternative would be to partially roll back v2.14.2~61^2~4 (color: check color.ui in git_default_config, 2017-07-13) by making it not take effect for plumbing commands (i.e., by adding a boolean to "struct startup_info" to indicate whether a command is low-level plumbing). That would make the behavior of Git harder to explain so I don't particularly like it. Plus it defeats the point of the patch. Yet another alternative would be to treat color.ui=always as a deprecated synonym for color.ui=auto. I think that's my preferred fix. What do you think? Thanks again for reporting, Jonathan diff --git i/git-add--interactive.perl w/git-add--interactive.perl index 28b325d754..4ea69538c7 100755 --- i/git-add--interactive.perl +++ w/git-add--interactive.perl @@ -101,49 +101,49 @@ sub apply_patch_for_stash; my %patch_modes = ( 'stage' => { - DIFF => 'diff-files -p', + DIFF => 'diff-files --no-color -p', APPLY => sub { apply_patch 'apply --cached', @_; }, APPLY_CHECK => 'apply --cached', FILTER => 'file-only', IS_REVERSE => 0, }, 'stash' => { - DIFF => 'diff-index -p HEAD', + DIFF => 'diff-index --no-color -p HEAD', APPLY => sub { apply_patch 'apply --cached', @_; }, APPLY_CHECK => 'apply --cached', FILTER => undef, IS_REVERSE => 0, }, 'reset_head' => { - DIFF => 'diff-index -p --cached', + DIFF => 'diff-index --no-color -p --cached', APPLY => sub { apply_patch 'apply -R --cached', @_; }, APPLY_CHECK => 'apply -R --cached', FILTER => 'index-only', IS_REVERSE => 1, }, 'reset_nothead' => { - DIFF => 'diff-index -R -p --cached', + DIFF => 'diff-index --no-color -R -p --cached', APPLY => sub { apply_patch 'apply --cached', @_; }, APPLY_CHECK => 'apply --cached', FILTER => 'index-only', IS_REVERSE => 0, }, 'checkout_index' => { - DIFF => 'diff-files -p', + DIFF => 'diff-files --no-color -p', APPLY => sub { apply_patch 'apply -R', @_; }, APPLY_CHECK => 'apply -R', FILTER => 'file-only', IS_REVERSE => 1, }, 'checkout_head' => { - DIFF => 'diff-index -p', + DIFF => 'diff-index --no-color -p', APPLY => sub { apply_patch_for_checkout_commit '-R', @_ }, APPLY_CHECK => 'apply -R', FILTER => undef, IS_REVERSE => 1, }, 'checkout_nothead' => { - DIFF => 'diff-index -R -p', + DIFF => 'diff-index --no-color -R -p', APPLY => sub { apply_patch_for_checkout_commit '', @_ }, APPLY_CHECK => 'apply', FILTER => undef,
Re: hooks are ignored if there are not marked as executable
Hi Damien, Damien wrote: > If you do `echo my_script > .git/hooks/pre-commit` and then `git commit`, > The hook is just gonna be ignored. > But if you do `chmod +x .git/hooks/pre-commit`, then it's executed. This is intentional. > I think ignoring a hook is misleading and not newbie friendly, an error > message to signal an incorrectly configured hook could be better. Adding a warning sounds like a nice change. Care to propose a patch? In the early days, sample hooks were installed without .sample in their filename and could be enabled by "chmod +x"-ing them. Because of this, long-time users of Git may find themselves experiencing this warning more often than they'd like. That could be acceptable if the warning shows a command to run to prevent the warning from showing up again, though (see advice.c for some examples of how that can be done). The main code path to look at is run-command.c::find_hook. "git grep -e 'rev-parse --git-path hooks' -- . ':!contrib'" finds a few other code paths you may also want to look at. Thanks and hope that helps, Jonathan
Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers
Hi, Taylor Blau wrote: > Peff points out that different atom parsers handle the empty > "sub-argument" list differently. An example of this is the format > "%(refname:)". > > Since callers often use `string_list_split` (which splits the empty > string with any delimiter as a 1-ary string_list containing the empty > string), this makes handling empty sub-argument strings non-ergonomic. > > Let's fix this by assuming that atom parser implementations don't care > about distinguishing between the empty string "%(refname:)" and no > sub-arguments "%(refname)". > > Signed-off-by: Taylor Blau > --- > ref-filter.c| 10 +- > t/t6300-for-each-ref.sh | 1 + > 2 files changed, 10 insertions(+), 1 deletion(-) The above does a nice job of explaining - what this change is going to do - how it's good for the internal code structure / maintainability What it doesn't tell me about is why the user-facing effect won't cause problems. Is there no atom where %(atom:) was previously accepted and did something meaningful that this may break? Looking at the manpage and code, I don't see any, so for what it's worth, this is Reviewed-by: Jonathan Nieder but for next time, please remember to discuss regression risk in the commit message, too. Thanks, Jonathan
Re: git submodule add fails when using both --branch and --depth
Hi, Thadeus Fleming wrote: > I'm running git 2.14.2 on Ubuntu 16.04. > > Compare the behavior of > >> git clone --branch pu --depth 1 https://github.com/git/git git-pu > > which clones only the latest commit of the pu branch and Yes. >> mkdir tmp && cd tmp && git init >> git submodule add --branch pu --depth 1 https://github.com/git/git \ > git-pu > > which gives the error > > fatal: 'origin/pu' is not a commit and a branch 'pu' cannot be created from it > Unable to checkout submodule 'git-pu' As a side note (you are using "git submodule add --depth", not "git submodule update --depth"), I suspect that "submodule update --depth" may not always do what people expect. With add --depth, I agree with your expectation and after your fix everything should work fine. But with update --depth, consider the following sequence of steps: 1. I create a repository "super" with submodule "sub" and publish both. 2. I make a couple commits to "sub" and a commit to "super" making use of those changes and want to publish them. 3. I use "git push --recurse-submodules" to publish my commits to "sub" and "super": a. First it pushes to "sub". b. Then it pushes to "super". Between steps 3(a) and 3(b), a person can still "git clone --recurse-submodules" the "super" repository. The repository "super" does not have my change yet and "sub" does, but that is not a problem, since commands like "git checkout --recurse-submodules" and "git submodule update" know to check out the commit *before* my change in the submodule. But between steps 3(a) and 3(b), "git submodule update --depth=1" would not work. It would fetch the submodule with depth 1 and then try to check out a commit that is deeper in history. So I think there's more thinking needed there. That's all a tangent to your report about add --depth, though. Thanks, Jonathan
Re: What means "git config bla ~/"?
Hi, rpj...@crashcourse.ca wrote: > i'm sure i'm about to embarrass myself but, in "man git-config", > OPTIONS, one reads: > > --path > > git-config will expand leading ~ to the value of $HOME, and ~user > to the home directory for the specified user. This option has no > effect when setting the value (but you can use git config bla ~/ > from the command line to let your shell do the expansion). > > what's with that "git config bla ~/"? is this some config keyword > or something? No need to be embarrased. Here "bla" is a placeholder. That is, for example, I can run git config --global include.path ~/.config/git/more-config or git config --global include.path $HOME/.config/git/more-config to cause [include] path = /home/me/.config/git/more-config to be added to my global configuration. The expansion of ~ or $HOME is performed by my shell, not Git. For comparison, if I had run git config --global include.path '~/.config/git/more-config' then that would cause [include] path = ~/.config/git/more-config to be added to my global configuration, but it would still have the same effect at run time, since Git is also able to expand ~ to my home directory. The wording comes from commit 1349484e341a3ec2ba02a86c8fbd97ea9dc8c756 Author: Matthieu Moy Date: Wed Dec 30 17:51:53 2009 +0100 builtin-config: add --path option doing ~ and ~user expansion. I agree with you that it is less clear than it could be. Ideas for clarifying it? Thanks, Jonathan
Re: [PATCH] clang-format: adjust line break penalties
Stephan Beyer wrote: > On 09/29/2017 08:40 PM, Jonathan Nieder wrote: >> Going forward, is there an easy way to preview the effect of this kind >> of change (e.g., to run "make style" on the entire codebase so as to be >> able to compare the result with two different versions of >> .clang-format)? > > I just ran clang-format before and after the patch and pushed to github. > The resulting diff is quite big: > > https://github.com/sbeyer/git/commit/3d1186c4cf4dd7e40b97453af5fc1170f6868ccd Thanks. The first change I see there is -char *strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error) +char * +strbuf_realpath(struct strbuf *resolved, const char *path, int die_on_error) I understand why the line is broken, but the choice of line break is wrong. Seems like the penalty for putting return type on its own line quite high enough. My Reviewed-by still stands, though. It gets "make style" to signal long lines that should be broken, which is an improvement. > PS: There should be a comment at the beginning of the .clang-format file > that says what version it is tested with (on my machine it worked with > 5.0 but not with 4.0) and there should also probably a remark that the > clang-format-based style should only be understood as a hint or guidance > and that most of the Git codebase does not conform it. Sounds good to me. Care to send it as a patch? :) Thanks, Jonathan
Re: [PATCH] clang-format: adjust line break penalties
Hi Dscho, Johannes Schindelin wrote: > Signed-off-by: Johannes Schindelin > --- > .clang-format | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) Well executed and well explained. Thank you. Reviewed-by: Jonathan Nieder Going forward, is there an easy way to preview the effect of this kind of change (e.g., to run "make style" on the entire codebase so as to be able to compare the result with two different versions of .clang-format)? Thanks, Jonathan
Re: [PATCH v4] technical doc: add a design doc for hash function transition
Junio C Hamano wrote: > Jonathan Nieder writes: >> This document describes what a transition to a new hash function for >> Git would look like. Add it to Documentation/technical/ as the plan >> of record so that future changes can be recorded as patches. >> >> Also-by: Brandon Williams >> Also-by: Jonathan Tan >> Also-by: Stefan Beller >> Signed-off-by: Jonathan Nieder >> --- > > Shoudln't these all be s-o-b: (with a note immediately before that > to say all four contributed equally or something)? I don't want to get lost in the weeds in the question of how to represent such a collaborative effort in git's metadata. You're right that I should collect their sign-offs! Your approach of using text instead of machine-readable data for common authorship also seems okay. In any event, this is indeed Signed-off-by: Brandon Williams Signed-off-by: Jonathan Tan Signed-off-by: Stefan Beller (I just checked :)). >> +Background >> +-- >> +At its core, the Git version control system is a content addressable >> +filesystem. It uses the SHA-1 hash function to name content. For >> +example, files, directories, and revisions are referred to by hash >> +values unlike in other traditional version control systems where files >> +or versions are referred to via sequential numbers. The use of a hash > > Traditional systems refer to files via numbers??? Perhaps "where > versions of files are referred to via sequential numbers" or > something? Good point. The wording you suggested will work well. >> +function to address its content delivers a few advantages: >> + >> +* Integrity checking is easy. Bit flips, for example, are easily >> + detected, as the hash of corrupted content does not match its name. >> +* Lookup of objects is fast. > > * There is no ambiguity what the object's name should be, given its > content. > > * Deduping the same content copied across versions and paths is > automatic. :) Yep, these are nice too, especially that second one. It also is how we make diff-ing fast. >> +SHA-1 still possesses the other properties such as fast object lookup >> +and safe error checking, but other hash functions are equally suitable >> +that are believed to be cryptographically secure. > > s/secure/more &/, perhaps? We were looking for a phrase meaning that it should be a cryptographic hash function in good standing, which SHA-1 is at least approaching not being. "more secure" should work fine. Let's go with that. >> +Goals >> +- >> +... >> + c. Users can use SHA-1 and NewHash identifiers for objects >> + interchangeably (see "Object names on the command line", below). > > Mental note. This needs to extend to the "index X..Y" lines in the > patch output, which is used by "apply -3" and "am -3". Will add a note about this to "Object names on the command line". Stefan had already pointed out that that section should really be renamed to something like "Object names in input and output". >> +2. Allow a complete transition away from SHA-1. >> + a. Local metadata for SHA-1 compatibility can be removed from a >> + repository if compatibility with SHA-1 is no longer needed. > > I like the emphasis on "Local" here. Metadata for compatiblity that > is embedded in the objects obviously cannot be removed. > > From that point of view, one of the goals ought to be "make sure > that as much SHA-1 compatibility metadata as possible is local and > outside the object". This goal may not be able to say more than "as > much as possible", as signed objects that came from SHA-1 world > needs to carry the compatibility metadata somewhere somehow. > > Or perhaps we could. There is nothing that says a signed tag > created in the SHA-1 world must have the PGP/SHA-1 signature in the > NewHash payload---it could be split off of the object data and > stored in a local metadata cache, to be used only when we need to > convert it back to the SHA-1 world. That would break round-tripping and would mean that multiple SHA-1 objects could have the same NewHash name. In other words, from my point of view there is something that says that such data must be preserved. Another way to put it: even after removing all SHA-1 compatibility metadata, one nice feature of this design is that it can be recovered if I change my mind, from data in the NewHash based repository alone. [...] >> +Non-Goals >> +- >> ... >> +6. Skip fetching some submodules of a project into a NewHash >> + repository. (This also depends on NewHash support in G
Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path
Junio C Hamano wrote: > Jonathan Nieder writes: >> This has been broken for a while, but better late than never to >> address it. > > I am not sure if this is broken in the first place. We do want to > make sure that the scripted porcelains will source the shell helper > library from matching Git release. The proposed patch goes directly > against that and I do not see how it could be an improvement. It used to be that git allowed setting a colon-separated list of paths in GIT_EXEC_PATH. (Very long ago, I relied on that feature.) If we can restore that functionality without too much cost, then I think it's worthwhile. But the cost in this patch for that is pretty high. And $ git log -S'$(git --exec-path)/' tells me that colon-separated paths in GIT_EXEC_PATH did not work for some use cases as far back as 2006. Since 2008 the documentation has encouraged using "git --exec-path" in a way that does not work with colon-separated paths, too. I hadn't realized it was so long. Given that it hasn't been supported for more than ten years, I was wrong to read this as a bug report --- instead, it's a feature request. In that context, this cost is likely not worth paying. I wonder if there's another way to achieve this patch's goal. E.g. what if there were a way to specify some paths git could search for custom commands, separate from "git --exec-path"? Putting the custom commands on the $PATH seems nicer unless you're trying to override the implementation of an existing git command. And we already discourage overriding the implementation of an existing git command (as it's open source, you can patch them instead), so... Another possible motivation (the one that led me to use this mechnism long ago) is avoiding providing the dashed form git-$cmd in $PATH. I think time has shown that having git-$cmd in $PATH is not too painful. Thanks and hope that helps, Jonathan
Re: [PATCH] git-sh: Avoid sourcing scripts with git --exec-path
Hi, Dridi Boukelmoune wrote: > For end users making use of a custom exec path many commands will simply > fail. Adding git's exec path to the PATH also allows overriding git-sh-* > scripts, not just adding commands. One can then patch a script without > tainting their system installation of git for example. > > Signed-off-by: Dridi Boukelmoune This has been broken for a while, but better late than never to address it. [...] > --- a/Documentation/CodingGuidelines > +++ b/Documentation/CodingGuidelines > @@ -151,6 +151,28 @@ For shell scripts specifically (not exhaustive): > interface translatable. See "Marking strings for translation" in > po/README. > > + - When sourcing a "git-sh-*" file using "git --exec-path" make sure > + to only use its base name. > + > + (incorrect) > + . "$(git --exec-path)/git-sh-setup" > + > + (correct) > + . git-sh-setup I wonder if we can make this so intuitive that it doesn't need mentioning in CodingGuidelines. What if the test harness t/test-lib.sh were to set a GIT_EXEC_PATH with multiple directories in it? That way, authors of new commands would not have to be careful to look out for this issue proactively because the testsuite would catch it. [...] > +++ b/contrib/convert-grafts-to-replace-refs.sh > @@ -5,7 +5,8 @@ > > GRAFTS_FILE="${GIT_DIR:-.git}/info/grafts" > > -. $(git --exec-path)/git-sh-setup > +PATH="$(git --exec-path):$PATH" > +. git-sh-setup I wish there were a simple way to do this that doesn't involve polluting $PATH with the dashed commands. E.g. we can do something like OIFS=$IFS IFS=: set -f for d in $(git --exec-path) do if test -e "$d/git-sh-setup" then . "$d/git-sh-setup" break fi done set +f IFS=$OIFS but that is not very simple. Something like old_PATH=$PATH PATH=$(git --exec-path):$PATH . git-sh-setup PATH=$old_PATH looks like it could work, but it would undo the work git-sh-setup does to set a sane $PATH on platforms like Solaris. > --- a/contrib/rerere-train.sh > +++ b/contrib/rerere-train.sh > @@ -7,7 +7,8 @@ USAGE="$me rev-list-args" > > SUBDIRECTORY_OK=Yes > OPTIONS_SPEC= > -. "$(git --exec-path)/git-sh-setup" > +PATH="$(git --exec-path):$PATH" > +. git-sh-setup This makes me similarly unhappy about PATH pollution, but it may be that there's nothing to be done about that. [...] > --- a/contrib/subtree/git-subtree.sh > +++ b/contrib/subtree/git-subtree.sh > @@ -32,7 +32,6 @@ squashmerge subtree changes as a single commit > " > eval "$(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit > $?)" > > -PATH=$PATH:$(git --exec-path) > . git-sh-setup This looks like a change that could be separated into a separate patch, both because it is to contrib/subtree (which is maintained separately) and because it is not necessary for the goal described in this patch's commit message. I do like this change, since it improves consistency with other commands. > --- a/git-sh-setup.sh > +++ b/git-sh-setup.sh > @@ -44,7 +44,7 @@ git_broken_path_fix () { > # @@BROKEN_PATH_FIX@@ > > # Source git-sh-i18n for gettext support. > -. "$(git --exec-path)/git-sh-i18n" > +. git-sh-i18n Do git-mergetool--lib.txt, git-parse-remote.txt, git-sh-i18n.txt, and git-sh-setup.txt in Documentation/ need the same treatment? Summary: I like the goal of this patch but I am nervous about the side-effect it introduces of PATH pollution. Is there a way around that? If there isn't, then this needs a few tweaks and then it should be ready to go. Thanks and hope that helps, Jonathan
Re: [PATCH] doc: correct command formatting
Junio C Hamano wrote: > Jonathan Nieder writes: >> Andreas Heiduk wrote: >>> +1, Thanks for spotting. >> >> Thanks for looking it over. Can we add your Reviewed-by? (See [1] >> for what this means.) > > I would just do "Acked-by: Andreas" after seeing such an obvious > admission of guilt & appreciation for fixing in the exchange. Oh! I had missed that context. Your instinct would have been right (and is born out by Andreas's reply to me). I was just fishing, but in this context there was no reason to fish for more than the Ack that was already there. > Would we rather want to make it more formal like how Linux folks do > the Reviewed-by: thing? Separate from this example: yes, I think adopting Linux's Reviewed-by convention would be a good thing. When I see a positive reply to a patch, I often wonder whether an ack or a fuller reviewed-by is intended, and Linux's way of formalizing that appeals to me. I'll try sending a patch to add it to SubmittingPatches tomorrow morning (Stefan had also been hinting recently about this being something worth trying). Thank you, Jonathan
Re: [PATCH] doc: correct command formatting
Hi, Andreas Heiduk wrote: > +1, Thanks for spotting. Thanks for looking it over. Can we add your Reviewed-by? (See [1] for what this means.) > I did a quick > > grep -r " ` " > > which came up with with another relevant place: > > Documentation/git-format-patch.txt: `--subject-prefix` option) has ` v` > appended to it. E.g. > > But here the space IS relevant but asciidoc does not pick up > the formatting. Perhaps that one could read like this: > > `--subject-prefix` option) has `v` appended to it. E.g. Interesting. This comes from commit 4aad08e061df699b49e24c4d34698d734473fb66 Author: Junio C Hamano Date: Wed Jan 2 14:16:07 2013 -0800 format-patch: document and test --reroll-count In some output formats, the text with backticks surrounding it is shown in a different background color, which makes something like `{space}v` tempting (with appropriate definition of {space} in the attributes section of asciidoc.conf). But that feels way too subtle. How about something like has a space and `v` appended to it ? Thanks, Jonathan [1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/9cd6681cb1169e815c41af0265165dd1b872f228/Documentation/process/submitting-patches.rst#563
Re: [PATCH] doc: correct command formatting
Adam Dinwoodie wrote: > Leaving spaces around the `-delimeters for commands means asciidoc fails > to parse them as the start of a literal string. Remove an extraneous > space that is causing a literal to not be formatted as such. > > Signed-off-by: Adam Dinwoodie > --- > Documentation/git.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Good catch. Reviewed-by: Jonathan Nieder
[PATCH v4] technical doc: add a design doc for hash function transition
This document describes what a transition to a new hash function for Git would look like. Add it to Documentation/technical/ as the plan of record so that future changes can be recorded as patches. Also-by: Brandon Williams Also-by: Jonathan Tan Also-by: Stefan Beller Signed-off-by: Jonathan Nieder --- On Thu, Mar 09, 2017 at 11:14 AM, Shawn Pearce wrote: > On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder wrote: >> Thanks for the kind words on what had quite a few flaws still. Here's >> a new draft. I think the next version will be a patch against >> Documentation/technical/. > > FWIW, I like this approach. Okay, here goes. Instead of sharding the loose object translation tables by first byte, we went for a single table. It simplifies the design and we need to keep the number of loose objects under control anyway. We also included a description of the transition plan and tried to include a summary of what has been agreed upon so far about the choice of hash function. Thanks to Junio for reviving the discussion and in particular to Dscho for pushing this forward and making the missing pieces clearer. Thoughts of all kinds welcome, as always. Documentation/Makefile | 1 + .../technical/hash-function-transition.txt | 797 + 2 files changed, 798 insertions(+) create mode 100644 Documentation/technical/hash-function-transition.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index 2415e0d657..471bb29725 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -67,6 +67,7 @@ SP_ARTICLES += howto/maintain-git API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technical/api-index.txt, $(wildcard technical/api-*.txt))) SP_ARTICLES += $(API_DOCS) +TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format TECH_DOCS += technical/pack-format diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt new file mode 100644 index 00..417ba491d0 --- /dev/null +++ b/Documentation/technical/hash-function-transition.txt @@ -0,0 +1,797 @@ +Git hash function transition + + +Objective +- +Migrate Git from SHA-1 to a stronger hash function. + +Background +-- +At its core, the Git version control system is a content addressable +filesystem. It uses the SHA-1 hash function to name content. For +example, files, directories, and revisions are referred to by hash +values unlike in other traditional version control systems where files +or versions are referred to via sequential numbers. The use of a hash +function to address its content delivers a few advantages: + +* Integrity checking is easy. Bit flips, for example, are easily + detected, as the hash of corrupted content does not match its name. +* Lookup of objects is fast. + +Using a cryptographically secure hash function brings additional +advantages: + +* Object names can be signed and third parties can trust the hash to + address the signed object and all objects it references. +* Communication using Git protocol and out of band communication + methods have a short reliable string that can be used to reliably + address stored content. + +Over time some flaws in SHA-1 have been discovered by security +researchers. https://shattered.io demonstrated a practical SHA-1 hash +collision. As a result, SHA-1 cannot be considered cryptographically +secure any more. This impacts the communication of hash values because +we cannot trust that a given hash value represents the known good +version of content that the speaker intended. + +SHA-1 still possesses the other properties such as fast object lookup +and safe error checking, but other hash functions are equally suitable +that are believed to be cryptographically secure. + +Goals +- +Where NewHash is a strong 256-bit hash function to replace SHA-1 (see +"Selection of a New Hash", below): + +1. The transition to NewHash can be done one local repository at a time. + a. Requiring no action by any other party. + b. A NewHash repository can communicate with SHA-1 Git servers + (push/fetch). + c. Users can use SHA-1 and NewHash identifiers for objects + interchangeably (see "Object names on the command line", below). + d. New signed objects make use of a stronger hash function than + SHA-1 for their security guarantees. +2. Allow a complete transition away from SHA-1. + a. Local metadata for SHA-1 compatibility can be removed from a + repository if compatibility with SHA-1 is no longer needed. +3. Maintainability throughout the process. + a. The object format is kept simple and consistent. + b. Creation of a generalized repository conversion tool. + +Non-Goals +- +1. Add NewHash support to Git protocol. This is valuable and the + logic
Re: [PATCH] diff: correct newline in summary for renamed files
Stefan Beller wrote: > In 146fdb0dfe (diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY, > 2017-06-29), the conversion from direct printing to the symbol emission > dropped the new line character for renamed, copied and rewritten files. > > Add the emission of a newline, add a test for this case. > > Reported-by: Linus Torvalds > Helped-by: Jeff King > Signed-off-by: Stefan Beller > --- > diff.c| 1 + > t/t4013-diff-various.sh | 12 > t/t4013/diff.diff-tree_--stat_initial_mode| 4 > t/t4013/diff.diff-tree_--summary_initial_mode | 3 +++ > t/t4013/diff.diff-tree_initial_mode | 3 +++ > t/t4013/diff.log_--decorate=full_--all| 6 ++ > t/t4013/diff.log_--decorate_--all | 6 ++ > 7 files changed, 35 insertions(+) > create mode 100644 t/t4013/diff.diff-tree_--stat_initial_mode > create mode 100644 t/t4013/diff.diff-tree_--summary_initial_mode > create mode 100644 t/t4013/diff.diff-tree_initial_mode Reviewed-by: Jonathan Nieder Thanks.
Re: Updated to v2.14.2 on macOS; git add --patch broken
Jonathan Nieder wrote: > Jeff King wrote: >> There aren't a lot of changes to the script between v2.13.2 and v2.14.2. >> The most plausible culprit is d5addcf522 (add--interactive: handle EOF >> in prompt_yesno, 2017-06-21), but I'm scratching my head over how that >> could cause what you're seeing. >> >> Are you able to build Git from source and bisect the problem? It would >> help to know which commit introduced the problem. > > How about this change? > > commit 136c8c8b8fa39f1315713248473dececf20f8fe7 > Author: Jeff King > Date: Thu Jul 13 11:07:03 2017 -0400 > > color: check color.ui in git_default_config() Uh, I think I was thinking of another thread when I wrote this. Sorry for the nonsense. > Toni, what is the output of "git config -l"? I'm still curious about this. Thanks, Jonathan
Re: Updated to v2.14.2 on macOS; git add --patch broken
Hi, Jeff King wrote: > On Wed, Sep 27, 2017 at 07:28:49PM +0200, Toni Uebernickel wrote: >> The previous version was v2.13.2. >> I switched back to this version, and it works perfectly fine; without any >> changes to my system. > > Thanks for confirming. > > There aren't a lot of changes to the script between v2.13.2 and v2.14.2. > The most plausible culprit is d5addcf522 (add--interactive: handle EOF > in prompt_yesno, 2017-06-21), but I'm scratching my head over how that > could cause what you're seeing. > > Are you able to build Git from source and bisect the problem? It would > help to know which commit introduced the problem. How about this change? commit 136c8c8b8fa39f1315713248473dececf20f8fe7 Author: Jeff King Date: Thu Jul 13 11:07:03 2017 -0400 color: check color.ui in git_default_config() Toni, what is the output of "git config -l"? Thanks, Jonathan
Re: RFC v3: Another proposed hash function transition plan
Hi, Johannes Schindelin wrote: > Sorry, you are asking cryptography experts to spend their time on the Git > mailing list. I tried to get them to speak out on the Git mailing list. > They respectfully declined. > > I can't fault them, they have real jobs to do, and none of their managers > would be happy for them to educate the Git mailing list on matters of > cryptography, not after what happened in 2005. Fortunately we have had a few public comments from crypto specialists: https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/ https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/ https://public-inbox.org/git/CAL9PXLxMHG1nP5_GQaK_WSJTNKs=_qbaL6V5v2GzVG=9vu2...@mail.gmail.com/ https://public-inbox.org/git/59bfb95d.1030...@st.com/ https://public-inbox.org/git/59c149a3.6080...@st.com/ [...] > Let's be realistic. Git is pretty important to us, but it is not important > enough to sway, say, Intel into announcing hardware support for SHA3. Yes, I agree with this. (Adoption by Git could lead to adoption by some other projects, leading to more work on high quality software implementations in projects like OpenSSL, but I am not convinced that that would be a good thing for the world anyway. There are downsides to a proliferation of too many crypto primitives. This is the basic argument described in more detail at [1].) [...] > On Tue, 26 Sep 2017, Jason Cooper wrote: >> For my use cases, as a user of git, I have a plan to maintain provable >> integrity of existing objects stored in git under sha1 while migrating >> away from sha1. The same plan works for migrating away from SHA2 or >> SHA3 when the time comes. > > Please do not make the mistake of taking your use case to be a template > for everybody's use case. That said, I'm curious at what plan you are alluding to. Is it something that could benefit others on the list? Thanks, Jonathan [1] https://www.imperialviolet.org/2017/05/31/skipsha3.html
Re: [PATCH] technical doc: add a design doc for hash function transition
Hi, Stefan Beller wrote: > From: Jonathan Nieder I go by jrnie...@gmail.com upstream. :) > This is "RFC v3: Another proposed hash function transition plan" from > the git mailing list. > > Signed-off-by: Jonathan Nieder > Signed-off-by: Jonathan Tan > Signed-off-by: Brandon Williams > Signed-off-by: Stefan Beller I hadn't signed-off on this version, but it's not a big deal. [...] > --- > > This takes the original Google Doc[1] and adds it to our history, > such that the discussion can be on on list and in the commit messages. > > * replaced SHA3-256 with NEWHASH, sha3 with newhash > * added section 'Implementation plan' > * added section 'Future work' > * added section 'Agreed-upon criteria for selecting NewHash' Thanks for sending this out. I had let it stall too long. As a tiny nit, I think NewHash is easier to read than NEWHASH. Not a big deal. More importantly, we need some text describing it and saying it's a placeholder. The implementation plan included here is out of date. It comes from an email where I was answering a question about what people can do to make progress, before this design had been agreed on. In the context of this design there are other steps we'd want to describe (having to do with implementing the translation table, etc). I also planned to add a description of the translation table based on what was discussed previously in this thread. Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > What do you think of patch 7 in light of this? If the short-read case > gives us a sane errno, should we even bother trying to consistently > handle its error separately? I like the read_exactly_or_die variant because it makes callers more concise, but on the other hand a caller handling the error can write a more meaningful error message with the right amount of context. So I think you're right that it's better to drop patch 7. Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote: >> #ifndef EUNDERFLOW >> # ifdef ENODATA >> # define EUNDERFLOW ENODATA >> # else >> # define EUNDERFLOW ESPIPE >> # endif >> #endif > > Right, I think our mails just crossed but I'm leaning in this direction. > I'd prefer to call it SHORT_READ_ERRNO or something, though. Your > "#ifndef EUNDERFLOW" had me thinking that this was something that a real > platform might have, but AFAICT you just made it up. Agreed. Two of the risks of replying too quickly. :) Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > I definitely would prefer to avoid EIO, or anything that an actual > read() might return. > > What do you think of ENODATA? The glibc text for it is pretty > appropriate. If it's not available everywhere, we'd have to fallback to > something else (EIO? 0?). I don't know how esoteric it is. The likely > candidate to be lacking it is Windows. ENODATA with a fallback to ESPIPE sounds fine to me. read() would never produce ESPIPE because it doesn't seek. So that would be: /* errno value to use for early EOF */ #ifndef ENODATA #define ENODATA ESPIPE #endif Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jonathan Nieder wrote: > On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote: >> ENODATA is not too bad. On my glibc system it yields "No data available" >> from strerror(), which is at least comprehensible. >> >> We're still left with the question of whether it is defined everywhere >> (and what to fallback to when it isn't). > > So, > > #ifndef EUNDERFLOW > #ifdef ENODATA > #define ENODATA EUNDERFLOW > #else > #define ENODATA ESPIPE > #endif > #endif > > ? Uh, pretend I said #ifndef EUNDERFLOW # ifdef ENODATA # define EUNDERFLOW ENODATA # else # define EUNDERFLOW ESPIPE # endif #endif Sorry for the nonsense. Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote: > On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote: >> After reading this discussion from the sideline, maybe >> >> ENODATA No message is available on the STREAM head read >> queue (POSIX.1) >> >> is not too bad after all. Or >> >> ESPIPE Invalid seek (POSIX.1) >> >> Technically we do not seek, but one could imagine we'd seek there >> to read? > > ENODATA is not too bad. On my glibc system it yields "No data available" > from strerror(), which is at least comprehensible. > > We're still left with the question of whether it is defined everywhere > (and what to fallback to when it isn't). So, #ifndef EUNDERFLOW #ifdef ENODATA #define ENODATA EUNDERFLOW #else #define ENODATA ESPIPE #endif #endif ? Windows has ESPIPE, and I'm not sure what other non-POSIX platform we need to worry about. Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote: >> All that really matters is the strerror() that the user would see. > > Agreed, though I didn't think those were necessarily standardized. The standard allows them to vary for the sake of internationalization, but they are de facto constants (probably because of application authors like us abusing them ;-)). [...] >> Of course, it's even >> better if we fix the callers and don't try to wedge this into errno. > > Yes. This patch is just a stop-gap. Perhaps we should abandon it > entirely. But I couldn't find a way to fix all the callers. If you have > a function that just returns "-1" when it sees a read_in_full() error, > how does _its_ caller tell the difference? > > I guess the answer is that the interface of the sub-function calling > read_in_full() needs to change. Yes. :/ >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then >> there is EMSGSIZE: >> >> Message too long > > I don't really like that one either. > > I actually prefer "0" of the all of the options discussed. At least it > is immediately clear that it is not a syscall error. I have a basic aversion to ": Success" in error messages. Whenever I see such an error message, I stop trusting the program that produced it not to be seriously buggy. Maybe I'm the only one? If no actual errno works, we could make a custom strerror-like function with our own custom errors (making them negative to avoid clashing with standard errno values), but this feels like overkill. In the same spirit of misleadingly confusing too-long and too-short, there is also ERANGE ("Result too large"), which doesn't work here. There's also EPROTO "Protocol error", but that's about protocols, not file formats. More vague and therefor maybe more applicable is EBADMSG "Bad message". There's also ENOMSG "No message of the desired type". If the goal is to support debugging, an error like EPIPE "Broken pipe" or ESPIPE "Invalid seek" would be likely to lead me in the right direction (wrong file size), even though it is misleading about how the error surfaced. We could also avoid trying to be cute and use something generic like EIO "Input/output error". Jonathan
Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules
Stefan Beller wrote: > submodule..update can be assigned an arbitrary command via setting > it to "!command". When this command is found in the regular config, Git > ought to just run that command instead of other update mechanisms. > > However if that command is just found in the .gitmodules file, it is > potentially untrusted, which is why we do not run it. Add a test > confirming the behavior. > > Suggested-by: Jonathan Nieder > Signed-off-by: Stefan Beller > --- > t/t7406-submodule-update.sh | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh > index 034914a14f..d718cb00e7 100755 > --- a/t/t7406-submodule-update.sh > +++ b/t/t7406-submodule-update.sh > @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in > .git/config' ' > ) > ' > > +test_expect_success 'submodule update - command in .gitmodules is ignored' ' > + test_when_finished "git -C super reset --hard HEAD^" && > + > + write_script must_not_run.sh <<-EOF && > + >$TEST_DIRECTORY/bad > + EOF > + > + git -C super config -f .gitmodules submodule.submodule.update > "!$TEST_DIRECTORY/must_not_run.sh" && Long line, but I don't think I care. I wish there were a tool like "make style" to format shell scripts. > + git -C super commit -a -m "add command to .gitmodules file" && > + git -C super/submodule reset --hard $submodulesha1^ && > + git -C super submodule update submodule && > + test_path_is_missing bad > +' Per offline discussion, you tested that this fails when you use .git/config instead of .gitmodules, so there aren't any subtle typos here. :) Reviewed-by: Jonathan Nieder Thanks for writing it.
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > [EOVERFLOW] > The file is a regular file, nbyte is greater than 0, the starting > position is before the end-of-file, and the starting position is > greater than or equal to the offset maximum established in the open > file description associated with fildes. > > That's not _exactly_ what's going on here, but it's pretty close. And is > what you would get if you implemented read_exactly() in terms of > something like pread(). All that really matters is the strerror() that the user would see. For EOVERFLOW, that is Value too large to be stored in data type For EILSEQ, it is Illegal byte sequence Neither is perfect, but the latter seems like a better fit for the kind of file format errors we're describing here. Of course, it's even better if we fix the callers and don't try to wedge this into errno. If you are okay with the too-long/too-short confusion in EOVERFLOW, then there is EMSGSIZE: Message too long Hope that helps, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote: >> Jef King wrote: >>> errno = 0; >>> read_in_full(fd, buf, sizeof(buf)); >>> if (errno) >>> die_errno("oops"); [...] >>> in general we frown on it for calls like >>> read(). >> >> Correct. Actually more than "frown on": except for with the few calls >> like strtoul that are advertised to work this way, POSIX does not make >> the guarantee the above code would rely on, at all. >> >> So it's not just frowned upon: it's so unportable that the standard >> calls it out as something that won't work. > > Is it unportable? Certainly read() is free reset errno to zero on > success. But is it allowed to set it to another random value? > > I think we're getting pretty academic here, but I'm curious if you have > a good reference. Yes, it is allowed to set it to another random value. It's a common thing for implementations to do when they hit a recoverable error, which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an implementation calling strtoul and getting EINVAL or ERANGE). glibc only recently started trying to avoid this kind of behavior, because even though the standard allows it, users hate it. POSIX.1-2008 XSI 2.3 "Error Numbers" says Some functions provide the error number in a variable accessed through the symbol errno, defined by including the header. The value of errno should only be examined when it is indicated to be valid by a function's return value. See http://austingroupbugs.net/view.php?id=374 for an example where someone wanted to add a new guarantee of that kind to a function. Hope that helps, Jonathan
Re: [PATCH] Documentation: consolidate submodule..update
Stefan Beller wrote: > By as-is I refer to origin/pu. Ah, okay. I'm not in that habit because pu frequently loses topics --- it's mostly useful as a tool to (1) distribute topic branches and (2) check whether they are compatible with each other. [...] > On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Nieder wrote: >> Maybe we should work on first wordsmithing the doc and then figuring >> out where it goes? The current state of the doc with (?) three >> different texts, > > such that each different text highlights each locations purpose. > >> all wrong, > > Care to spell out this bold claim? In the state of "master", "man git-config" tells me that [submodule ""] update = ... sets the "default update procedure for a submodule". It suggests that I read about "git submodule update" in git-submodule(1) for more details. This is problematic because 1. It doesn't tell me when I should and should not set it. 2. I would expect "git checkout --recurse-submodules" to respect the default update procedure. 3. It doesn't tell me what commands like "git fetch --recurse-submodules" will do. 4. It doesn't point me to better alternatives, when this configuration doesn't fit my use case. "man git-submodule" doesn't have a section on submodule..update. It has references scattered throughout the manpage. It only tells me how the setting affects the "git submodule update" command and doesn't address the problems (1)-(4). "man gitmodules" also refers to this setting as the "default update procedure for the named submodule". It doesn't answer the questions (1)-(4) either. Thanks and hope that helps, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: >> --- a/wrapper.c >> +++ b/wrapper.c >> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) >> ssize_t loaded = xread(fd, p, count); >> if (loaded < 0) >> return -1; >> -if (loaded == 0) >> +if (loaded == 0) { >> +errno = EILSEQ; >> return total; >> +} > > If I do this: > > errno = 0; > read_in_full(fd, buf, sizeof(buf)); > if (errno) > die_errno("oops"); > > then we'll claim an error, even though there was none (remember that > it's only an error for _some_ callers to not read the whole length). > > This may be sufficiently odd that we don't need to care about it. There > are some calls (like strtoul) which require this kind of clear-and-check > strategy with errno, but in general we frown on it for calls like > read(). Correct. Actually more than "frown on": except for with the few calls like strtoul that are advertised to work this way, POSIX does not make the guarantee the above code would rely on, at all. So it's not just frowned upon: it's so unportable that the standard calls it out as something that won't work. > We could also introduce a better helper like this: > > int read_exactly(int fd, void *buf, size_t count) > { > ssize_t ret = read_in_full(fd, buf, count); > if (ret < 0) > return -1; > if (ret != count) { > errno = EILSEQ; > return -1; > } > return 0; > } > > Then we know that touching errno always coincides with an error return. > And it's shorter to check "< 0" compared to "!= count" in the caller. > But of course a caller which wants to distinguish the two cases for its > error messages then has to look at errno: That sounds nice, but it doesn't solve the original problem of callers using read_in_full that way. Thanks, Jonathan
Re: git ls-tree -d doesn't work if the specified path is the repository root?
Hi, Roy Wellington wrote: > When I run `git ls-tree -d HEAD -- subdir` from the root of my > repository, where `subdir` is a subdirectory in that root, I get the > treehash of that subdirectory. This is what I expect. > > However, if I merely replace `subdir` with `.` (the root of the > repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns > the treehashes of the /children/ of the root, instead of the root > itself, contrary to the documented behavior of -d. Can you be more specific? What documentation led you to this expectation? I don't have a strong opinion about this behavior, but in any case if the documentation and command disagree about the correct behavior then one of them needs to be fixed. :) > Is there some reason for this? This behavior seems like a bug to me: > it means that prior to calling ls-tree I need to check if the > referenced path happens to be the root, and if so, find some other > means (rev-parse?) of converting it to a treehash. Does git rev-parse HEAD:subdir work better for what you're trying to accomplish? To get the root of the repository, you can use git rev-parse HEAD: Thanks and hope that helps, Jonathan
Re: [PATCH] Documentation: consolidate submodule..update
Stefan Beller wrote: > On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams wrote: >> On 09/25, Stefan Beller wrote: >>> Have one place to explain the effects of setting submodule..update >>> instead of two. >>> >>> Signed-off-by: Stefan Beller >>> --- > I disagree. Actually, I think the git-config(1) blurb could just > point here, and that the text here ought to be clear about what > commands it affects and how an end user would use it. I tend to agree with the consolidation. >>> >>> Something like this? >> >> I like the consolidation, its easier to keep up to date when its only in >> one place. > > After thinking about it further, I'd like to withdraw this patch > for now. > > The reason is that this would also forward point to > git-submodule.txt, such that we'd still have 2 places > in which it is explained. The current situation with explaining > it in 3 places is not too bad as each place hints at their specific > circumstance: > git-submodule.txt explains the values to set itself. > gitmodules.txt explains what the possible fields in that file are, > and regarding this field it points out it is ignored in the init call. > config.txt: actually describe the effects of the setting. > > I think we can keep it as-is for now. Sorry, I got lost. Which state is as-is? As a user, how do I find out what submodule.*.update is going to do and which commands will respect it? Maybe we should work on first wordsmithing the doc and then figuring out where it goes? The current state of the doc with (?) three different texts, all wrong, doesn't seem like a good state to preserve. Thanks, Jonathan
Re: [PATCH 7/7] add xread_in_full() helper
Jeff King wrote: > Many callers of read_in_full() expect to see exactly "len" > bytes, and die if that isn't the case. micronit: Can this be named read_in_full_or_die? Otherwise it's too easy to mistake for a function like xread, which has different semantics. I realize that xmalloc, xmemdupz, etc use a different convention. That's yet another reason to be explicit, IMHO. Thanks, Jonathan
Re: [PATCH 6/7] worktree: check the result of read_in_full()
Jeff King wrote: > We try to read "len" bytes into a buffer and just assume > that it happened correctly. In practice this should usually > be the case, since we just stat'd the file to get the > length. But we could be fooled by transient errors or by > other processes racily truncating the file. > > Let's be more careful. There's a slim chance this could > catch a real error, but it also prevents people and tools > from getting worried while reading the code. > > Signed-off-by: Jeff King > --- > builtin/worktree.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/builtin/worktree.c b/builtin/worktree.c > index 2f4a4ef9cd..87b3d70b0b 100644 > --- a/builtin/worktree.c > +++ b/builtin/worktree.c > @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf > *reason) > } > len = xsize_t(st.st_size); > path = xmallocz(len); > - read_in_full(fd, path, len); > + if (read_in_full(fd, path, len) != len) { > + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did > not match stat (%s)"), > + id, strerror(errno)); I'm a little confused. The 'if' condition checks for a read error but the message says something about 'stat'. If we're trying to double-check the 'stat' result, shouldn't we read all the way to EOF in case the file got longer? Puzzled, Jonathan
Re: [PATCH 5/7] worktree: use xsize_t to access file size
Jeff King wrote: > To read the "gitdir" file into memory, we stat the file and > allocate a buffer. But we store the size in an "int", which > may be truncated. We should use a size_t and xsize_t(), > which will detect truncation. > > An overflow is unlikely for a "gitdir" file, but it's a good > practice to model. > > Signed-off-by: Jeff King > --- > builtin/worktree.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check
Jeff King wrote: > Suggested-by: Jonathan Nieder > Signed-off-by: Jeff King > --- > builtin/get-tar-commit-id.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > In an ideal world, callers would always distinguish between > these cases and give a useful message for each. But as an > easy way to make our imperfect world better, let's reset > errno to a known value. The best we can do is "0", which > will yield something like: > > unable to read: Success > > That's not great, but at least it's deterministic and makes > it clear that we didn't see an error from read(). Yuck. Can we set errno to something more specific instead? read(2) also doesn't promise not to clobber errno on success. How about something like the following? -- >8 -- Subject: read_in_full: set errno for partial reads Many callers of read_in_full() complain when we do not read their full byte-count. But a check like: if (read_in_full(fd, buf, len) != len) return error_errno("unable to read"); conflates two problem conditions: 1. A real error from read(). 2. There were fewer than "len" bytes available. In the first case, showing the user strerror(errno) is useful. In the second, we may see a random errno that was set by some previous system call. In an ideal world, callers would always distinguish between these cases and give a useful message for each. But as an easy way to make our imperfect world better, let's set errno in the second case to a known value. There is no standard "Unexpected end of file" errno value, so instead use EILSEQ, which yields a message like unable to read: Illegal byte sequence That's not great, but at least it's deterministic and makes it possible to reverse-engineer what went wrong. Change-Id: I48138052f71b7c6cf35c2d00a10dc8febaca4f10 Signed-off-by: Jonathan Nieder --- wrapper.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/wrapper.c b/wrapper.c index 61aba0b5c1..1842a99b87 100644 --- a/wrapper.c +++ b/wrapper.c @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) ssize_t loaded = xread(fd, p, count); if (loaded < 0) return -1; - if (loaded == 0) + if (loaded == 0) { + errno = EILSEQ; return total; + } count -= loaded; p += loaded; total += loaded; -- 2.14.1.821.g8fa685d3b7
Re: [PATCH 2/7] notes-merge: drop dead zero-write code
Jeff King wrote: > We call write_in_full() with a size that we know is greater > than zero. The return value can never be zero, then, since > write_in_full() converts such a failed write() into ENOSPC > and returns -1. We can just drop this branch of the error > handling entirely. > > Suggested-by: Jonathan Nieder > Signed-off-by: Jeff King > --- > notes-merge.c | 2 -- > 1 file changed, 2 deletions(-) Reviewed-by: Jonathan Nieder Thanks for tying up this loose end.
Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
Jeff King wrote: > Commit 06f46f237a (avoid "write_in_full(fd, buf, len) != > len" pattern, 2017-09-13) converted this callsite from: > > write_in_full(...) != 1 > > to > > write_in_full(...) < 0 > > But during the conflict resolution in c50424a6f0 (Merge > branch 'jk/write-in-full-fix', 2017-09-25), this morphed > into > > write_in_full(...) < 1 > > This behaves as we want, but we prefer to avoid modeling the > "less than length" error-check which can be subtly buggy, as > shown in efacf609c8 (config: avoid "write_in_full(fd, buf, > len) < len" pattern, 2017-09-13). > > Signed-off-by: Jeff King > --- > refs/files-backend.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Good eyes. Thanks. Reviewed-by: Jonathan Nieder By the way, the description from that merge commit Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. does not look accurate to me. At least the "Many codepaths" part. All of those changes except for three should be no-ops. The scariest one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c, which is about overflowing a "long" on Windows instead of error handling. Thanks, Jonathan
Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules
Stefan Beller wrote: > submodule..update can be assigned an arbitrary command via setting > it to "!command". When this command is found in the regular config, Git > ought to just run that command instead of other update mechanisms. > > However if that command is just found in the .gitmodules file, it is > potentially untrusted, which is why we do not run it. Add a test > confirming the behavior. > > Suggested-by: Jonathan Nieder > Signed-off-by: Stefan Beller > --- > t/t7406-submodule-update.sh | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh > index 034914a14f..780af4e6f5 100755 > --- a/t/t7406-submodule-update.sh > +++ b/t/t7406-submodule-update.sh > @@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in > .git/config' ' > ) > ' > > +test_expect_success 'submodule update - command in .gitmodules is ignored' ' > + test_when_finished "git -C super reset --hard HEAD^" && > + > + git -C super config -f .gitmodules submodule.submodule.update "!false > || echo >bad" && What does the '!false || echo >bad' do? Ideally we want this test to be super robust: e.g. if it runs the command but from a different directory, we still want the test to fail, and if it runs the command but using exec instead of a shell, we still want the test to fail. Maybe write_script would help with this. E.g. would something like test_when_finished ... && write_script must_not_run.sh <<-EOF && >$TEST_DIRECTORY/bad EOF git -C super config -f .gitmodules submodule.submodule.update \ "!$TEST_DIRECTORY/must_not_run.sh" && ... work? Thanks, Jonathan
Re: [PATCH v2 1/1] Win32: simplify loading of DLL functions
Johannes Schindelin wrote: > Dynamic loading of DLL functions is duplicated in several places in Git > for Windows' source code. > > This patch adds a pair of macros to simplify the process: the > DECLARE_PROC_ADDR(, , , > ..) macro to be used at the beginning of a > code block, and the INIT_PROC_ADDR() macro to call before > using the declared function. The return value of the INIT_PROC_ADDR() > call has to be checked; If it is NULL, the function was not found in the > specified DLL. > > Example: > > DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, > LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); > > if (!INIT_PROC_ADDR(CreateHardLinkW)) > return error("Could not find CreateHardLinkW() function"; > > if (!CreateHardLinkW(source, target, NULL)) > return error("could not create hardlink from %S to %S", >source, target); > return 0; > > Signed-off-by: Karsten Blees > Signed-off-by: Johannes Schindelin > Reviewed-by: Jonathan Nieder Yep, this is indeed Reviewed-by: Jonathan Nieder > --- > compat/win32/lazyload.h | 57 > + > 1 file changed, 57 insertions(+) > create mode 100644 compat/win32/lazyload.h Thanks, Jonathan
Re: [PATCHv2] Documentation/config: clarify the meaning of submodule..update
Stefan Beller wrote: > Reported-by: Lars Schneider > Signed-off-by: Stefan Beller > --- > Documentation/config.txt | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > Jonathan writes: >> You'll want to update Documentation/gitmodules.txt, too. > > No. /grumpycat > > It should already be fine, as I read it as if it is only relevant to > "git submodule update" there, already: > > submodule..update:: > Defines the default update procedure for the named submodule, > i.e. how the submodule is updated by "git submodule update" > command in the superproject. This is only used by `git > submodule init` to initialize the configuration variable of > the same name. Allowed values here are 'checkout', 'rebase', > 'merge' or 'none'. See description of 'update' command in > linkgit:git-submodule[1] for their meaning. Note that the > '!command' form is intentionally ignored here for security > reasons. I disagree. Actually, I think the git-config(1) blurb could just point here, and that the text here ought to be clear about what commands it affects and how an end user would use it.
Re: [PATCH] Documentation/config: clarify the meaning of submodule..update
Hi, Stefan Beller wrote: > With more commands (that potentially change a submodule) paying attention > to submodules as well as the recent discussion[1] on submodule..update, > let's spell out that submodule..update is strictly to be used > for configuring the "submodule update" command and not to be obeyed > by other commands. Good idea, thank you. You'll want to update Documentation/gitmodules.txt, too. I think this can go further: it should say explicitly that commands like "git checkout --recurse-submodules" do not pay attention to this option. > These other commands usually have a strict meaning of what they should > do (i.e. checkout, reset, rebase, merge) as well as have their name > overlapping with the modes possible for submodule..update. > > [1] > https://public-inbox.org/git/4283f0b0-bc1c-4ed1-8126-7e512d844...@gmail.com/ Can you summarize what this discussion concluded with so the reader does not have to look far to understand it? > Signed-off-by: Stefan Beller Reported-by: Lars Schneider > --- > Documentation/config.txt | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index dc4e3f58a2..b0ded777fe 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -3085,10 +3085,9 @@ submodule..url:: > See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details. > > submodule..update:: > - The default update procedure for a submodule. This variable > - is populated by `git submodule init` from the > - linkgit:gitmodules[5] file. See description of 'update' > - command in linkgit:git-submodule[1]. > + The method how a submodule is updated via 'git submodule update'. > + It is populated by `git submodule init` from the linkgit:gitmodules[5] > + file. See description of 'update' command in linkgit:git-submodule[1]. > Wording nits: s/The method how/The method by which/; s/via/by/ More importantly, can this be more explicit about how it is meant to be used? E.g. to say 1. This only affects "git submodule update" and doesn't affect commands like "git checkout --recurse-submodules". 2. It exists for historical reasons; settings like submodule.active and pull.rebase are more likely to be what someone is looking for. Thanks and hope that helps, Jonathan
Re: [PATCH 1/4] cat-file: handle NULL object_context.path
Jeff King wrote: > Commit dc944b65f1 (get_sha1_with_context: dynamically > allocate oc->path, 2017-05-19) changed the rules that > callers must follow for seeing if we parsed a path in the > object name. The rules switched from "check if the oc.path > buffer is empty" to "check if the oc.path pointer is NULL". > But that commit forgot to update some sites in > cat_one_file(), meaning we might dereference a NULL pointer. > > You can see this by making a path-aware request like > --textconv without specifying --path, and giving an object > name that doesn't have a path in it. Like: > > git cat-file --textconv HEAD > > which will reliably segfault. > > Signed-off-by: Jeff King > --- > builtin/cat-file.c | 4 ++-- > t/t8010-cat-file-filters.sh | 5 + > 2 files changed, 7 insertions(+), 2 deletions(-) Yikes. Commit dc944b65f1 even touched this function, so we reviewers have no excuse for not having found it. Technically this changes the behavior of cat-file --path='', but I don't think that matters. Do other GET_SHA1_RECORD_PATH callers need similar treatment? * builtin/grep.c appears to do the right thing (it stores NULL in list, so it passes NULL to grep_object, which calls grep_oid, which calls grep_source_init, which stores NULL for the grep machinery that is able to cope with a NULL). * builtin/log.c is correctly updated as part of the patch. Those are the only other callers. So we're safe. *phew* Reviewed-by: Jonathan Nieder
Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison
larsxschnei...@gmail.com wrote: > 09f5e97 ("travis-ci: skip a branch build if equal tag is present", > 2017-09-17) introduced the "skip_branch_tip_with_tag" function with > a broken string comparison. Fix it! > > Reported-by: SZEDER Gábor > Signed-off-by: Lars Schneider > --- Thanks for the fix. 09f5e97 appears to be for the ls/travis-scriptify branch, which is already part of "next" (if it weren't, I'd suggest just squashing your patch into that commit). > --- a/ci/lib-travisci.sh > +++ b/ci/lib-travisci.sh > @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { > # of a tag. > > if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && > - $TAG != $TRAVIS_BRANCH > + [ "$TAG" != "$TRAVIS_BRANCH" ] Git style is to use 'test' instead of '[' for this. See https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/ for more on that subject. Could you squash in the following? Thanks, Jonathan diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh index c3b46f4a7d..b3ed0a0dda 100755 --- i/ci/lib-travisci.sh +++ w/ci/lib-travisci.sh @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { # of a tag. if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && - [ "$TAG" != "$TRAVIS_BRANCH" ] + test "$TAG" != "$TRAVIS_BRANCH" then echo "Tip of $TRAVIS_BRANCH is exactly at $TAG" exit 0
Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
Jeff King wrote: > On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote: >> Jeff King wrote: >>> Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array >>> >>> We assemble an array of strings in a custom struct, >>> NULL-terminate the result, and then pass it to >>> parse_pathspec(). >>> >>> But then we never free the array or the individual strings >>> (nor can we do the latter, as they are heap-allocated when >>> they come from stdin but not when they come from the >>> passed-in argv). [...] >> Except... is the idea that this allows the strings from stdin to be >> freed sooner, as soon as they have been parsed into a "struct >> pathspec"? > > Well, no...the idea is that this is a function which leaks a bunch of > memory, and we shouldn't have to think hard about how often its leak can > be triggered or how severe it is. We should just fix it. I forgot to say: thanks for writing such a pleasant patch. Reviewed-by: Jonathan Nieder [...] > I certainly agree that the pathspec interface could use better > documentation. Patches welcome? :) Here's such a patch. -- 8< -- Subject: pathspec doc: parse_pathspec does not maintain references to args The command line arguments passed to main() are valid for the life of a program, but the same is not true for all other argv-style arrays (e.g. when a caller creates an argv_array). Clarify that parse_pathspec does not rely on the argv passed to it to remain valid. This makes it easier to tell that callers like "git rev-list --stdin" are safe and ensures that that is more likely to remain true as the implementation of parse_pathspec evolves. Signed-off-by: Jonathan Nieder --- pathspec.c | 4 pathspec.h | 7 +++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/pathspec.c b/pathspec.c index e2a23ebc96..cdefdc7cc0 100644 --- a/pathspec.c +++ b/pathspec.c @@ -526,10 +526,6 @@ static void NORETURN unsupported_magic(const char *pattern, pattern, sb.buf); } -/* - * Given command line arguments and a prefix, convert the input to - * pathspec. die() if any magic in magic_mask is used. - */ void parse_pathspec(struct pathspec *pathspec, unsigned magic_mask, unsigned flags, const char *prefix, const char **argv) diff --git a/pathspec.h b/pathspec.h index 60e6500401..6420d1080a 100644 --- a/pathspec.h +++ b/pathspec.h @@ -70,6 +70,13 @@ struct pathspec { */ #define PATHSPEC_LITERAL_PATH (1<<6) +/* + * Given command line arguments and a prefix, convert the input to + * pathspec. die() if any magic in magic_mask is used. + * + * Any arguments used are copied. It is safe for the caller to modify + * or free 'prefix' and 'args' after calling this function. + */ extern void parse_pathspec(struct pathspec *pathspec, unsigned magic_mask, unsigned flags, -- 2.14.1.821.g8fa685d3b7
Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
Hi, Jeff King wrote: > But mostly I am fundamentally against using UNLEAK() in a case like > this, because it does not match either of the properties which justified > adding UNLEAK() in the first place: > > 1. We are about to exit the program, so the "leak" is only caused by > the memory going out of scope at that exit. > > By contrast, the revision machinery may be called many times in the > same program. > > 2. The memory remains useful until around the time of program exit. > > This most certainly does not, as it would not even be reachable. [...] > On Wed, Sep 20, 2017 at 03:48:26PM -0700, Jonathan Nieder wrote: >> In other words, proposed changes: >> >> 1. Could the commit message describe what effect this would have on >> maximum heap usage, if any? (In qualitative terms is fine, though >> actual numbers would be even better if it's easy to get them.) >> That would make it easier to justify not using UNLEAK. > > What wording are you looking for? It was a leak, and now it's gone. The > size of the leak depends on how much you feed to --stdin. IMHO using > UNLEAK is totally inappropriate for this case, and doesn't even seem > like an alternative worth rejecting. > >> 2. Can parse_pathspec get a comment in pathspec.h saying that it >> defensively copies anything it needs from args so the caller is >> free to modify or free it? That way, it should be more obvious >> to people in the future modifying parse_pathspec() that callers >> may rely on that. (The current API comment describes argv as >> "command line arguments", which I fear would send the opposite >> message to implementors.) > > I certainly agree that the pathspec interface could use better > documentation. Patches welcome? :) I think I failed at communicating here. That is not what I meant at all. The context is that Git (especially older parts of it) suffers from a pretty severe lack of API documentation. For newcomers that is especially obvious --- long-time git contributors, on the other hand, may get used to patterns and common interfaces and may have trouble seeing just how bad the lack of clearly communicated API contracts is. There is a bit of a "broken window" problem, too: authors of one-off patches may reasonably assume from existing code that this is just the way it is and, lacking examples of how to document an API, add more underdocumented API contracts. The patch I am replying to tightens the contract for parse_pathspec(). I am not asking for comprehensive documentation of that function --- that would be clearly off-topic for the patch. Instead, I am saying that we should document what we are newly requiring of the function in the patch. That way, the documented contract becomes clearer over time as people document what they are relying on. I think of that as generally a good practice. In other words, it was not at all obvious that "(2) The memory remains useful until around the time of program exit" did not hold. To a casual reader it instead looks like (2) does hold, and there's no documentation short of delving into the implementation of parse_pathspec() to convince a reader otherwise. The documentation that is present leads to the opposite conclusion. The assertion (1) that this allocation is going to happen multiple times in a program isn't true either. As you noted, we only read stdin once. But that doesn't matter as long as (2) doesn't hold. TBH saying that I should write the one-sentence doc patch feels like a cop-out. Just like it is not sustainable for those reviewers that are interested in good test coverage to be the only ones who write tests, I think we cannot avoid treating documentation of API contracts as a shared responsibility. Thanks and hope that clarifies, Jonathan
Re: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array
Hi, Jeff King wrote: > Subject: [PATCH] revision: replace "struct cmdline_pathspec" with argv_array > > We assemble an array of strings in a custom struct, > NULL-terminate the result, and then pass it to > parse_pathspec(). > > But then we never free the array or the individual strings > (nor can we do the latter, as they are heap-allocated when > they come from stdin but not when they come from the > passed-in argv). To be devil's advocate for a moment: why don't we use UNLEAK on the paths passed in from stdin? It's true that there can be an unbounded number of them, but they all coexist in memory anyway. They are not generated dynamically on the fly. Being able to free them doesn't have any obvious advantage over using exit(). Except... is the idea that this allows the strings from stdin to be freed sooner, as soon as they have been parsed into a "struct pathspec"? That sounds appealing. The only risk would be if "struct pathspec" holds on to a pointer to one of these strings. Fortunately parse_pathspec() is careful to strdup any strings it borrows (though it is not documented to do so). In other words, proposed changes: 1. Could the commit message describe what effect this would have on maximum heap usage, if any? (In qualitative terms is fine, though actual numbers would be even better if it's easy to get them.) That would make it easier to justify not using UNLEAK. 2. Can parse_pathspec get a comment in pathspec.h saying that it defensively copies anything it needs from args so the caller is free to modify or free it? That way, it should be more obvious to people in the future modifying parse_pathspec() that callers may rely on that. (The current API comment describes argv as "command line arguments", which I fear would send the opposite message to implementors.) > Let's swap this out for an argv_array. It does the same > thing with fewer lines of code, and it's safe to call > argv_array_clear() at the end to avoid a memory leak. > > Reported-by: Martin Ågren > Signed-off-by: Jeff King > --- > revision.c | 39 +++ > 1 file changed, 11 insertions(+), 28 deletions(-) The code looks good. Thanks, Jonathan
Re: [PATCH] commit: fix memory leak in `reduce_heads()`
Martin Ågren wrote: > We don't free the temporary scratch space we use with > `remove_redundant()`. Free it similar to how we do it in > `get_merge_bases_many_0()`. > > Signed-off-by: Martin Ågren > --- > commit.c | 1 + > 1 file changed, 1 insertion(+) Good catch. Thanks. Reviewed-by: Jonathan Nieder
Re: [PATCH] Win32: simplify loading of DLL functions
Johannes Schindelin wrote: > On Tue, 19 Sep 2017, Jonathan Nieder wrote: >> Could this example go near the top of the header instead? That way, >> it's easier for people reading the header to see how to use it. > > Funny, I am *so* used to examples being at the very end, from tutorials to > man pages. > > If my experience is any indication, I would rather keep this order. Sorry for the lack of clarity. I meant "near the top of the header *file*". [...] >> Are any of the Git for Windows users something that could go upstream >> along with this patch? That would help illustrate what a good caller >> looks like, which should help with reviewing future patches that use >> this code. > > I do not currently have the time to do that, that's why I did not > accompany the patch by any user. > > However, having said that, Ben's patch series will make for an *excellent* > user, fulfilling your wish. Ok. I think what you are saying is "go ahead --- anyone is welcome to grab some patches from git for windows and upstream them", which is a perfectly reasonable answer. [...] >> Reviewed-by: Jonathan Nieder > > Okay, I'll add that for v2. Will wait a couple of days in case more stuff > crops up. FWIW nothing I noticed came to the level of requiring a v2 imho. If any of the ideas I mentioned seems good, they can go in patches on top. The patch is in Junio's jch branch and I wouldn't be surprised if it hits "next" soon. Thanks again, Jonathan
Re: [PATCH 1.5/8] connect: die when a capability line comes after a ref
Brandon Williams wrote: > On 09/20, Jeff King wrote: >> (For that matter, could we just be checking whether *list is NULL?) > > True, that would probably be the better way to do this. Nice idea, thank you. That doesn't capture a few other cases of pkts that aren't supposed to come before the capabilities^{} line: * shallow * .have * capabilities^{} * invalid refnames Perhaps it should check all of those: if ((shallow_points && shallow_points->nr) || (extra_have && extra_have->nr) || got_dummy_ref_with_capabilities_declaration || got_invalid_ref || *list) What happens when another type of pkt gets introduced? This feels pretty error-prone. The underlying problem is that we are emulating a state machine that is not a simple for loop using a simple for loop, by piling up variables that keep track of the current state. That suggests one of the following approaches: A. Replace saw_response with an enum describing the state. Immediately after reading the first packet, update the state to EXPECTING_FIRST_REF. Immediately after reading the first ref, update the state to EXPECTING_SHALLOW. B. Use instruction flow to encode the state machine. Have separate loops for processing refs and shallow lines. By the way, there are some other ways the current code is less strict than described in pack-protocol.txt: - allowing an empty list-of-refs. (This is deliberate --- pack-protocol.txt's lack of documentation of this case is a bug.) - allowing multiple capability-lists - allowing capabilities^{} combined with other refs - allowing refs, shallow, and .have to be interleaved Tightening those would likely be good for the ecosystem (so that buggy servers get noticed quickly), but that's a separate topic from this change. [...] > I wasn't sure either, which is why I added the comment to prod > discussion. I agree that is is orthogonal to this series so I'll most > likely drop it, as it doesn't help with the protocol transition > discussion. I'd be happy to write a separate patch adding the NEEDSWORK comment (or even a patch doing what the NEEDSWORK comment suggests) to avoid derailing this one. :) Thanks, Jonathan
Re: [PATCH] Add t/helper/test-write-cache to .gitignore
Brandon Williams wrote: > On 08/28, Jonathan Tan wrote: >> This new binary was introduced in commit 3921a0b ("perf: add test for >> writing the index", 2017-08-21), but a .gitignore entry was not added >> for it. Add that entry. >> >> Signed-off-by: Jonathan Tan > > Looks good to me Reviewed-by: Jonathan Nieder as well. I wonder if we can automate this a little. Some thoughts: A. We could include a test that all the binaries named in TEST_PROGRAMS_NEED_X are also named in t/helper/.gitignore. That way tests would fail if this ever falls out of date again. B. Even better would be if we could have /t/helper/test-* in .gitignore. E.g. if we rename test-*.c to *.c (e.g. t/helper/ctype.c instead of t/helper/test-ctype.c), then the test helper source file names would be easier to type and maintaining this .gitignore would become a problem of the past. What do you think? Jonathan
Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test
Hi, Ben Peart wrote: > On 9/19/2017 4:43 PM, Jonathan Nieder wrote: >> This feels to me like the wrong fix. Wouldn't it be better for the >> test not to depend on the precise object ids? See the "Tips for >> Writing Tests" section in t/README: > > I completely agree that a better fix would be to rewrite the test to > not hard code the SHA values. I'm sure this will come to bite us > again as we discuss the migration to a different SHA algorithm. nit: the kind of change I'm proposing does not entail a full rewrite. :) The SHA migration aspect is true, but that's actually the least of my worries. I intend to introduce a SHA1 test prereq that crazy tests which want to depend on the hash function can declare a dependency on. My actual worry is that tests hard-coding object ids are (1) hard to understand, as illustrated by my having no clue what these particular object ids refer to and (2) very brittle, since an object id changes whenever a timestamp or any of the history leading to an object changes. They create a trap for anyone wanting to change the test later. They are basically change detector tests, which is generally accepted to be a bad practice. > That said, I think fixing this correctly is outside the scope of > this patch series. It has been written this way since it was > created back in 2014 (and patched in 2015 to hard code the V4 index > SHA). Fair enough. > If desired, this patch can simply be dropped from the series > entirely as I doubt anyone other than me will attempt to run it with > the fsmonitor extension turned on. *shrug* My motivations in the context of the review were: * now that we noticed the problem, we have an opportunity to fix it! (i.e. a fix would not have to be part of this series and would not necessarily have to be written by you) * if we include this non-fix, the commit message really needs to say something about it. Otherwise people are likely to cargo-cult it in other contexts and make the problem worse. Thanks, Jonathan
Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
Andreas Schwab wrote: > On Sep 19 2017, Jonathan Nieder wrote: >> B. #define for_each_string_list_item(item, list) \ >> if (list->items) \ >> for (item = ...; ...; ... ) >> >>This breaks a caller like >> if (foo) >> for_each_string_list_item(item, list) >> ... >> else >> ... >> >>making it a non-starter. > > That can be fixed with a dangling else. I believe the fix you're referring to is option C, from the same email you are replying to. If not, please correct me. Thanks, Jonathan
[PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
From: Michael Haggerty If you pass a newly initialized or newly cleared `string_list` to `for_each_string_list_item()`, then the latter does for ( item = (list)->items; /* NULL */ item < (list)->items + (list)->nr; /* NULL + 0 */ ++item) Even though this probably works almost everywhere, it is undefined behavior, and it could plausibly cause highly-optimizing compilers to misbehave. C99 section 6.5.6 paragraph 8 explains: If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. and (6.3.2.3.3) a null pointer does not point to anything. Guard the loop with a NULL check to make the intent crystal clear to even the most pedantic compiler. A suitably clever compiler could let the NULL check only run in the first iteration, but regardless, this overhead is likely to be dwarfed by the work to be done on each item. This problem was noticed by Coverity. [jn: using a NULL check instead of a placeholder empty list; fleshed out the commit message based on mailing list discussion] Signed-off-by: Michael Haggerty Signed-off-by: Jonathan Nieder --- string-list.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) Junio C Hamano wrote: > Jonathan Nieder writes: >> ... But a quick test with gcc 4.8.4 >> -O2 finds that at least this compiler does not contain such an >> optimization. The overhead Michael Haggerty mentioned is real. > > Still, I have a feeling that users of string_list wouldn't care > the overhead of single pointer NULL-ness check. > > - apply.c collects conflicted paths and reports them with fprintf(). > > - builtin/clean.c uses the function to walk the list of paths to be >removed, and either does a human interaction (for "-i" codepath) >or goes to the filesystem to remove things. > > - builtin/config.c uses it in get_urlmatch() in preparation for >doing network-y things. > > - builtin/describe.c walks the list of exclude and include patterns >to run wildmatch on the candidate reference name to filter it out. > > ... > > In all of these examples, what happens for each item in the loop > seems to me far heavier than the overhead this macro adds. Yes, agreed. As a small tweak, #define for_each_string_list_item(item, list) \ for (item = ...; item && ...; ...) produces nicer assembly than #define for_each_string_list_item(item, list) \ for (item = ...; list->items && ...; ...) (By the way, the potential optimization I described isn't valid: we know that when item == NULL and list->items == NULL, list->nr is always zero, but the compiler has no way to know that. So it can't eliminate the NULL test. For comparison, a suitably smart compiler should be able to eliminate a 'list->nr != 0 &&' guard if 'list' doesn't escape in the loop body.) Recapping the other proposed fixes: A. Make it an invariant of string_list that items is never NULL and update string_list_init et al to use an empty array. This is pretty painless until you notice some other structs that embed string_list without using STRING_LIST_INIT. Updating all those would be too painful. B. #define for_each_string_list_item(item, list) \ if (list->items) \ for (item = ...; ...; ... ) This breaks a caller like if (foo) for_each_string_list_item(item, list) ... else ... making it a non-starter. C. As Gábor suggested, #define for_each_string_list_item(item, list) \ if (!list->items) \ ; /* nothing to do */ \ else \ for (item = ...; ...; ...) This handles the caller from (B) correctly. But it produces compiler warnings for a caller like if (foo) for_each_string_list_item(item, list) ... There is only one instance of that construct in git today. It looks nicer anyway with braces, so this approach would also be promising. D. Eliminate for_each_string_list_item and let callers just do unsigned int i; for (i = 0; i < list->nr; i++) { struct string_list_item *item = list->items[i]; ... } Having to declare item is unnecessarily verbose, decreasing the appeal of this option. I think I like it anyway, but I wasn't able to convince coccinelle to do it. E. Use subtraction instead of addition: #define for_each_string_list_item(item, list) \ for (item = ...; \ (item == list->items ? 0 : item - list->items) < nr; \ item++) I ex
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Hi, Kaartic Sivaraam wrote: > On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: >> Jonathan Nieder wrote: >>> Does the following alternate fix work? I think I prefer it because >>> it doesn't require introducing a new global. [...] >>> #define for_each_string_list_item(item,list) \ >>> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >>> + for (item = (list)->items; \ >>> +(list)->items && item < (list)->items + (list)->nr; \ >>> +++item) >> >> This is the possibility that I was referring to as "add[ing] overhead to >> each iteration of the loop". I'd rather not add an extra test-and-branch >> to every iteration of a loop in which `list->items` is *not* NULL, which >> your solution appears to do. Or are compilers routinely able to optimize >> the check out? > > I t seems at least 'gcc' is able to optimize this out even with a -O1 > and 'clang' optimizes this out with a -O2. Taking a sneak peek at > the 'Makefile' shows that our default is -O2. > > For a proof, see https://godbolt.org/g/CPt73L >From that link: for ( ;valid_int && *valid_int < 10; (*valid_int)++) { printf("Valid instance"); } Both gcc and clang are able to optimize out the 'valid_int &&' because it is dereferenced on the RHS of the &&. For comparison, 'item < (list)->items + (list)->nr' does not dereference (list)->items. So that optimization doesn't apply here. A smart compiler could be able to take advantage of there being no object pointed to by a null pointer, which means item < (list)->items + (list)->nr is always false when (list)->items is NULL, which in turn makes a '(list)->items &&' test redundant. But a quick test with gcc 4.8.4 -O2 finds that at least this compiler does not contain such an optimization. The overhead Michael Haggerty mentioned is real. Thanks and hope that helps, Jonathan
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Hi, Junio C Hamano wrote: > Kaartic Sivaraam writes: >> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: >>> Jonathan Nieder wrote: >>>> Does the following alternate fix work? I think I prefer it because >>>> it doesn't require introducing a new global. [...] >>>> #define for_each_string_list_item(item,list) \ >>>> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >>>> + for (item = (list)->items; \ >>>> + (list)->items && item < (list)->items + (list)->nr; \ >>>> + ++item) >>> >>> This is the possibility that I was referring to as "add[ing] overhead to >>> each iteration of the loop". I'd rather not add an extra test-and-branch >>> to every iteration of a loop in which `list->items` is *not* NULL, which >>> your solution appears to do. Or are compilers routinely able to optimize >>> the check out? >> >> It seems at least 'gcc' is able to optimize this out even with a -O1 >> and 'clang' optimizes this out with a -O2. Taking a sneak peek at >> the 'Makefile' shows that our default is -O2. > > But doesn't the versions of gcc and clang currently available do the > right thing with the current code without this change anyway? I've > been operating under the assumption that this is to future-proof the > code even when the compilers change to use the "NULL+0 is undefined" > as an excuse to make demons fly out of your nose, so unfortunately I > do not think it is not so huge a plus to find that the current > compilers do the right thing to the code with proposed updates. I think you and Kaartic are talking about different things. Kaartic was checking that this wouldn't introduce a performance regression (thanks!). You are concerned about whether the C standard and common practice treat the resulting code as exhibiting undefined behavior. Fortunately the C standard is pretty clear about this. The undefined behavior here is at run time, not compile time. As you suggested in an earlier reply, the 'list->items &&' effectively guards the 'list->items + list->nr' to prevent that undefined behavior. I'll send a patch with a commit message saying so to try to close out this discussion. Thanks, Jonathan
Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test
Hi, Ben Peart wrote: > The split index test t1700-split-index.sh has hard coded SHA values for > the index. Currently it supports index V4 and V3 but assumes there are > no index extensions loaded. > > When manually forcing the fsmonitor extension to be turned on when > running the test suite, the SHA values no longer match which causes the > test to fail. > > The potential matrix of index extensions and index versions can is quite > large so instead disable the extension before attempting to run the test. Thanks for finding and diagnosing this problem. This feels to me like the wrong fix. Wouldn't it be better for the test not to depend on the precise object ids? See the "Tips for Writing Tests" section in t/README: And such drastic changes to the core GIT that even changes these otherwise supposedly stable object IDs should be accompanied by an update to t-basic.sh. However, other tests that simply rely on basic parts of the core GIT working properly should not have that level of intimate knowledge of the core GIT internals. If all the test scripts hardcoded the object IDs like t-basic.sh does, that defeats the purpose of t-basic.sh, which is to isolate that level of validation in one place. Your test also ends up needing updating when such a change to the internal happens, so do _not_ do it and leave the low level of validation to t-basic.sh. Worse, t1700-split-index.sh doesn't explain where the object ids it uses comes from so it is not even obvious to a casual reader like me how to fix it. See t/diff-lib.sh for some examples of one way to avoid depending on the object id computation. Another way that is often preferable is to come up with commands to compute the expected hash values, like $(git rev-parse HEAD^{tree}), and use those instead of hard-coded values. Thanks and hope that helps, Jonathan
Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable
Torsten Bögershausen wrote: > Some implementations of `echo` support the '-e' option to enable > backslash interpretation of the following string. > As an addition, they support '-E' to turn it off. nit: please wrap the commit message to a consistent line width. > However, none of these are portable, POSIX doesn't even mention them, > and many implementations don't support them. > > A check for '-n' is already done in check-non-portable-shell.pl, > extend it to cover '-n', '-e' or '-E-' > > Signed-off-by: Torsten Bögershausen > --- > t/check-non-portable-shell.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) An excellent change. Thanks for noticing and fixing this. Reviewed-by: Jonathan Nieder
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
Ben Peart wrote: > Some stats on these same coding style errors in the current bash scripts: > > 298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space) > 140 instances of "if \[ .* \]" (not using the preferred "test") > 293 instances of "if .*; then" > > Wouldn't it be great not to have to write up style feedback for when > these all get copy/pasted into new scripts? Agreed. Care to write patches for these? :) (I think three patches, one for each issue, would do the trick.) Thanks, Jonathan
Re: [PATCH] doc: camelCase the config variables to improve readability
Hi, Kaartic Sivaraam wrote: > The config variable used weren't readable as they were in the > crude form of how git stores/uses it's config variables. nit: Git's commit messages describe the current behavior of Git in the present tense. Something like: These manpages' references to config variables are hard to read because they use all-lowercase names, without indicating where each word ends and begins in the configuration variable names. Improve readability by using camelCase instead. Git treats these names case-insensitively so this does not affect functionality. > Improve it's readability by replacing them with camelCased versions > of config variables as it doesn't have any impact on it's usage. nit: s/it's/its/ (in both places) > Signed-off-by: Kaartic Sivaraam > --- > Documentation/git-branch.txt | 4 ++-- > Documentation/git-tag.txt| 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) This also is just more consistent with the rest of the docs. FWIW, with or without the commit message tweaks mentioned above, Reviewed-by: Jonathan Nieder Thanks for your attention to detail.
Re: [PATCH] Win32: simplify loading of DLL functions
Hi, Johannes Schindelin wrote: > Dynamic loading of DLL functions is duplicated in several places in Git > for Windows' source code. > > This patch adds a pair of macros to simplify the process: the > DECLARE_PROC_ADDR(, , , > ..) macro to be used at the beginning of a > code block, and the INIT_PROC_ADDR() macro to call before > using the declared function. The return value of the INIT_PROC_ADDR() > call has to be checked; If it is NULL, the function was not found in the > specified DLL. > > Example: > > DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, > LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); > > if (!INIT_PROC_ADDR(CreateHardLinkW)) > return error("Could not find CreateHardLinkW() function"; > > if (!CreateHardLinkW(source, target, NULL)) > return error("could not create hardlink from %S to %S", >source, target); > return 0; nit: whitespace is a bit strange here (mixture of tabs and spaces). Could this example go near the top of the header instead? That way, it's easier for people reading the header to see how to use it. > Signed-off-by: Karsten Blees > Signed-off-by: Johannes Schindelin > --- Just curious: what was Karsten's contribution? (I ask mostly because I'm interested in kinds of collaboration git metadata is failing to capture correctly --- e.g. pair programming.) > So far, there are no users (except in Git for Windows). Ben > promised to make use of it in his fsmonitor patch series. > > compat/win32/lazyload.h | 44 > 1 file changed, 44 insertions(+) > create mode 100644 compat/win32/lazyload.h Are any of the Git for Windows users something that could go upstream along with this patch? That would help illustrate what a good caller looks like, which should help with reviewing future patches that use this code. > --- /dev/null > +++ b/compat/win32/lazyload.h > @@ -0,0 +1,44 @@ [...] > +/* Declares a function to be loaded dynamically from a DLL. */ > +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ > + static struct proc_addr proc_addr_##function = \ > + { #dll, #function, NULL, 0 }; \ > + static rettype (WINAPI *function)(__VA_ARGS__) > + > +/* > + * Loads a function from a DLL (once-only). > + * Returns non-NULL function pointer on success. > + * Returns NULL + errno == ENOSYS on failure. > + */ > +#define INIT_PROC_ADDR(function) \ > + (function = get_proc_addr(&proc_addr_##function)) Probably worth mentioning in the doc comment that this is not thread safe, so a caller that wants to lazy-init in a threaded context is responsible for doing their own locking. > + > +static inline void *get_proc_addr(struct proc_addr *proc) > +{ > + /* only do this once */ > + if (!proc->initialized) { > + HANDLE hnd; > + proc->initialized = 1; > + hnd = LoadLibraryExA(proc->dll, NULL, > + LOAD_LIBRARY_SEARCH_SYSTEM32); > + if (hnd) > + proc->pfunction = GetProcAddress(hnd, proc->function); > + } > + /* set ENOSYS if DLL or function was not found */ > + if (!proc->pfunction) > + errno = ENOSYS; > + return proc->pfunction; > +} strerror(ENOSYS) is "Function not implemented". Cute. Reviewed-by: Jonathan Nieder Thanks, Jonathan
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Johannes Schindelin wrote: > What you need is a tool to aggregate this information, to help working > with it, to manage the project, and to be updated automatically. And to > publish this information continuously, without costing you extra effort. > > I understand that you started before GitHub existed, and before GitHub was > an option, the script-generated What's cooking mail was the best you could > do. On second reading, I think you're talking about GitHub's code review ("pull request") feature, not a bug tracker. There's been some active work (some that you've been involved in, I believe) on getting information from a GitHub pull request to the mailing list. One possibility would be to get information to also go in the other direction, so people have information about what happened to their patch in one place. I can also see why you are directing your attention to the maintainer for this one, since if the entire project adopts the GitHub Pull Request workflow, then the complexity and other flaws of such a two-way sync could be avoided. Unfortunately the maintainer is not the only person you'd need to convince. You'd also need to convince patch authors and reviewers to move to the new workflow. There are likely some potential contributors that this would bring in, that would like to get involved in the Git project but had been deterred by e.g. the mailing list's aggressive filtering of any email with an HTML part. Meanwhile it would also be a large adjustment for existing contributors, so it's not risk free. I personally like how Bazel[1] handles this. They have two ways to contribute: A. People used to GitHub can use a pull request like they are accustomed to. B. People preferring a more structured code review can use Gerrit. Having only two ways to contribute means that the code review information is still easy to archive and search, just like our mailing list archive. (Actually, an example I like even more is Akaros[2], which also appears to accept patch contributions by email.) Adding new ways for a contributor to submit a patch would mean that we can experiment with new workflows without getting in the way of people using an existing one. Thoughts? Jonathan [1] https://bazel.build/contributing.html [2] https://groups.google.com/forum/#!forum/akaros
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi, Johannes Schindelin wrote: > To relate that, you are using a plain text format that is not well defined > and not structured, and certainly not machine-readable, for information > that is crucial for project management. > > What you need is a tool to aggregate this information, to help working > with it, to manage the project, and to be updated automatically. And to > publish this information continuously, without costing you extra effort. > > I understand that you started before GitHub existed, and before GitHub was > an option, the script-generated What's cooking mail was the best you could > do. I think you are subtly (but not directly, for some reason?) advocating moving project management for the Git project to GitHub Issues. For what it's worth: 1. I would be happy to see Git adopt a bug tracker. As we've discussed on the list before, I suspect the only way that this is going to happen is if some contributors start using a bug tracker and keep up with bugs there, without requiring everyone to use it. That is how the Linux Kernel project started using bugzilla.kernel.org, for example. 2. GitHub Issues is one of my least favorite bug trackers, for what it's worth. If some sub-project of Git chooses to use it, then that's great and I won't get in their way. I'm just providing this single data point that approximately any other tracker (Bugzilla, JIRA, debbugs, patchwork) is something I'd be more likely to use. 3. This advice might feel hopeless, because if the maintainer is not involved in the initial pilot, then how does the bug tracker get notified when a patch has been accepted? But fortunately this is a problem other people have solved: e.g. most bug trackers have an API that can be used to automatically notify the bug when a patch with a certain subject line appears on a certain branch. Thanks and hope that helps, Jonathan
Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing
Ramsay Jones wrote: > Every test in this file, except one, is marked with the PIPE prereq. > However, that lone test ('set up svn repo'), only performs some setup > work and checks whether the following test should be executed (by > setting an additional SVNREPO prerequisite). Since the following test > also requires the PIPE prerequisite, performing the setup test, when the > PIPE preequisite is missing, is simply wasted effort. Use the skip-all > test facility to skip all tests when the PIPE prerequisite is missing. > > Signed-off-by: Ramsay Jones > --- > t/t9010-svn-fe.sh | 55 > --- > 1 file changed, 28 insertions(+), 27 deletions(-) It was always the intention that this test script eventually be able to run on Windows, but it was not meant to be. It would need to use a socket or something for backflow to work on Windows. Reviewed-by: Jonathan Nieder
Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite
Ramsay Jones wrote: > Signed-off-by: Ramsay Jones > --- > t/test-lib.sh | 10 ++ > 1 file changed, 2 insertions(+), 8 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
Jeff King wrote: > When we fail to open $GIT_DIR/info/alternates, we silently > assume there are no alternates. This is the right thing to > do for ENOENT, but not for other errors. > > A hard error is probably overkill here. If we fail to read > an alternates file then either we'll complete our operation > anyway, or we'll fail to find some needed object. Either > way, a warning is good idea. And we already have a helper > function to handle this pattern; let's just call > warn_on_fopen_error(). I think I prefer a hard error. What kind of cases are you imagining where it would be better to warn? E.g. for EIO, erroring out so that the user can try again seems better than hoping that the application will be able to cope with the more subtle error that comes from discovering some objects are missing. For EACCES, I can see how it makes sense to warn and move on, but no other errors like that are occuring to me. Thoughts? Thanks, Jonathan > Note that technically the errno from strbuf_read_file() > might be from a read() error, not open(). But since read() > would never return ENOENT or ENOTDIR, and since it produces > a generic "unable to access" error, it's suitable for > handling errors from either.
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King wrote: > Reported-by: Michael Haggerty > Signed-off-by: Jeff King > --- > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) Thanks for tracking it down. Reviewed-by: Jonathan Nieder [...] > +++ b/sha1_file.c [...] > @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int > len, int sep, > > static void read_info_alternates(const char * relative_base, int depth) > { > - char *map; > - size_t mapsz; > - struct stat st; > char *path; > - int fd; > + struct strbuf buf = STRBUF_INIT; > > path = xstrfmt("%s/info/alternates", relative_base); > - fd = git_open(path); > - free(path); > - if (fd < 0) > - return; > - if (fstat(fd, &st) || (st.st_size == 0)) { > - close(fd); > + if (strbuf_read_file(&buf, path, 1024) < 0) { > + free(path); > return; strbuf_read_file is careful to release buf on failure, so this doesn't leak (but it's a bit subtle). What happened to the !st.st_size case? Is it eliminated for simplicity? The rest looks good. Thanks, Jonathan
Re: [PATCH 0/2] fix read past end of array in alternates files
Jeff King wrote: > This series fixes a regression in v2.11.1 where we might read past the > end of an mmap'd buffer. It was introduced in cf3c635210, The above information is super helpful. Can it go in one of the commit messages? > but I didn't > base the patch on there, for a few reasons: > > 1. There's a trivial conflict when merging up (because of > git_open_noatime() becoming just git_open() in the inerim). > > 2. The reproduction advice relies on our SANITIZE Makefile knob, which > didn't exist back then. > > 3. The second patch does not apply there because we don't have > warn_on_fopen_errors(). Though admittedly it could be applied > separately after merging up; it's just a clean-up on top. Even this part could go in a commit message, but it's fine for it not to. Thanks, Jonathan
Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository
Hi, Øystein Walle wrote: > Running `git fetch --unshallow` on a repo that is not in fact shallow > produces a fatal error message. Hm, can you say more about the context? From a certain point of view, it might make sense for that command to succeed instead: if the repo is already unshallow, then why should't "fetch --unshallow" complain instead of declaring victory? > Add a helper to rev-parse that scripters > can use to determine whether a repo is shallow or not. > > Signed-off-by: Øystein Walle > --- > Documentation/git-rev-parse.txt | 3 +++ > builtin/rev-parse.c | 5 + > t/t1500-rev-parse.sh| 15 +++ > 3 files changed, 23 insertions(+) Regardless, this new rev-parse --is-shallow helper looks like a good feature. [...] > --- a/builtin/rev-parse.c > +++ b/builtin/rev-parse.c > @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const > char *prefix) > : "false"); > continue; > } > + if (!strcmp(arg, "--is-shallow-repository")) { > + printf("%s\n", is_repository_shallow() ? "true" > + : "false"); > + continue; > + } The implementation is straightforward and correct. [...] > --- a/t/t1500-rev-parse.sh > +++ b/t/t1500-rev-parse.sh Thanks for writing tests. \o/ > @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' > test_cmp expect actual > ' > > +test_expect_success 'git-path shallow repository' ' What does git-path mean here? I wonder if it's a copy/paste error. Did you mean something like test_expect_success 'rev-parse --is-shallow-repository in shallow repo' ' ? > + test_commit test_commit && > + echo true >expect && > + git clone --depth 1 --no-local . shallow && > + test_when_finished "rm -rf shallow" && > + git -C shallow rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'git-path notshallow repository' ' Likewise: should this be test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' ' ? > + echo false >expect && > + git rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > test_expect_success 'showing the superproject correctly' ' With the two tweaks mentioned above, Reviewed-by: Jonathan Nieder Thanks.
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Hi, phionah bugosi wrote: > Just to reecho a previous change requested before in one of the mail > threads, we currently have two global variables declared: > > struct mru packed_git_mru_storage; > struct mru *packed_git_mru = &packed_git_mru_storage; > > We normally use pointers in C to point or refer to the same location > or space in memory from multiple places. That means in situations > where we will have the pointer be assigned to in many places in our > code. But in this case, we do not have any other logic refering or > assigning to the pointer packed_git_mru. In simple words we actually > dont need a pointer in this case. > > Therefore this patch makes packed_git_mru global a value instead of > a pointer and makes use of list.h > > Signed-off-by: phionah bugosi > --- > builtin/pack-objects.c | 5 +++-- > cache.h| 7 --- > list.h | 6 ++ > packfile.c | 12 ++-- > 4 files changed, 15 insertions(+), 15 deletions(-) *puzzled* This appears to already be in "pu", with a different author. Did you independently make the same change? Or are you asking for a progress report on that change, and just phrasing it in a confusing way? Confused, Jonathan