Re: [PATCH] t4062: stop using repetition in regex
Am 08.08.2017 um 16:49 schrieb Johannes Schindelin: > Hi René, > > On Tue, 8 Aug 2017, René Scharfe wrote: > >> OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255. >> That's the minimum acceptable value according to POSIX. In t4062 we use >> 4096 repetitions in the test "-G matches", though, causing it to fail. >> >> Do the same as the test "-S --pickaxe-regex" in the same file and search >> for a single zero instead. That still suffices to trigger the buffer >> overrun in older versions (checked with b7d36ffca02^ and --valgrind on >> Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit. > > I am afraid not. The 4096 is precisely the page size required to trigger > the bug on Windows against which this regression test tries to safeguard. Checked with b7d36ffca02^ on MinGW now as well and found that it segfaults with the proposed change ten out of ten times. You get different results? How is that possible? The search string is NUL-terminated in each case, while the point of the test is that the file contents isn't, right? > Maybe simply disable the test on OpenBSD instead? Or guard the {4096} > behind the MINGW prereq. It's easy to build a long search string with two repetitions or by using a longer string as the base, if necessary. But first we need to find out why regexec() doesn't overflow in your case. My build uses the version from compat/. Why would it stop before reaching a NUL? That sounds like a different and serious bug. Thanks, René
[PATCH] t4062: stop using repetition in regex
OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255. That's the minimum acceptable value according to POSIX. In t4062 we use 4096 repetitions in the test "-G matches", though, causing it to fail. Do the same as the test "-S --pickaxe-regex" in the same file and search for a single zero instead. That still suffices to trigger the buffer overrun in older versions (checked with b7d36ffca02^ and --valgrind on Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit. Original-patch-by: David CoppaSigned-off-by: Rene Scharfe --- t/t4062-diff-pickaxe.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4062-diff-pickaxe.sh b/t/t4062-diff-pickaxe.sh index 7c4903f497..c16e5af6fa 100755 --- a/t/t4062-diff-pickaxe.sh +++ b/t/t4062-diff-pickaxe.sh @@ -15,7 +15,7 @@ test_expect_success setup ' git commit -m "A 4k file" ' test_expect_success '-G matches' ' - git diff --name-only -G "^0{4096}$" HEAD^ >out && + git diff --name-only -G0 HEAD^ >out && test 4096-zeroes.txt = "$(cat out)" ' -- 2.14.0
Re: [PATCH v4 4/4] add: modify already added files when --chmod is given
Am 14.09.2016 um 23:07 schrieb Thomas Gummerer: > When the chmod option was added to git add, it was hooked up to the diff > machinery, meaning that it only works when the version in the index > differs from the version on disk. > > As the option was supposed to mirror the chmod option in update-index, > which always changes the mode in the index, regardless of the status of > the file, make sure the option behaves the same way in git add. > > Signed-off-by: Thomas GummererSorry for replying almost a year late, hopefully you're still interested. > --- > builtin/add.c | 47 --- > builtin/checkout.c | 2 +- > builtin/commit.c | 2 +- > cache.h| 10 +- > read-cache.c | 14 ++ > t/t3700-add.sh | 50 ++ > 6 files changed, 91 insertions(+), 34 deletions(-) > > diff --git a/builtin/add.c b/builtin/add.c > index b1dddb4..595a0b2 100644 > --- a/builtin/add.c > +++ b/builtin/add.c > @@ -26,10 +26,25 @@ static int patch_interactive, add_interactive, > edit_interactive; > static int take_worktree_changes; > > struct update_callback_data { > - int flags, force_mode; > + int flags; > int add_errors; > }; > > +static void chmod_pathspec(struct pathspec *pathspec, int force_mode) "int force_mode" looks like a binary (or perhaps ternary) flag, but actually it is a character and can only have the values '-' or '+'. In builtin/update-index.c it's called "char flip" and we probably should define it like this here as well. > +{ > + int i; > + > + for (i = 0; i < active_nr; i++) { > + struct cache_entry *ce = active_cache[i]; > + > + if (pathspec && !ce_path_match(ce, pathspec, NULL)) > + continue; > + > + if (chmod_cache_entry(ce, force_mode) < 0) > + fprintf(stderr, "cannot chmod '%s'", ce->name); This error message is missing a newline. In builtin/update-index.c we also show the attempted change (-x or +x); perhaps we want to do that here as well. Currently chmod_cache_entry() can only fail if ce is not a regular file or it's other parameter is neither '-' nor '+'. We rule out the latter already in the argument parsing code. The former can happen if we add a symlink, either explicitly or because it's in a directory we're specified. I wonder if we even need to report anything, or under which conditions. If you have a file named dir/file and a symlink named dir/symlink then the interesting cases are: git add --chmod=.. dir/symlink git add --chmod=.. dir/file dir/symlink git add --chmod=.. dir Warning about each case may be the most cautious thing to do, but documenting that --chmod has no effect on symlinks and keeping silent might be less annoying, especially in the last case. What do you think? > @@ -342,13 +354,8 @@ int cmd_add(int argc, const char **argv, const char > *prefix) > if (!show_only && ignore_missing) > die(_("Option --ignore-missing can only be used together with > --dry-run")); > > - if (!chmod_arg) > - force_mode = 0; > - else if (!strcmp(chmod_arg, "-x")) > - force_mode = 0666; > - else if (!strcmp(chmod_arg, "+x")) > - force_mode = 0777; > - else > + if (chmod_arg && ((chmod_arg[0] != '-' && chmod_arg[0] != '+') || > + chmod_arg[1] != 'x' || chmod_arg[2])) > die(_("--chmod param '%s' must be either -x or +x"), chmod_arg); That's the argument parsing code mentioned above. The strcmp-based checks look nicer to me btw. How about this? if (chmod_arg && strcmp(chmod_arg, "-x") && strcmp(chmod_arg, "+x")) But that's just nitpicking. René
[PATCH] t3700: fix broken test under !POSIXPERM
76e368c378 (t3700: fix broken test under !SANITY) explains that the test 'git add --chmod=[+-]x changes index with already added file' can fail if xfoo3 is still present as a symlink from a previous test and deletes it with rm(1). That still leaves it present in the index, which causes the test to fail if POSIXPERM is not defined. Get rid of it by calling "git reset --hard" instead, as 76e368c378 already mentioned in passing. Signed-off-by: Rene Scharfe--- t/t3700-add.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t3700-add.sh b/t/t3700-add.sh index f3a4b4a913..4111fa3b7a 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -355,7 +355,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x with symlinks' ' ' test_expect_success 'git add --chmod=[+-]x changes index with already added file' ' - rm -f foo3 xfoo3 && + git reset --hard && echo foo >foo3 && git add foo3 && git add --chmod=+x foo3 && -- 2.14.0
[PATCH] test-path-utils: handle const parameter of basename and dirname
The parameter to basename(3) and dirname(3) traditionally had the type "char *", but on OpenBSD it's been "const char *" for years. That causes (at least) Clang to throw an incompatible-pointer-types warning for test-path-utils, where we try to pass around pointers to these functions. Avoid this warning (which is fatal in DEVELOPER mode) by ignoring the promise of OpenBSD's implementations to keep input strings unmodified and enclosing them in POSIX-compatible wrappers. Signed-off-by: Rene Scharfe--- t/helper/test-path-utils.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c index 1ebe0f750c..2b3c5092a1 100644 --- a/t/helper/test-path-utils.c +++ b/t/helper/test-path-utils.c @@ -38,6 +38,20 @@ struct test_data { const char *alternative; /* output: ... or this. */ }; +/* + * Compatibility wrappers for OpenBSD, whose basename(3) and dirname(3) + * have const parameters. + */ +static char *posix_basename(char *path) +{ + return basename(path); +} + +static char *posix_dirname(char *path) +{ + return dirname(path); +} + static int test_function(struct test_data *data, char *(*func)(char *input), const char *funcname) { @@ -251,10 +265,10 @@ int cmd_main(int argc, const char **argv) } if (argc == 2 && !strcmp(argv[1], "basename")) - return test_function(basename_data, basename, argv[1]); + return test_function(basename_data, posix_basename, argv[1]); if (argc == 2 && !strcmp(argv[1], "dirname")) - return test_function(dirname_data, dirname, argv[1]); + return test_function(dirname_data, posix_dirname, argv[1]); fprintf(stderr, "%s: unknown function name: %s\n", argv[0], argv[1] ? argv[1] : "(there was none)"); -- 2.14.0
[PATCH] t0001: skip test with restrictive permissions if getpwd(3) respects them
The sub-test "init in long base path" in t0001 checks the ability to handle long base paths with restrictive permissions (--x). On OpenBSD getcwd(3) fails in that case even for short paths. Check the two aspects separately by trying to use a long base path both with and without execute-only permissions. Only attempt the former if we know that getcwd(3) doesn't care. Original-patch-by: David Coppa <dco...@openbsd.org> Reported-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com> Signed-off-by: René Scharfe <l@web.de> --- t/t0001-init.sh | 30 -- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/t/t0001-init.sh b/t/t0001-init.sh index c4814d248f..86c1a51654 100755 --- a/t/t0001-init.sh +++ b/t/t0001-init.sh @@ -315,18 +315,44 @@ test_expect_success 'init with separate gitdir' ' test_path_is_dir realgitdir/refs ' -test_expect_success 'init in long base path' ' +test_lazy_prereq GETCWD_IGNORES_PERMS ' + base=GETCWD_TEST_BASE_DIR && + mkdir -p $base/dir && + chmod 100 $base || + error "bug in test script: cannot prepare $base" + + (cd $base/dir && /bin/pwd -P) + status=$? + + chmod 700 $base && + rm -rf $base || + error "bug in test script: cannot clean $base" + return $status +' + +check_long_base_path () { # exceed initial buffer size of strbuf_getcwd() component=123456789abcdef && test_when_finished "chmod 0700 $component; rm -rf $component" && p31=$component/$component && p127=$p31/$p31/$p31/$p31 && mkdir -p $p127 && - chmod 0111 $component && + if test $# = 1 + then + chmod $1 $component + fi && ( cd $p127 && git init newdir ) +} + +test_expect_success 'init in long base path' ' + check_long_base_path +' + +test_expect_success GETCWD_IGNORES_PERMS 'init in long restricted base path' ' + check_long_base_path 0111 ' test_expect_success 're-init on .git file' ' -- 2.14.0
Re: [PATCH] tests: don't give unportable ">" to "test" built-in, use -gt
Am 07.08.2017 um 03:18 schrieb brian m. carlson: > On Sun, Aug 06, 2017 at 11:38:50PM +, Ævar Arnfjörð Bjarmason wrote: >> Change an argument to test_line_count (which'll ultimately be turned >> into a "test" expression) to use "-gt" instead of ">" for an >> arithmetic test. >> >> This broken on e.g. OpenBSD as of v2.13.0 with my commit >> ac3f5a3468 ("ref-filter: add --no-contains option to >> tag/branch/for-each-ref", 2017-03-24). >> >> Upstream just worked around it by patching git and didn't tell us >> about it, I discovered this when reading various Git packaging >> implementations: https://github.com/openbsd/ports/commit/7e48bf88a20 >> >> Signed-off-by: Ævar Arnfjörð Bjarmason>> --- >> >> David, it would be great to get a quick bug report to >> git@vger.kernel.org if you end up having to monkeypatch something >> we've done. We won't bite, promise :) >> >> As shown in that linked Github commit OpenBSD has another recent >> workaround in turning on DIR_HAS_BSD_GROUP_SEMANTICS and skipping a >> related test, maybe René can make more sense of that? > > I've confirmed using the NetBSD 7.1 man pages that NetBSD will also want > DIR_HAS_BSD_GROUP_SEMANTICS. MirBSD will also, according to its man > pages. > > As I understand it, the only consequence of not setting this flag on BSD > systems is that some directories will be setgid, which, while ugly and > useless, should have no negative effect. Right; specifically it's for newly created subdirectories of shared directories, which we want to be owned by the same group as their parent. That's the default on BSDs, and we have to set the setgid bit to turn on that semantic only on other systems, e.g. on Linux. 81a24b52c1 (Do not use GUID on dir in git init --shared=all on FreeBSD) introduced DIR_HAS_BSD_GROUP_SEMANTICS with the rationale that setting setgid on directories may not even be allowed for normal users. I can't reproduce this (t1301 succeeds for me on FreeBSD 10.3 even without that build flag), but apparently at least in some configurations it's not just a cosmetic issue. The skipped test 'init in long base path' in t0001 is a different kettle of fish. getcwd(3) on OpenBSD respects permissions on the parent directories up to root. E.g. after "chmod 711 /home" normal users would get EACCES when they'd call getcwd(3) in their homes there, while e.g. on Linux and FreeBSD they'd successfully get their current working dir. René
Re: Fwd: New Defects reported by Coverity Scan for git
Am 18.07.2017 um 19:23 schrieb Junio C Hamano: > Stefan Bellerwrites: > >> I looked at this report for a while. My current understanding: >> * its detection was triggered by including rs/move-array, >>f331ab9d4c (use MOVE_ARRAY, 2017-07-15) >> * But it is harmless, because the scan logic does not understand >>how ALLOC_GROW works. It assumes that >>done_pbase_paths_alloc can be larger >>than done_pbase_paths_num + 1, while done_pbase_paths >>is NULL, such that the memory allocation is not triggered. >>If that were the case, then we have 2 subsequent dereferences >>of a NULL pointer right after that. But by inspecting the use >>of _alloc and _num the initial assumption does not seem possible. > > Yes, it does appear that way. ALLOC_GROW() calls REALLOC_ARRAY() > which safely can realloc NULL to specified size via xrealloc(). MOVE_ARRAY is passing its pointer arguments to memmove(); all it adds is a check for (done_pbase_paths_num - pos - 1) being zero. I don't understand how that change can make it more likely for one of the pointers to be NULL. I guess the first message ('Comparing "done_pbase_paths" to null implies that "done_pbase_paths" might be null.') has to be understood as an explanation of how the checker arrived at the second one? We could remove that NULL check -- it's effectively just a shortcut. But how would that improve safety? Well, if the array is unallocated (NULL) and _num is greater than zero we'd get a segfault without it, and thus would notice it. That check currently papers over such a hypothetical bug. Makes sense? -- >8 -- Subject: [PATCH] pack-objects: remove unnecessary NULL check If done_pbase_paths is NULL then done_pbase_paths_num must be zero and done_pbase_path_pos() returns -1 without accessing the array, so the check is not necessary. If the invariant was violated then the check would make sure we keep on going and allocate the necessary amount of memory in the next ALLOC_GROW call. That sounds nice, but all array entries except for one would contain garbage data. If the invariant was violated without the check we'd get a segfault in done_pbase_path_pos(), i.e. an observable crash, alerting us of the presence of a bug. Currently there is no such bug: Only the functions check_pbase_path() and cleanup_preferred_base() change pointer and counter, and both make sure to keep them in sync. Get rid of the check anyway to allow us to see if later changes introduce such a defect, and to simplify the code. Detected by Coverity Scan. Signed-off-by: Rene Scharfe --- builtin/pack-objects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index e730b415bf..c753e9237a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1289,7 +1289,7 @@ static int done_pbase_path_pos(unsigned hash) static int check_pbase_path(unsigned hash) { - int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash); + int pos = done_pbase_path_pos(hash); if (0 <= pos) return 1; pos = -pos - 1; -- 2.13.3
Re: [PATCH] dir: support platforms that require aligned reads
Am 16.07.2017 um 16:04 schrieb Jeff King: > On Sun, Jul 16, 2017 at 02:17:37PM +0200, René Scharfe wrote: > >> -static void stat_data_from_disk(struct stat_data *to, const struct >> stat_data *from) >> +static void stat_data_from_disk(struct stat_data *to, const unsigned char >> *data) >> { >> -to->sd_ctime.sec = get_be32(>sd_ctime.sec); >> -to->sd_ctime.nsec = get_be32(>sd_ctime.nsec); >> -to->sd_mtime.sec = get_be32(>sd_mtime.sec); >> -to->sd_mtime.nsec = get_be32(>sd_mtime.nsec); >> -to->sd_dev= get_be32(>sd_dev); >> -to->sd_ino= get_be32(>sd_ino); >> -to->sd_uid= get_be32(>sd_uid); >> -to->sd_gid= get_be32(>sd_gid); >> -to->sd_size = get_be32(>sd_size); >> +memcpy(to, data, sizeof(*to)); >> +to->sd_ctime.sec = ntohl(to->sd_ctime.sec); >> +to->sd_ctime.nsec = ntohl(to->sd_ctime.nsec); >> +to->sd_mtime.sec = ntohl(to->sd_mtime.sec); >> +to->sd_mtime.nsec = ntohl(to->sd_mtime.nsec); >> +to->sd_dev= ntohl(to->sd_dev); >> +to->sd_ino= ntohl(to->sd_ino); >> +to->sd_uid= ntohl(to->sd_uid); >> +to->sd_gid= ntohl(to->sd_gid); >> +to->sd_size = ntohl(to->sd_size); >> } > > Hmm. I would have written this to pull the bytes directly out of the > array, like: > >to->sd_ctime.sec = get_be32(data); data += 4; >to->sd_ctime.nsec = get_be32(data); data += 4; > > etc. Or even a helper to do the advancing like: > >to->sd_ctime.sec = parse_be32(); > > That reduces assumptions about padding in "struct stat_data". But > looking more at this code, and reading your comment: > >> Side note: The OS name is not enough for determining the layout of >> struct ondisk_untracked_cache. Different platforms can have different >> int sizes and padding. Adding the machine type could help, but that >> would be a breaking change. At that point we would be better off >> defining a machine-independent format, no? > > it looks like assumptions about struct layout are pervasive and part of > the on-disk format. Yuck. :( Assuming that there is no padding probably even works for the platforms the code currently supports (basically x86), but I don't know about others. We'd need to change the writing side as well to match, though. Which is probably a good idea, but I tried to keep the patch small and its impact low. Cross-machine usability is currently explicitly not supported -- not sure why, though. René
[PATCH] dir: support platforms that require aligned reads
The untracked cache is stored on disk by concatenating its memory structures without any padding. Consequently some of the structs are not aligned at a particular boundary when the whole extension is read back in one go. That's only OK on platforms without strict alignment requirements, or for byte-aligned data like strings or hash values. Decode struct ondisk_untracked_cache carefully from the extension blob by using explicit pointer arithmetic with offsets, avoiding alignment issues. Use char pointers for passing stat_data objects to stat_data_from_disk(), and use memcpy(3) in that function to get the contents into a properly aligned struct, then perform the byte-order adjustment in place there. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- Side note: The OS name is not enough for determining the layout of struct ondisk_untracked_cache. Different platforms can have different int sizes and padding. Adding the machine type could help, but that would be a breaking change. At that point we would be better off defining a machine-independent format, no? dir.c | 50 +++--- 1 file changed, 27 insertions(+), 23 deletions(-) diff --git a/dir.c b/dir.c index ae6f5c9636..1c55dc3e36 100644 --- a/dir.c +++ b/dir.c @@ -2398,7 +2398,8 @@ struct ondisk_untracked_cache { char exclude_per_dir[FLEX_ARRAY]; }; -#define ouc_size(len) (offsetof(struct ondisk_untracked_cache, exclude_per_dir) + len + 1) +#define ouc_offset(x) offsetof(struct ondisk_untracked_cache, x) +#define ouc_size(len) (ouc_offset(exclude_per_dir) + len + 1) struct write_data { int index; /* number of written untracked_cache_dir */ @@ -2560,17 +2561,18 @@ struct read_data { const unsigned char *end; }; -static void stat_data_from_disk(struct stat_data *to, const struct stat_data *from) +static void stat_data_from_disk(struct stat_data *to, const unsigned char *data) { - to->sd_ctime.sec = get_be32(>sd_ctime.sec); - to->sd_ctime.nsec = get_be32(>sd_ctime.nsec); - to->sd_mtime.sec = get_be32(>sd_mtime.sec); - to->sd_mtime.nsec = get_be32(>sd_mtime.nsec); - to->sd_dev= get_be32(>sd_dev); - to->sd_ino= get_be32(>sd_ino); - to->sd_uid= get_be32(>sd_uid); - to->sd_gid= get_be32(>sd_gid); - to->sd_size = get_be32(>sd_size); + memcpy(to, data, sizeof(*to)); + to->sd_ctime.sec = ntohl(to->sd_ctime.sec); + to->sd_ctime.nsec = ntohl(to->sd_ctime.nsec); + to->sd_mtime.sec = ntohl(to->sd_mtime.sec); + to->sd_mtime.nsec = ntohl(to->sd_mtime.nsec); + to->sd_dev= ntohl(to->sd_dev); + to->sd_ino= ntohl(to->sd_ino); + to->sd_uid= ntohl(to->sd_uid); + to->sd_gid= ntohl(to->sd_gid); + to->sd_size = ntohl(to->sd_size); } static int read_one_dir(struct untracked_cache_dir **untracked_, @@ -2645,7 +2647,7 @@ static void read_stat(size_t pos, void *cb) rd->data = rd->end + 1; return; } - stat_data_from_disk(>stat_data, (struct stat_data *)rd->data); + stat_data_from_disk(>stat_data, rd->data); rd->data += sizeof(struct stat_data); ud->valid = 1; } @@ -2663,22 +2665,22 @@ static void read_sha1(size_t pos, void *cb) } static void load_sha1_stat(struct sha1_stat *sha1_stat, - const struct stat_data *stat, + const unsigned char *data, const unsigned char *sha1) { - stat_data_from_disk(_stat->stat, stat); + stat_data_from_disk(_stat->stat, data); hashcpy(sha1_stat->sha1, sha1); sha1_stat->valid = 1; } struct untracked_cache *read_untracked_extension(const void *data, unsigned long sz) { - const struct ondisk_untracked_cache *ouc; struct untracked_cache *uc; struct read_data rd; const unsigned char *next = data, *end = (const unsigned char *)data + sz; const char *ident; int ident_len, len; + const char *exclude_per_dir; if (sz <= 1 || end[-1] != '\0') return NULL; @@ -2690,21 +2692,23 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long ident = (const char *)next; next += ident_len; - ouc = (const struct ondisk_untracked_cache *)next; if (next + ouc_size(0) > end) return NULL; uc = xcalloc(1, sizeof(*uc)); strbuf_init(>ident, ident_len); strbuf_add(>ident, ident, ident_len); - load_sha1_stat(>ss_info_exclude, >info_exclude_stat, - ouc->info_exclude_sha1); - load_sha1_stat(>ss_excludes_file, >excludes_file_stat, - ouc->excludes_file_sha1); - uc->dir_flags = get_be32(>dir_flags); - uc->exclude_per_dir = xstrdup(ouc->exclude_per_dir); +
Re: [PATCH] ls-files: don't try to prune an empty index
Am 16.07.2017 um 13:15 schrieb René Scharfe: Am 16.07.2017 um 13:08 schrieb Jeff King: On Sun, Jul 16, 2017 at 01:06:45PM +0200, René Scharfe wrote: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; -if (!prefix) +if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) "git am" complained that this does not apply to its blobs. Did you hand-edit? I didn't, but perhaps I messed up the order of patches? MOVE_ARRAY patch 2 touches the same file, but I wouldn't expect the two changes to conflict. So not sure what's going on. For some reason there's an extra space before the tab on each of the context lines. MUA issue or cut-and-paste, maybe? That's possible. Will resend. ... pressed Send to fast. Thanks for reporting the broken patch! I use the extension Toggle Word Wrap with Thunderbird, but it wraps by default with no way to change that, so I forgot toggling this time. Grr, I've had enough of this! Went past the warranty warning and set mailnews.wraplength=0 now. René
Re: [PATCH] ls-files: don't try to prune an empty index
Am 16.07.2017 um 13:08 schrieb Jeff King: On Sun, Jul 16, 2017 at 01:06:45PM +0200, René Scharfe wrote: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; - if (!prefix) + if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) "git am" complained that this does not apply to its blobs. Did you hand-edit? I didn't, but perhaps I messed up the order of patches? MOVE_ARRAY patch 2 touches the same file, but I wouldn't expect the two changes to conflict. So not sure what's going on. For some reason there's an extra space before the tab on each of the context lines. MUA issue or cut-and-paste, maybe? That's possible. Will resend.
[PATCH (resend)] ls-files: don't try to prune an empty index
Exit early when asked to prune an index that contains no entries to begin with. This avoids pointer arithmetic on istate->cache, which is possibly NULL in that case. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- Messed up whitespace when sending this the first time. builtin/ls-files.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; - if (!prefix) + if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) -- 2.13.3
Re: [PATCH] t: handle EOF in test_copy_bytes()
Am 16.07.2017 um 12:47 schrieb Jeff King: On Sun, Jul 16, 2017 at 06:45:32AM -0400, Jeff King wrote: I was playing with SANITIZE=undefined after René's patches to see how far we had left to go. I forgot to turn off sha1dc, which causes most programs to die due to the unaligned loads. That means git-archive in t5000 generates no output, triggering the bug. :) And I was pleased to see that after setting OPENSSL_SHA1, the answer to "how far" is "we are there". Yay. True with GCC, but not with Clang 3.9. A patch for alignment issues in dir.c is coming up.. René
Re: [PATCH] ls-files: don't try to prune an empty index
Am 16.07.2017 um 12:41 schrieb Jeff King: On Sat, Jul 15, 2017 at 10:11:20PM +0200, René Scharfe wrote: Exit early when asked to prune an index that contains no entries to begin with. This avoids pointer arithmetic on istate->cache, which is possibly NULL in that case. Found with Clang's UBSan. Makes sense. We could use MOVE_ARRAY() here, but this is a sensible short-circuit to the whole function. Looks like a good solution. diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; - if (!prefix) + if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) "git am" complained that this does not apply to its blobs. Did you hand-edit? I didn't, but perhaps I messed up the order of patches? MOVE_ARRAY patch 2 touches the same file, but I wouldn't expect the two changes to conflict. So not sure what's going on. René
Re: [PATCH 5/5] Makefile: disable unaligned loads with UBSan
Am 16.07.2017 um 12:17 schrieb Jeff King: On Sat, Jul 15, 2017 at 07:18:56PM +0200, René Scharfe wrote: -- >8 -- Subject: [PATCH] Makefile: allow combining UBSan with other sanitizers Multiple sanitizers can be specified as a comma-separated list. Set the flag NO_UNALIGNED_LOADS even if UndefinedBehaviorSanitizer is not the only sanitizer to build with. Nice, I didn't know you could do comma-separation here. ;) I always just built the various sanitizers separately since I was testing whether each one worked. But if we can get UBSan to build cleanly, I agree that "make SANITIZE=address,undefined test" would be a nice thing to run periodically. There are *some* restrictions: You can only use one of address, memory and thread, as mentioned here: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation Combining ASan and UBSan works fine. René
Re: [PATCH 3/2] apply: use COPY_ARRAY and MOVE_ARRAY in update_image()
Am 16.07.2017 um 02:31 schrieb Ramsay Jones: On 15/07/17 21:20, René Scharfe wrote: Simplify the code by using the helper macros COPY_ARRAY and MOVE_ARRAY, which also makes them more robust in the case we copy or move no lines, as they allow using NULL points in that case, while memcpy(3) and memmove(3) don't. Found with Clang's UBSan. Signed-off-by: Rene Scharfe <l@web.de> --- I don't know why the rules in contrib/coccinelle/array.cocci didn't match. :-? apply.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..40707ca50c 100644 --- a/apply.c +++ b/apply.c @@ -2809,13 +2809,10 @@ static void update_image(struct apply_state *state, img->line_allocated = img->line; } if (preimage_limit != postimage->nr) - memmove(img->line + applied_pos + postimage->nr, - img->line + applied_pos + preimage_limit, - (img->nr - (applied_pos + preimage_limit)) * - sizeof(*img->line)); - memcpy(img->line + applied_pos, - postimage->line, - postimage->nr * sizeof(*img->line)); + MOVE_ARRAY(img->line + applied_pos + postimage->nr, + img->line + applied_pos + preimage_limit, + img->nr - (applied_pos + preimage_limit)); + COPY_ARRAY(img->line + applied_pos, postimage->line, postimage->nr); My patch looks like: - memcpy(img->line + applied_pos, - postimage->line, - postimage->nr * sizeof(*img->line)); + if (postimage->line && postimage->nr) + memcpy(img->line + applied_pos, + postimage->line, + postimage->nr * sizeof(*img->line)); ... which I think I prefer (slightly). Can postimage->line be NULL when postimage->nr is bigger than 0? What would that mean? The only ways to arrive at that point that I an come up with are bugs (we accidentally set ->line to NULL, or we forgot to clean ->line). We'd better notice them early by getting a nice shrieking segfault. Adding an assert would work as well. René
Re: [PATCH] ls-files: don't try to prune an empty index
Am 16.07.2017 um 02:28 schrieb Ramsay Jones: On 15/07/17 21:11, René Scharfe wrote: Exit early when asked to prune an index that contains no entries to begin with. This avoids pointer arithmetic on istate->cache, which is possibly NULL in that case. Found with Clang's UBSan. Signed-off-by: Rene Scharfe <l@web.de> --- builtin/ls-files.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; -if (!prefix) +if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) My patch looked like: - if (!prefix) + if (!prefix || !istate->cache || istate->cache_nr == 0) ... which is probably a bit 'belt-n-braces'. ;-) Not checking for !istate->cache at this point is a good thing, I think. If we have entries, then ->cache must not be NULL, and if it is we'd get a segfault, notifying us that we have a bug. We could add an assert to state this requirement explicitly, but that would be the topic of a different patch. René
[PATCH 3/2] apply: use COPY_ARRAY and MOVE_ARRAY in update_image()
Simplify the code by using the helper macros COPY_ARRAY and MOVE_ARRAY, which also makes them more robust in the case we copy or move no lines, as they allow using NULL points in that case, while memcpy(3) and memmove(3) don't. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- I don't know why the rules in contrib/coccinelle/array.cocci didn't match. :-? apply.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..40707ca50c 100644 --- a/apply.c +++ b/apply.c @@ -2809,13 +2809,10 @@ static void update_image(struct apply_state *state, img->line_allocated = img->line; } if (preimage_limit != postimage->nr) - memmove(img->line + applied_pos + postimage->nr, - img->line + applied_pos + preimage_limit, - (img->nr - (applied_pos + preimage_limit)) * - sizeof(*img->line)); - memcpy(img->line + applied_pos, - postimage->line, - postimage->nr * sizeof(*img->line)); + MOVE_ARRAY(img->line + applied_pos + postimage->nr, + img->line + applied_pos + preimage_limit, + img->nr - (applied_pos + preimage_limit)); + COPY_ARRAY(img->line + applied_pos, postimage->line, postimage->nr); if (!state->allow_overlap) for (i = 0; i < postimage->nr; i++) img->line[applied_pos + i].flag |= LINE_PATCHED; -- 2.13.3
[PATCH] ls-files: don't try to prune an empty index
Exit early when asked to prune an index that contains no entries to begin with. This avoids pointer arithmetic on istate->cache, which is possibly NULL in that case. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- builtin/ls-files.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..adf572da68 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate, int pos; unsigned int first, last; - if (!prefix) + if (!prefix || !istate->cache_nr) return; pos = index_name_pos(istate, prefix, prefixlen); if (pos < 0) -- 2.13.3
[PATCH 2/2] use MOVE_ARRAY
Simplify the code for moving members inside of an array and make it more robust by using the helper macro MOVE_ARRAY. It calculates the size based on the specified number of elements for us and supports NULL pointers when that number is zero. Raw memmove(3) calls with NULL can cause the compiler to (over-eagerly) optimize out later NULL checks. This patch was generated with contrib/coccinelle/array.cocci and spatch (Coccinelle). Signed-off-by: Rene Scharfe--- Downside: The one in builtin/ls-files.c papers over a report by UBSan for the case of istate->cache being NULL. Technically we still try to add pos to NULL if pruning an empty index, but we won't see it anymore. Will send a separate patch. builtin/ls-files.c | 3 +-- builtin/merge.c| 2 +- builtin/pack-objects.c | 5 ++--- cache-tree.c | 5 ++--- commit.c | 5 ++--- notes-merge.c | 3 +-- read-cache.c | 5 ++--- reflog-walk.c | 7 +++ string-list.c | 8 +++- 9 files changed, 17 insertions(+), 26 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index b8514a0029..dc4a6aa3d9 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -378,8 +378,7 @@ static void prune_index(struct index_state *istate, } last = next; } - memmove(istate->cache, istate->cache + pos, - (last - pos) * sizeof(struct cache_entry *)); + MOVE_ARRAY(istate->cache, istate->cache + pos, last - pos); istate->cache_nr = last - pos; } diff --git a/builtin/merge.c b/builtin/merge.c index 900bafdb45..d5797b8fe7 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -537,7 +537,7 @@ static void parse_branch_merge_options(char *bmo) die(_("Bad branch.%s.mergeoptions string: %s"), branch, split_cmdline_strerror(argc)); REALLOC_ARRAY(argv, argc + 2); - memmove(argv + 1, argv, sizeof(*argv) * (argc + 1)); + MOVE_ARRAY(argv + 1, argv, argc + 1); argc++; argv[0] = "branch.*.mergeoptions"; parse_options(argc, argv, NULL, builtin_merge_options, diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f4a8441fe9..e730b415bf 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1298,9 +1298,8 @@ static int check_pbase_path(unsigned hash) done_pbase_paths_alloc); done_pbase_paths_num++; if (pos < done_pbase_paths_num) - memmove(done_pbase_paths + pos + 1, - done_pbase_paths + pos, - (done_pbase_paths_num - pos - 1) * sizeof(unsigned)); + MOVE_ARRAY(done_pbase_paths + pos + 1, done_pbase_paths + pos, + done_pbase_paths_num - pos - 1); done_pbase_paths[pos] = hash; return 0; } diff --git a/cache-tree.c b/cache-tree.c index ec23d8c03d..2440d1dc89 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -131,9 +131,8 @@ static int do_invalidate_path(struct cache_tree *it, const char *path) * move 4 and 5 up one place (2 entries) * 2 = 6 - 3 - 1 = subtree_nr - pos - 1 */ - memmove(it->down+pos, it->down+pos+1, - sizeof(struct cache_tree_sub *) * - (it->subtree_nr - pos - 1)); + MOVE_ARRAY(it->down + pos, it->down + pos + 1, + it->subtree_nr - pos - 1); it->subtree_nr--; } return 1; diff --git a/commit.c b/commit.c index cbfd689939..d3150d6270 100644 --- a/commit.c +++ b/commit.c @@ -223,9 +223,8 @@ int unregister_shallow(const struct object_id *oid) if (pos < 0) return -1; if (pos + 1 < commit_graft_nr) - memmove(commit_graft + pos, commit_graft + pos + 1, - sizeof(struct commit_graft *) - * (commit_graft_nr - pos - 1)); + MOVE_ARRAY(commit_graft + pos, commit_graft + pos + 1, + commit_graft_nr - pos - 1); commit_graft_nr--; return 0; } diff --git a/notes-merge.c b/notes-merge.c index 70e3fbeefb..c12b354f10 100644 --- a/notes-merge.c +++ b/notes-merge.c @@ -99,8 +99,7 @@ static struct notes_merge_pair *find_notes_merge_pair_pos( else { *occupied = 0; if (insert_new && i < len) { - memmove(list + i + 1, list + i, - (len - i) * sizeof(struct notes_merge_pair)); + MOVE_ARRAY(list + i + 1, list + i, len - i); memset(list + i, 0, sizeof(struct notes_merge_pair)); } } diff --git a/read-cache.c b/read-cache.c index 2121b6e7bb..acfb028f48 100644 --- a/read-cache.c +++ b/read-cache.c @@ -515,9
[PATCH 1/2] add MOVE_ARRAY
Similar to COPY_ARRAY (introduced in 60566cbb58), add a safe and convenient helper for moving potentially overlapping ranges of array entries. It infers the element size, multiplies automatically and safely to get the size in bytes, does a basic type safety check by comparing element sizes and unlike memmove(3) it supports NULL pointers iff 0 elements are to be moved. Also add a semantic patch to demonstrate the helper's intended usage. Signed-off-by: Rene Scharfe--- contrib/coccinelle/array.cocci | 17 + git-compat-util.h | 8 2 files changed, 25 insertions(+) diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci index 4ba98b7eaf..c61d1ca8dc 100644 --- a/contrib/coccinelle/array.cocci +++ b/contrib/coccinelle/array.cocci @@ -27,6 +27,23 @@ expression n; @@ type T; +T *dst; +T *src; +expression n; +@@ +( +- memmove(dst, src, (n) * sizeof(*dst)); ++ MOVE_ARRAY(dst, src, n); +| +- memmove(dst, src, (n) * sizeof(*src)); ++ MOVE_ARRAY(dst, src, n); +| +- memmove(dst, src, (n) * sizeof(T)); ++ MOVE_ARRAY(dst, src, n); +) + +@@ +type T; T *ptr; expression n; @@ diff --git a/git-compat-util.h b/git-compat-util.h index 047172d173..159f82154a 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -825,6 +825,14 @@ static inline void copy_array(void *dst, const void *src, size_t n, size_t size) memcpy(dst, src, st_mult(size, n)); } +#define MOVE_ARRAY(dst, src, n) move_array((dst), (src), (n), sizeof(*(dst)) + \ + BUILD_ASSERT_OR_ZERO(sizeof(*(dst)) == sizeof(*(src +static inline void move_array(void *dst, const void *src, size_t n, size_t size) +{ + if (n) + memmove(dst, src, st_mult(size, n)); +} + /* * These functions help you allocate structs with flex arrays, and copy * the data directly into the array. For example, if you had: -- 2.13.3
[PATCH 2/1] bswap: convert get_be16, get_be32 and put_be32 to inline functions
Simplify the implementation and allow callers to use expressions with side-effects by turning the macros get_be16, get_be32 and put_be32 into inline functions. Signed-off-by: Rene Scharfe--- All these redundant casts started to bother me, so I tried to come up with nice and clean inline functions. Successfully? You tell me. They are longer, but less cluttered. Would it punish -O0 builds? Is it all worth it? compat/bswap.h | 38 -- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/compat/bswap.h b/compat/bswap.h index 4582c1107a..7d063e9e40 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -162,19 +162,29 @@ static inline uint64_t git_bswap64(uint64_t x) #else -#define get_be16(p)( \ - (*((unsigned char *)(p) + 0) << 8) | \ - (*((unsigned char *)(p) + 1) << 0) ) -#define get_be32(p)( \ - ((uint32_t)*((unsigned char *)(p) + 0) << 24) | \ - ((uint32_t)*((unsigned char *)(p) + 1) << 16) | \ - ((uint32_t)*((unsigned char *)(p) + 2) << 8) | \ - ((uint32_t)*((unsigned char *)(p) + 3) << 0) ) -#define put_be32(p, v) do { \ - unsigned int __v = (v); \ - *((unsigned char *)(p) + 0) = __v >> 24; \ - *((unsigned char *)(p) + 1) = __v >> 16; \ - *((unsigned char *)(p) + 2) = __v >> 8; \ - *((unsigned char *)(p) + 3) = __v >> 0; } while (0) +static inline uint16_t get_be16(const void *ptr) +{ + const unsigned char *p = ptr; + return (uint16_t)p[0] << 8 | + (uint16_t)p[1] << 0; +} + +static inline uint32_t get_be32(const void *ptr) +{ + const unsigned char *p = ptr; + return (uint32_t)p[0] << 24 | + (uint32_t)p[1] << 16 | + (uint32_t)p[2] << 8 | + (uint32_t)p[3] << 0; +} + +static inline void put_be32(void *ptr, uint32_t value) +{ + unsigned char *p = ptr; + p[0] = value >> 24; + p[1] = value >> 16; + p[2] = value >> 8; + p[3] = value >> 0; +} #endif -- 2.13.3
[PATCH] bswap: convert to unsigned before shifting in get_be32
The pointer p is dereferenced and we get an unsigned char. Before shifting it's automatically promoted to int. Left-shifting a signed 32-bit value bigger than 127 by 24 places is undefined. Explicitly convert to a 32-bit unsigned type to avoid undefined behaviour if the highest bit is set. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- compat/bswap.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/compat/bswap.h b/compat/bswap.h index d47c003544..4582c1107a 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -166,10 +166,10 @@ static inline uint64_t git_bswap64(uint64_t x) (*((unsigned char *)(p) + 0) << 8) | \ (*((unsigned char *)(p) + 1) << 0) ) #define get_be32(p)( \ - (*((unsigned char *)(p) + 0) << 24) | \ - (*((unsigned char *)(p) + 1) << 16) | \ - (*((unsigned char *)(p) + 2) << 8) | \ - (*((unsigned char *)(p) + 3) << 0) ) + ((uint32_t)*((unsigned char *)(p) + 0) << 24) | \ + ((uint32_t)*((unsigned char *)(p) + 1) << 16) | \ + ((uint32_t)*((unsigned char *)(p) + 2) << 8) | \ + ((uint32_t)*((unsigned char *)(p) + 3) << 0) ) #define put_be32(p, v) do { \ unsigned int __v = (v); \ *((unsigned char *)(p) + 0) = __v >> 24; \ -- 2.13.3
Re: [PATCH v3 06/20] builtin/receive-pack: convert portions to struct object_id
Am 31.03.2017 um 03:39 schrieb brian m. carlson: > @@ -1081,10 +1081,10 @@ static const char *update(struct command *cmd, struct > shallow_info *si) > return "hook declined"; > } > > - if (is_null_sha1(new_sha1)) { > + if (is_null_oid(new_oid)) { > struct strbuf err = STRBUF_INIT; > - if (!parse_object(old_sha1)) { > - old_sha1 = NULL; > + if (!parse_object(old_oid->hash)) { > + old_oid = NULL; So old_oid can become NULL... > if (ref_exists(name)) { > rp_warning("Allowing deletion of corrupt ref."); > } else { > @@ -1094,7 +1094,7 @@ static const char *update(struct command *cmd, struct > shallow_info *si) > } > if (ref_transaction_delete(transaction, > namespaced_name, > -old_sha1, > +old_oid->hash, ... and here we dereference it. -- >8 -- Subject: [PATCH] receive-pack: don't access hash of NULL object_id pointer We set old_oid to NULL if we found out that it's a corrupt reference. In that case don't try to access the hash member and pass NULL to ref_transaction_delete() instead. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- That's the last bug of this kind which "make SANITIZE=undefined test" turned up. builtin/receive-pack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index cabdc55e09..946cf55138 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1138,7 +1138,7 @@ static const char *update(struct command *cmd, struct shallow_info *si) } if (ref_transaction_delete(transaction, namespaced_name, - old_oid->hash, + old_oid ? old_oid->hash : NULL, 0, "push", )) { rp_error("%s", err.buf); strbuf_release(); -- 2.13.3
Re: [PATCH 04/33] notes: make get_note return pointer to struct object_id
Am 30.05.2017 um 19:30 schrieb Brandon Williams: > @@ -392,7 +392,7 @@ static int add(int argc, const char **argv, const char > *prefix) > const char *object_ref; > struct notes_tree *t; > unsigned char object[20], new_note[20]; > - const unsigned char *note; > + const struct object_id *note; > struct note_data d = { 0, 0, NULL, STRBUF_INIT }; > struct option options[] = { > { OPTION_CALLBACK, 'm', "message", , N_("message"), In between here, note can be set to NULL... > @@ -453,7 +453,7 @@ static int add(int argc, const char **argv, const char > *prefix) > sha1_to_hex(object)); > } > > - prepare_note_data(object, , note); > + prepare_note_data(object, , note->hash); ... which we then dereference here. > @@ -598,13 +598,13 @@ static int append_edit(int argc, const char **argv, > const char *prefix) > t = init_notes_check(argv[0], NOTES_INIT_WRITABLE); > note = get_note(t, object); > > - prepare_note_data(object, , edit ? note : NULL); > + prepare_note_data(object, , edit && note ? note->hash : NULL); Here a NULL check was added; we need a similar one above as well. -- >8 -- Subject: [PATCH] notes: don't access hash of NULL object_id pointer Check if note is NULL, as we already do for different purposes a few lines above, and pass a NULL pointer to prepare_note_data() in that case instead of trying to access the hash member. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- The third parameter of prepare_note_data() could easily be turned into an object_id pointer (and it should), but this patch is meant to be a minimal fix. builtin/notes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/notes.c b/builtin/notes.c index 77573cf1ea..4303848e04 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -456,7 +456,7 @@ static int add(int argc, const char **argv, const char *prefix) oid_to_hex()); } - prepare_note_data(, , note->hash); + prepare_note_data(, , note ? note->hash : NULL); if (d.buf.len || allow_empty) { write_note_data(, new_note.hash); if (add_note(t, , _note, combine_notes_overwrite)) -- 2.13.3
Re: [PATCH 30/33] tree-diff: convert diff_tree_paths to struct object_id
Am 30.05.2017 um 19:31 schrieb Brandon Williams: > @@ -273,21 +274,20 @@ static struct combine_diff_path *emit_path(struct > combine_diff_path *p, > } > > if (recurse) { > - const unsigned char **parents_sha1; > + const struct object_id **parents_oid; > > - FAST_ARRAY_ALLOC(parents_sha1, nparent); > + FAST_ARRAY_ALLOC(parents_oid, nparent); > for (i = 0; i < nparent; ++i) { > /* same rule as in emitthis */ > int tpi_valid = tp && !(tp[i].entry.mode & > S_IFXMIN_NEQ); > > - parents_sha1[i] = tpi_valid ? tp[i].entry.oid->hash > - : NULL; > + parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL; > } So elements of parents_oid can be NULL... > > strbuf_add(base, path, pathlen); > strbuf_addch(base, '/'); > - p = ll_diff_tree_paths(p, sha1, parents_sha1, nparent, base, > opt); > - FAST_ARRAY_FREE(parents_sha1, nparent); > + p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); ... and we pass that array to ll_diff_tree_paths()... > + FAST_ARRAY_FREE(parents_oid, nparent); > } > > strbuf_setlen(base, old_baselen); > @@ -312,7 +312,7 @@ static void skip_uninteresting(struct tree_desc *t, > struct strbuf *base, > > > /* > - * generate paths for combined diff D(sha1,parents_sha1[]) > + * generate paths for combined diff D(sha1,parents_oid[]) >* >* Resulting paths are appended to combine_diff_path linked list, and also, > are >* emitted on the go via opt->pathchange() callback, so it is possible to > @@ -404,8 +404,8 @@ static inline void update_tp_entries(struct tree_desc > *tp, int nparent) > } > > static struct combine_diff_path *ll_diff_tree_paths( > - struct combine_diff_path *p, const unsigned char *sha1, > - const unsigned char **parents_sha1, int nparent, > + struct combine_diff_path *p, const struct object_id *oid, > + const struct object_id **parents_oid, int nparent, > struct strbuf *base, struct diff_options *opt) > { > struct tree_desc t, *tp; > @@ -422,8 +422,8 @@ static struct combine_diff_path *ll_diff_tree_paths( >* diff_tree_oid(parent, commit) ) >*/ > for (i = 0; i < nparent; ++i) > - tptree[i] = fill_tree_descriptor([i], parents_sha1[i]); > - ttree = fill_tree_descriptor(, sha1); > + tptree[i] = fill_tree_descriptor([i], parents_oid[i]->hash); ... and here we are in that function, dereferencing a pointer that may be NULL. > + ttree = fill_tree_descriptor(, oid->hash); oid here can also be NULL, but I think that's not as easy to see. -- >8 -- Subject: [PATCH] tree-diff: don't access hash of NULL object_id pointer The object_id pointers can be NULL for invalid entries. Don't try to dereference them and pass NULL along to fill_tree_descriptor() instead, which handles them just fine. Found with Clang's UBSan. Signed-off-by: Rene Scharfe--- fill_tree_descriptor() can easily be converted to object_id, by the way, which would get us rid of the extra check introduced here, but this patch is meant as a minimal fix. tree-diff.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tree-diff.c b/tree-diff.c index bd6d65a409..2357f72899 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -421,8 +421,9 @@ static struct combine_diff_path *ll_diff_tree_paths( * diff_tree_oid(parent, commit) ) */ for (i = 0; i < nparent; ++i) - tptree[i] = fill_tree_descriptor([i], parents_oid[i]->hash); - ttree = fill_tree_descriptor(, oid->hash); + tptree[i] = fill_tree_descriptor([i], + parents_oid[i] ? parents_oid[i]->hash : NULL); + ttree = fill_tree_descriptor(, oid ? oid->hash : NULL); /* Enable recursion indefinitely */ opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE); -- 2.13.3
Re: [PATCH 5/5] Makefile: disable unaligned loads with UBSan
Am 10.07.2017 um 15:24 schrieb Jeff King: > The undefined behavior sanitizer complains about unaligned > loads, even if they're OK for a particular platform in > practice. It's possible that they _are_ a problem, of > course, but since it's a known tradeoff the UBSan errors are > just noise. > > Let's quiet it automatically by building with > NO_UNALIGNED_LOADS when SANITIZE=undefined is in use. > > Signed-off-by: Jeff King> --- > Makefile | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/Makefile b/Makefile > index cc03b3c95..3c341b2a6 100644 > --- a/Makefile > +++ b/Makefile > @@ -1015,6 +1015,9 @@ endif > ifdef SANITIZE > BASIC_CFLAGS += -fsanitize=$(SANITIZE) -fno-sanitize-recover=$(SANITIZE) > BASIC_CFLAGS += -fno-omit-frame-pointer > +ifeq ($(SANITIZE),undefined) > +BASIC_CFLAGS += -DNO_UNALIGNED_LOADS > +endif > endif > > ifndef sysconfdir Nice, but let's be even nicer! -- >8 -- Subject: [PATCH] Makefile: allow combining UBSan with other sanitizers Multiple sanitizers can be specified as a comma-separated list. Set the flag NO_UNALIGNED_LOADS even if UndefinedBehaviorSanitizer is not the only sanitizer to build with. Signed-off-by: Rene Scharfe --- Makefile | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ba4359ef8d..9b98535a04 100644 --- a/Makefile +++ b/Makefile @@ -1022,10 +1022,15 @@ ifdef DEVELOPER CFLAGS += $(DEVELOPER_CFLAGS) endif +comma := , +empty := +space := $(empty) $(empty) + ifdef SANITIZE +SANITIZERS := $(foreach flag,$(subst $(comma),$(space),$(SANITIZE)),$(flag)) BASIC_CFLAGS += -fsanitize=$(SANITIZE) -fno-sanitize-recover=$(SANITIZE) BASIC_CFLAGS += -fno-omit-frame-pointer -ifeq ($(SANITIZE),undefined) +ifneq ($(filter undefined,$(SANITIZERS)),) BASIC_CFLAGS += -DNO_UNALIGNED_LOADS endif endif -- 2.13.3
Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
Am 10.07.2017 um 04:10 schrieb Junio C Hamano: Jeff Kingwrites: ... And you could even drop origlen by replacing it with "baselen - 3" at the end. But somehow doing the computation on the fly actually seems more complicated to me (from the perspective of a reader who is trying to make sure all is correct). True enough. I personally do not mind any of the three variants (including the original) and would rather not spend too much time micro-optimizing this codepath, lest the next clever person starts to turn the loop in for_each_loose_file_in_objdir() to a nested loop that runs 16 times each to avoid copying the same leading byte again and again ;-) *evil grin* Anyway, I agree that this patch is not worth spending much time on. I still think the result is cleaner (perhaps due to my strbuf_addf allergy), though it's longer and perhaps adding a twelfth variable makes the function too complicated. The magic chomping variant is a clever compromise -- but needs a comment to explain itself. So if you are indifferent please just drop it. Thanks, René
Re: [PATCH] use DIV_ROUND_UP
Am 10.07.2017 um 09:27 schrieb Jeff King: On Sat, Jul 08, 2017 at 12:35:35PM +0200, René Scharfe wrote: diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index 2dc9c82ecf..06c479f70a 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -210,8 +210,8 @@ size_t ewah_add(struct ewah_bitmap *self, eword_t word) void ewah_set(struct ewah_bitmap *self, size_t i) { const size_t dist = - (i + BITS_IN_EWORD) / BITS_IN_EWORD - - (self->bit_size + BITS_IN_EWORD - 1) / BITS_IN_EWORD; + DIV_ROUND_UP(i + 1, BITS_IN_EWORD) - + DIV_ROUND_UP(self->bit_size, BITS_IN_EWORD); ...this first one is a bit trickier. Our "n" in the first one is now "i+1". But that's because the original was implicitly canceling the "-1" and "+1" terms. So I think it's a true mechanical conversion, but I have to admit the original is confusing. Without digging I suspect it's correct, though, just because a simple bug here would mean that our ewah bitmaps totally don't work. So it's probably not worth spending time on. A few lines below there is "self->bit_size = i + 1", so the code calculates how many words the old and the new value are apart (or by how many words the bitmap needs to be extended), which becomes easier to see with the patch. René
Re: [PATCH] use DIV_ROUND_UP
Am 09.07.2017 um 15:25 schrieb Martin Ågren: > On 8 July 2017 at 12:35, René Scharfe <l@web.de> wrote: >> Convert code that divides and rounds up to use DIV_ROUND_UP to make the >> intent clearer and reduce the number of magic constants. > ... >> diff --git a/sha1_name.c b/sha1_name.c >> index e7f7b12ceb..8c513dbff6 100644 >> --- a/sha1_name.c >> +++ b/sha1_name.c >> @@ -492,7 +492,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char >> *sha1, int len) >> * together we need to divide by 2; but we also want to >> round >> * odd numbers up, hence adding one before dividing. >> */ >> - len = (len + 1) / 2; >> + len = DIV_ROUND_UP(len, 2); > > Since the addition is now an implementation detail of DIV_ROUND_UP, > should the comment be adjusted, maybe simply by removing ", hence > adding one before dividing"? > > Or perhaps even better, "... divide by 2; but since len might be odd, > we need to make sure we round up as we divide". My thinking being, > we're not actually rounding odd numbers up (presumably to even > numbers), but we're rounding the result of the division up (to the > smallest larger integer). Good point; perhaps just squash this in? --- sha1_name.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index 8c513dbff6..74fcb6d788 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -489,8 +489,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len) * We now know we have on the order of 2^len objects, which * expects a collision at 2^(len/2). But we also care about hex * chars, not bits, and there are 4 bits per hex. So all -* together we need to divide by 2; but we also want to round -* odd numbers up, hence adding one before dividing. +* together we need to divide by 2 and round up. */ len = DIV_ROUND_UP(len, 2); /* -- 2.13.2
Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
Am 09.07.2017 um 13:00 schrieb Jeff King: On Sat, Jul 08, 2017 at 10:59:06AM +0200, René Scharfe wrote: Add the slash between loose object subdirectory and file name just once outside the loop instead of overwriting it with each readdir call. Redefine baselen as the length with that slash, and add dirlen for the length without it. The result is slightly less wasteful and can use the the cheaper strbuf_addstr instead of strbuf_addf without losing clarity. This patch looks correct to me. I'm a little lukewarm on it overall, though. I'd be shocked if the efficiency change is measurable. What I really care about is whether the result is easier to read or not. On the plus side, this moves an invariant out of the loop. On the minus side, it has to introduce an extra variable for "length we add on to" versus "dir length to pass to the subdir_cb". That's not rocket science, but it does slightly complicate things (though I note we already have "origlen", so this is bumping us from 2 to 3 length variables, not 1 to 2). Admittedly this is more of an OCD thing. Overwriting a character 200 times or parsing a format string don't very long compared to the rest of the function, but it's a bit itchy. René
Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Am 09.07.2017 um 14:37 schrieb Andreas Schwab: On Jul 09 2017, René Scharfe <l@web.de> wrote: [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/memchr.html You are using an old revision. OK, the final draft of C11 [3] says "The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found." I wonder when we can begin to target C99 in git's source, though. :) René [3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Am 09.07.2017 um 00:29 schrieb Junio C Hamano: René Scharfe <l@web.de> writes: Am 08.07.2017 um 13:08 schrieb Ramsay Jones: On 08/07/17 09:58, René Scharfe wrote: Avoid running over the end of another -- a C string whose length we don't know -- by using strcmp(3) instead of memcmp(3) for comparing it with another C string. I had to read this twice, along with the patch text, before this made any sense. ;-) The missing information being that 'another' was the name of the string variable that we were potentially 'running over the end of'. Yeah, sorry, encasing that unusual variable name in quotes would probably have helped. What makes it even more confusing is that the variable with the problematic name is referred to as "it" in the last part of the description--- the second occurrence of 'another' is actually not referring to that variable but yet another string that is being compared with it ;-) Perhaps like this instead? We don't know the length of the C string "another". It could be shorter than "name", which we compare it to using memchr(3). Call strcmp(3) instead to avoid running over the end of the former, and get rid of a strlen(3) call as a bonus. René
Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Am 08.07.2017 um 15:38 schrieb Andreas Schwab: On Jul 08 2017, René Scharfe <l@web.de> wrote: Am 08.07.2017 um 13:08 schrieb Andreas Schwab: On Jul 08 2017, René Scharfe <l@web.de> wrote: Avoid running over the end of another -- a C string whose length we don't know -- by using strcmp(3) instead of memcmp(3) for comparing it with another C string. That's not a good justification for the change, since memcmp never reads past the differing characters. Interesting. Where does that guarantee come from? Sorry, I misremembered. It's only memchr that has this restriction. Hmm, I can't get ASan to complain about memchr reading beyond the end of a C string, but I don't know why. Glibc reads full words [1], and I don't see how the standard [2] forbids reading past the found byte. René [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memchr.c [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/memchr.html
Re: 0 bytes/s vs. ∞ bytes/s
Am 07.07.2017 um 17:57 schrieb 積丹尼 Dan Jacobson: > Receiving objects: 100% (1003/1003), 1.15 MiB | 0 bytes/s, done. > Receiving objects: 100% (1861/1861), 11.74 MiB | 4.58 MiB/s, done. > Receiving objects: 100% (474/474), 160.72 KiB | 0 bytes/s, done. > Receiving objects: 100% (7190/7190), 26.02 MiB | 6.53 MiB/s, done. > > If the connection is too fast to calculate, please report > ∞ bytes/s or > inf bytes/s or > ? bytes/s or > anything but 0 bytes/s, which means nothing (transmitted.) I don't know your actual transfer rate, but I would guess it's closer to zero than to infinity. :) How about this, though: -- >8 -- Subject: [PATCH] progress: show overall rate in last update The values in struct throughput are only updated every 0.5 seconds. If we're all done before that time span then the final update will show a rate of 0 bytes/s, which is misleading if some bytes had been handled. Remember the start time and show the total throughput instead. And avoid division by zero by enforcing a minimum time span value of 1 (unit: 1/1024th of a second). That makes the resulting rate an underestimation, but it's closer to the actual value than the currently shown 0 bytes/s. Reported-by: 積丹尼 Dan JacobsonSigned-off-by: Rene Scharfe --- progress.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/progress.c b/progress.c index 29378caa05..73e36d4a42 100644 --- a/progress.c +++ b/progress.c @@ -36,6 +36,7 @@ struct progress { unsigned delay; unsigned delayed_percent_treshold; struct throughput *throughput; + uint64_t start_ns; }; static volatile sig_atomic_t progress_update; @@ -221,6 +222,7 @@ struct progress *start_progress_delay(const char *title, unsigned total, progress->delayed_percent_treshold = percent_treshold; progress->delay = delay; progress->throughput = NULL; + progress->start_ns = getnanotime(); set_progress_signal(); return progress; } @@ -247,8 +249,10 @@ void stop_progress_msg(struct progress **p_progress, const char *msg) struct throughput *tp = progress->throughput; if (tp) { - unsigned int rate = !tp->avg_misecs ? 0 : - tp->avg_bytes / tp->avg_misecs; + uint64_t now_ns = getnanotime(); + unsigned int misecs, rate; + misecs = ((now_ns - progress->start_ns) * 4398) >> 32; + rate = tp->curr_total / (misecs ? misecs : 1); throughput_string(>display, tp->curr_total, rate); } progress_update = 1; -- 2.13.2
Re: [PATCH] urlmatch: use hex2chr() in append_normalized_escapes()
Am 08.07.2017 um 16:28 schrieb Kyle J. McKay: On Jul 8, 2017, at 01:59, René Scharfe wrote: Simplify the code by using hex2chr() to convert and check for invalid characters at the same time instead of doing that sequentially with one table lookup for each. I think that comment may be a bit misleading as the changes are just switching from one set of inlines to another. Essentially the same sequential check takes place in the hex2chr inlined function which is being used to replace the "one table lookup for each". An optimizing compiler will likely eliminate any difference between the before and after patch versions. Nothing immediately comes to mind as an alternate comment though, so I'm not proposing any changes to the comment. Right, the table lookups for isxdigit and hexval are not duplicated when compiling with -O2. René
Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Am 08.07.2017 um 13:08 schrieb Ramsay Jones: On 08/07/17 09:58, René Scharfe wrote: Avoid running over the end of another -- a C string whose length we don't know -- by using strcmp(3) instead of memcmp(3) for comparing it with another C string. I had to read this twice, along with the patch text, before this made any sense. ;-) The missing information being that 'another' was the name of the string variable that we were potentially 'running over the end of'. Yeah, sorry, encasing that unusual variable name in quotes would probably have helped. René
Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Am 08.07.2017 um 13:08 schrieb Andreas Schwab: On Jul 08 2017, René Scharfe <l@web.de> wrote: Avoid running over the end of another -- a C string whose length we don't know -- by using strcmp(3) instead of memcmp(3) for comparing it with another C string. That's not a good justification for the change, since memcmp never reads past the differing characters. Interesting. Where does that guarantee come from? ASan reports an overflow with the following test program for me on Debian testing x64: #include int main(int argc, char **argv) { char a[32] = "1234567890123456789012345678901"; char b[2] = "a"; return memcmp(a, b, 32); }
[PATCH] wt-status: use separate variable for result of shorten_unambiguous_ref
Store the pointer to the string allocated by shorten_unambiguous_ref in a dedicated variable, short_base, and keep base unchanged. A non-const variable is more appropriate for such an object. It avoids having to cast const away on free and stops redefining the meaning of base, making the code slightly clearer. Signed-off-by: Rene Scharfe--- wt-status.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/wt-status.c b/wt-status.c index 8d2fb35b08..77c27c5113 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1763,6 +1763,7 @@ static void wt_shortstatus_print_tracking(struct wt_status *s) const char *branch_color_remote = color(WT_STATUS_REMOTE_BRANCH, s); const char *base; + char *short_base; const char *branch_name; int num_ours, num_theirs; int upstream_is_gone = 0; @@ -1797,10 +1798,10 @@ static void wt_shortstatus_print_tracking(struct wt_status *s) upstream_is_gone = 1; } - base = shorten_unambiguous_ref(base, 0); + short_base = shorten_unambiguous_ref(base, 0); color_fprintf(s->fp, header_color, "..."); - color_fprintf(s->fp, branch_color_remote, "%s", base); - free((char *)base); + color_fprintf(s->fp, branch_color_remote, "%s", short_base); + free(short_base); if (!upstream_is_gone && !num_ours && !num_theirs) goto conclude; -- 2.13.2
[PATCH] use DIV_ROUND_UP
Convert code that divides and rounds up to use DIV_ROUND_UP to make the intent clearer and reduce the number of magic constants. Signed-off-by: Rene Scharfe--- builtin/gc.c | 2 +- builtin/grep.c | 2 +- builtin/log.c | 2 +- builtin/receive-pack.c | 2 +- diff.c | 2 +- ewah/ewah_bitmap.c | 4 ++-- imap-send.c| 2 +- sha1_name.c| 2 +- shallow.c | 8 9 files changed, 13 insertions(+), 13 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index bd91f136fe..2ba50a2873 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -149,7 +149,7 @@ static int too_many_loose_objects(void) if (!dir) return 0; - auto_threshold = (gc_auto_threshold + 255) / 256; + auto_threshold = DIV_ROUND_UP(gc_auto_threshold, 256); while ((ent = readdir(dir)) != NULL) { if (strspn(ent->d_name, "0123456789abcdef") != 38 || ent->d_name[38] != '\0') diff --git a/builtin/grep.c b/builtin/grep.c index fa351c49f4..0d6e669732 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -543,7 +543,7 @@ static void compile_submodule_options(const struct grep_opt *opt, * submodule process has its own thread pool. */ argv_array_pushf(_options, "--threads=%d", -(num_threads + 1) / 2); +DIV_ROUND_UP(num_threads, 2)); /* Add Pathspecs */ argv_array_push(_options, "--"); diff --git a/builtin/log.c b/builtin/log.c index 8ca1de9894..c6362cf92e 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1308,7 +1308,7 @@ static struct commit *get_base_commit(const char *base_commit, if (rev_nr % 2) rev[i] = rev[2 * i]; - rev_nr = (rev_nr + 1) / 2; + rev_nr = DIV_ROUND_UP(rev_nr, 2); } if (!in_merge_bases(base, rev[0])) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 71c0c768db..cabdc55e09 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1806,7 +1806,7 @@ static const char *unpack_with_sideband(struct shallow_info *si) static void prepare_shallow_update(struct command *commands, struct shallow_info *si) { - int i, j, k, bitmap_size = (si->ref->nr + 31) / 32; + int i, j, k, bitmap_size = DIV_ROUND_UP(si->ref->nr, 32); ALLOC_ARRAY(si->used_shallow, si->shallow->nr); assign_shallow_commits_to_refs(si, si->used_shallow, NULL); diff --git a/diff.c b/diff.c index 00b4c86698..85e714f6c6 100644 --- a/diff.c +++ b/diff.c @@ -2095,7 +2095,7 @@ static void show_dirstat_by_line(struct diffstat_t *data, struct diff_options *o * bytes per "line". * This is stupid and ugly, but very cheap... */ - damage = (damage + 63) / 64; + damage = DIV_ROUND_UP(damage, 64); ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc); dir.files[dir.nr].name = file->name; dir.files[dir.nr].changed = damage; diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index 2dc9c82ecf..06c479f70a 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -210,8 +210,8 @@ size_t ewah_add(struct ewah_bitmap *self, eword_t word) void ewah_set(struct ewah_bitmap *self, size_t i) { const size_t dist = - (i + BITS_IN_EWORD) / BITS_IN_EWORD - - (self->bit_size + BITS_IN_EWORD - 1) / BITS_IN_EWORD; + DIV_ROUND_UP(i + 1, BITS_IN_EWORD) - + DIV_ROUND_UP(self->bit_size, BITS_IN_EWORD); assert(i >= self->bit_size); diff --git a/imap-send.c b/imap-send.c index 351e84aea1..b2d0b849bb 100644 --- a/imap-send.c +++ b/imap-send.c @@ -861,7 +861,7 @@ static char hexchar(unsigned int b) return b < 10 ? '0' + b : 'a' + (b - 10); } -#define ENCODED_SIZE(n) (4*((n+2)/3)) +#define ENCODED_SIZE(n) (4 * DIV_ROUND_UP((n), 3)) static char *cram(const char *challenge_64, const char *user, const char *pass) { int i, resp_len, encoded_len, decoded_len; diff --git a/sha1_name.c b/sha1_name.c index e7f7b12ceb..8c513dbff6 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -492,7 +492,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len) * together we need to divide by 2; but we also want to round * odd numbers up, hence adding one before dividing. */ - len = (len + 1) / 2; + len = DIV_ROUND_UP(len, 2); /* * For very small repos, we stick with our regular fallback. */ diff --git a/shallow.c b/shallow.c index ef7ca78993..54359d5490 100644 --- a/shallow.c +++ b/shallow.c @@ -443,7 +443,7 @@ struct paint_info { static uint32_t *paint_alloc(struct paint_info *info) { -
[PATCH] urlmatch: use hex2chr() in append_normalized_escapes()
Simplify the code by using hex2chr() to convert and check for invalid characters at the same time instead of doing that sequentially with one table lookup for each. Signed-off-by: Rene Scharfe--- urlmatch.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/urlmatch.c b/urlmatch.c index 4bbde924e8..3e42bd7504 100644 --- a/urlmatch.c +++ b/urlmatch.c @@ -42,12 +42,12 @@ static int append_normalized_escapes(struct strbuf *buf, from_len--; if (ch == '%') { - if (from_len < 2 || - !isxdigit(from[0]) || - !isxdigit(from[1])) + if (from_len < 2) return 0; - ch = hexval(*from++) << 4; - ch |= hexval(*from++); + ch = hex2chr(from); + if (ch < 0) + return 0; + from += 2; from_len -= 2; was_esc = 1; } -- 2.13.2
[PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
Add the slash between loose object subdirectory and file name just once outside the loop instead of overwriting it with each readdir call. Redefine baselen as the length with that slash, and add dirlen for the length without it. The result is slightly less wasteful and can use the the cheaper strbuf_addstr instead of strbuf_addf without losing clarity. Signed-off-by: Rene Scharfe--- sha1_file.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5862386cd0..6a9deb9e68 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3749,7 +3749,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, each_loose_subdir_fn subdir_cb, void *data) { - size_t origlen, baselen; + size_t origlen, dirlen, baselen; DIR *dir; struct dirent *de; int r = 0; @@ -3760,7 +3760,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, origlen = path->len; strbuf_complete(path, '/'); strbuf_addf(path, "%02x", subdir_nr); - baselen = path->len; + dirlen = path->len; dir = opendir(path->buf); if (!dir) { @@ -3770,12 +3770,15 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, return r; } + strbuf_addch(path, '/'); + baselen = path->len; + while ((de = readdir(dir))) { if (is_dot_or_dotdot(de->d_name)) continue; strbuf_setlen(path, baselen); - strbuf_addf(path, "/%s", de->d_name); + strbuf_addstr(path, de->d_name); if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2) { char hex[GIT_MAX_HEXSZ+1]; @@ -3801,7 +3804,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, } closedir(dir); - strbuf_setlen(path, baselen); + strbuf_setlen(path, dirlen); if (!r && subdir_cb) r = subdir_cb(subdir_nr, path->buf, data); -- 2.13.2
[PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()
Avoid running over the end of another -- a C string whose length we don't know -- by using strcmp(3) instead of memcmp(3) for comparing it with another C string. Signed-off-by: Rene Scharfe--- apply.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/apply.c b/apply.c index 946be4d2f5..9b3df8a3aa 100644 --- a/apply.c +++ b/apply.c @@ -962,13 +962,12 @@ static int gitdiff_verify_name(struct apply_state *state, } if (*name) { - int len = strlen(*name); char *another; if (isnull) return error(_("git apply: bad git-diff - expected /dev/null, got %s on line %d"), *name, state->linenr); another = find_name(state, line, NULL, state->p_value, TERM_TAB); - if (!another || memcmp(another, *name, len + 1)) { + if (!another || strcmp(another, *name)) { free(another); return error((side == DIFF_NEW_NAME) ? _("git apply: bad git-diff - inconsistent new filename on line %d") : -- 2.13.2
Re: [PATCH v6 2/2] strbuf: change an always NULL/"" strbuf_addftime() param to bool
Am 01.07.2017 um 14:55 schrieb Ævar Arnfjörð Bjarmason: strbuf_addstr() allows callers to pass a time zone name for expanding ^^^ That should be "strbuf_addftime()" instead (my typo), as Junio noted. %Z. The only current caller either passes the empty string or NULL, in which case %Z is handed over verbatim to strftime(3). Replace that string parameter with a flag controlling whether to remove %Z from the format specification. This simplifies the code. Commit-message-by: René Scharfe <l@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com> --- On Sat, Jun 24 2017, Junio C. Hamano jotted: René Scharfe <l@web.de> writes: Here's an attempt at a commit message that would have be easier to understand for me: strbuf_addstr() allows callers to pass a time zone name for expanding %Z. The only current caller either passes the empty string or NULL, in which case %Z is handed over verbatim to strftime(3). Replace that string parameter with a flag controlling whether to remove %Z from the format specification. This simplifies the code. I think the first one is strbuf_addftime(); other than that, I think this version explains what is going on in this patch than the original. René
[PATCH] apply: use starts_with() in gitdiff_verify_name()
Avoid running over the end of line -- a C string whose length is not known to this function -- by using starts_with() instead of memcmp(3) for checking if it starts with "/dev/null". Also simply include the newline in the string constant to compare against. Drop a comment that just states the obvious. Signed-off-by: Rene Scharfe--- apply.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/apply.c b/apply.c index c442b89328..946be4d2f5 100644 --- a/apply.c +++ b/apply.c @@ -976,8 +976,7 @@ static int gitdiff_verify_name(struct apply_state *state, } free(another); } else { - /* expect "/dev/null" */ - if (memcmp("/dev/null", line, 9) || line[9] != '\n') + if (!starts_with(line, "/dev/null\n")) return error(_("git apply: bad git-diff - expected /dev/null on line %d"), state->linenr); } -- 2.13.2
Re: git log use of date format differs between Command Line and script usage.
Am 30.06.2017 um 18:06 schrieb Shaun Uldrikis: If you supply a non-standard format to the date configuration for git log, something like: [log] date = format:%Y-%m-%d %H:%M then, when you run 'git log' inside a script, or when using gitk (anywhere), it fails on decoding the format. fatal: unknown date format format: %Y-%m-%d %H:%M However, that format works correctly on the command line. I do not have a patch to address this issue. I guess you have two versions of git on your system, and the one used in scripts is older than 2.6.0, which introduced this feature. René
Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
Am 29.06.2017 um 00:21 schrieb René Scharfe: $ git am # pasted my email Applying: coccinelle: add a rule to make "expression" code use FREE_AND_NULL() # results in commit message with scissor line, interesting.. "git am --scissors" commits with the correct message. I'll do a "git config --local --add mailinfo.scissors true" real quick.. René
Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
Am 29.06.2017 um 00:21 schrieb René Scharfe: Am 28.06.2017 um 23:39 schrieb Ævar Arnfjörð Bjarmason: On Sun, Jun 25 2017, René Scharfe jotted: Am 16.06.2017 um 21:43 schrieb Junio C Hamano: Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes: A follow-up to the existing "type" rule added in an earlier change. This catches some occurrences that are missed by the previous rule. Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com> --- Hmph, I wonder if the "type" thing is really needed. Over there, "ptr" is an expression and we can find "free(ptr); ptr = NULL" with the rule in this patch already, no? Indeed. How about this on top of master? -- >8 -- Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules There are two rules for using FREE_AND_NULL in free.cocci, one for pointer types and one for expressions. Both cause coccinelle to remove empty lines and even newline characters between replacements for some reason; consecutive "free(x);/x=NULL;" sequences end up as multiple FREE_AND_NULL calls on the same time. Remove the type rule, as the expression rule already covers it, and rearrange the lines of the latter to place the addition of FREE_AND_NULL between the removals, which causes coccinelle to leave surrounding whitespace untouched. Signed-off-by: Rene Scharfe <l@web.de> --- contrib/coccinelle/free.cocci | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index f2d97e755b..4490069df9 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -11,16 +11,8 @@ expression E; free(E); @@ -type T; -T *ptr; -@@ -- free(ptr); -- ptr = NULL; -+ FREE_AND_NULL(ptr); - -@@ expression E; @@ - free(E); -- E = NULL; + FREE_AND_NULL(E); +- E = NULL; Late reply, sorry. What version of coccinelle are you running? I have 1.0.4 (from Debian) and can't get this to produce the same results as what I have. On top of next, I did: Revert "*.[ch] refactoring: make use of the FREE_AND_NULL() macro" Revert "coccinelle: make use of the "expression" FREE_AND_NULL() rule" Revert "coccinelle: make use of the "type" FREE_AND_NULL() rule" And then generated the patch as usual with `make coccicheck`, and applied it. This has your change. I then re-applied the manual "*.[ch] refactoring" changes This results in this diff with next: $ git diff --stat origin/next.. -- '*.[ch]' builtin/am.c | 3 ++- builtin/clean.c | 6 -- builtin/config.c | 6 -- builtin/index-pack.c | 6 -- builtin/pack-objects.c | 12 builtin/unpack-objects.c | 3 ++- fast-import.c| 6 -- http-push.c | 24 http.c | 15 ++- imap-send.c | 3 ++- ref-filter.c | 3 ++- refs/files-backend.c | 3 ++- remote-testsvn.c | 3 ++- sequencer.c | 3 ++- sha1-array.c | 3 ++- sha1_file.c | 3 ++- transport-helper.c | 27 ++- transport.c | 3 ++- tree-diff.c | 6 -- tree.c | 3 ++- 20 files changed, 94 insertions(+), 47 deletions(-) These are all cases where we now miss things that should use FREE_AND_NULL(), e.g.: diff --git a/builtin/am.c b/builtin/am.c index c973bd96dc..2f89338ed7 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -484,7 +484,8 @@ static int run_applypatch_msg_hook(struct am_state *state) ret = run_hook_le(NULL, "applypatch-msg", am_path(state, "final-commit"), NULL); if (!ret) { - FREE_AND_NULL(state->msg); + free(state->msg); + state->msg = NULL; if (read_commit_msg(state) < 0) die(_("'%s' was deleted by the applypatch-msg hook"), am_path(state, "final-commit")); So it looks to me like removing the "type T" rule breaks a lot of things, but that the change you made to the expression rule is good, and we should do that for the "type" rule as well. Your commit says the "expression rule already covers it", but this doesn't seem to be the case at all. As an aside: Junio, did you mean to apply f8bb4631fb to next this way? Looks like a mis-applied scissor commit. I can't reproduce that strange result with master, but get a scissors line into the commit message as well. The diff at the end looks reasonable (contains just the manual stuff). Here's what I did: On top of next it looks basically the same for me, only difference being several new converted cases in repository.c. Did you commit before diff? René
Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
Am 28.06.2017 um 23:39 schrieb Ævar Arnfjörð Bjarmason: > > On Sun, Jun 25 2017, René Scharfe jotted: > >> Am 16.06.2017 um 21:43 schrieb Junio C Hamano: >>> Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes: >>> >>>> A follow-up to the existing "type" rule added in an earlier >>>> change. This catches some occurrences that are missed by the previous >>>> rule. >>>> >>>> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com> >>>> --- >>> >>> Hmph, I wonder if the "type" thing is really needed. Over there, >>> "ptr" is an expression and we can find "free(ptr); ptr = NULL" with >>> the rule in this patch already, no? >> >> Indeed. How about this on top of master? >> >> -- >8 -- >> Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules >> >> There are two rules for using FREE_AND_NULL in free.cocci, one for >> pointer types and one for expressions. Both cause coccinelle to remove >> empty lines and even newline characters between replacements for some >> reason; consecutive "free(x);/x=NULL;" sequences end up as multiple >> FREE_AND_NULL calls on the same time. >> >> Remove the type rule, as the expression rule already covers it, and >> rearrange the lines of the latter to place the addition of FREE_AND_NULL >> between the removals, which causes coccinelle to leave surrounding >> whitespace untouched. >> >> Signed-off-by: Rene Scharfe <l@web.de> >> --- >> contrib/coccinelle/free.cocci | 10 +- >> 1 file changed, 1 insertion(+), 9 deletions(-) >> >> diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci >> index f2d97e755b..4490069df9 100644 >> --- a/contrib/coccinelle/free.cocci >> +++ b/contrib/coccinelle/free.cocci >> @@ -11,16 +11,8 @@ expression E; >> free(E); >> >> @@ >> -type T; >> -T *ptr; >> -@@ >> -- free(ptr); >> -- ptr = NULL; >> -+ FREE_AND_NULL(ptr); >> - >> -@@ >> expression E; >> @@ >> - free(E); >> -- E = NULL; >> + FREE_AND_NULL(E); >> +- E = NULL; > > Late reply, sorry. What version of coccinelle are you running? I have > 1.0.4 (from Debian) and can't get this to produce the same results as > what I have. > > On top of next, I did: > > Revert "*.[ch] refactoring: make use of the FREE_AND_NULL() macro" > Revert "coccinelle: make use of the "expression" FREE_AND_NULL() > rule" > Revert "coccinelle: make use of the "type" FREE_AND_NULL() rule" > > And then generated the patch as usual with `make coccicheck`, and > applied it. This has your change. > > I then re-applied the manual "*.[ch] refactoring" changes > > This results in this diff with next: > > $ git diff --stat origin/next.. -- '*.[ch]' > builtin/am.c | 3 ++- > builtin/clean.c | 6 -- > builtin/config.c | 6 -- > builtin/index-pack.c | 6 -- > builtin/pack-objects.c | 12 > builtin/unpack-objects.c | 3 ++- > fast-import.c| 6 -- > http-push.c | 24 > http.c | 15 ++- > imap-send.c | 3 ++- > ref-filter.c | 3 ++- > refs/files-backend.c | 3 ++- > remote-testsvn.c | 3 ++- > sequencer.c | 3 ++- > sha1-array.c | 3 ++- > sha1_file.c | 3 ++- > transport-helper.c | 27 ++- > transport.c | 3 ++- > tree-diff.c | 6 -- > tree.c | 3 ++- > 20 files changed, 94 insertions(+), 47 deletions(-) > > These are all cases where we now miss things that should use > FREE_AND_NULL(), e.g.: > > diff --git a/builtin/am.c b/builtin/am.c > index c973bd96dc..2f89338ed7 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -484,7 +484,8 @@ static int run_applypatch_msg_hook(struct am_state > *state) > ret = run_hook_le(NULL, "applypatch-msg", am_path(state, > "final-commit"), NULL); > > if (!ret) { > - FREE_AND_NULL(state->msg); > + free(state->msg); > + state->msg = NULL; > if (re
Re: [PATCH 2/2] apply: handle assertion failure gracefully
Am 27.06.2017 um 20:08 schrieb Junio C Hamano: > René Scharfe <l@web.de> writes: > >> Thought a bit more about it, and as a result here's a simpler approach: >> >> -- >8 -- >> Subject: [PATCH] apply: check git diffs for mutually exclusive header lines >> >> A file can either be added, removed, copied, or renamed, but no two of >> these actions can be done by the same patch. Some of these combinations >> provoke error messages due to missing file names, and some are only >> caught by an assertion. Check git patches already as they are parsed >> and report conflicting lines on sight. >> >> Found by Vegard Nossum using AFL. >> >> Reported-by: Vegard Nossum <vegard.nos...@oracle.com> >> Signed-off-by: Rene Scharfe <l@web.de> >> --- >> apply.c| 14 ++ >> apply.h| 1 + >> t/t4136-apply-check.sh | 18 ++ >> 3 files changed, 33 insertions(+) >> >> diff --git a/apply.c b/apply.c >> index 8cd6435c74..8a5e44c474 100644 >> --- a/apply.c >> +++ b/apply.c >> @@ -1312,6 +1312,18 @@ static char *git_header_name(struct apply_state >> *state, >> } >> } >> >> +static int check_header_line(struct apply_state *state, struct patch *patch) >> +{ >> +int extensions = (patch->is_delete == 1) + (patch->is_new == 1) + >> + (patch->is_rename == 1) + (patch->is_copy == 1); >> +if (extensions > 1) >> +return error(_("inconsistent header lines %d and %d"), >> + state->extension_linenr, state->linenr); >> +if (extensions && !state->extension_linenr) >> +state->extension_linenr = state->linenr; > > OK. I wondered briefly what happens if the first git_header that > sets one of the extensions can be at line 0 (calusng > state->extension_linenr to be set to 0), but even in that case, the > second problematic one will correctly report the 0th and its own > line as culprit, so this is OK. It makes me question if there is > any point checking !state->extension_linenr in the if() statement, > though. It makes sure that ->extension_linenr is set to the first line number of an extension (like, say, "copy from") and not advanced again (e.g. when we hit "copy to", or more importantly some line like "similarity index" which does not set one of the extension bits and thus can't actually cause a conflict). Hmm, pondering that, it seems I forgot to reset its value after each patch. Or better just move it into struct patch, next to the extension bits: -- >8 -- Subject: fixup! apply: check git diffs for mutually exclusive header lines --- apply.c | 7 --- apply.h | 1 - 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/apply.c b/apply.c index db38bc3cdd..c442b89328 100644 --- a/apply.c +++ b/apply.c @@ -211,6 +211,7 @@ struct patch { unsigned ws_rule; int lines_added, lines_deleted; int score; + int extension_linenr; /* first line specifying delete/new/rename/copy */ unsigned int is_toplevel_relative:1; unsigned int inaccurate_eof:1; unsigned int is_binary:1; @@ -1325,9 +1326,9 @@ static int check_header_line(struct apply_state *state, struct patch *patch) (patch->is_rename == 1) + (patch->is_copy == 1); if (extensions > 1) return error(_("inconsistent header lines %d and %d"), -state->extension_linenr, state->linenr); - if (extensions && !state->extension_linenr) - state->extension_linenr = state->linenr; +patch->extension_linenr, state->linenr); + if (extensions && !patch->extension_linenr) + patch->extension_linenr = state->linenr; return 0; } diff --git a/apply.h b/apply.h index b52078b486..b3d6783d55 100644 --- a/apply.h +++ b/apply.h @@ -79,7 +79,6 @@ struct apply_state { /* Various "current state" */ int linenr; /* current line number */ - int extension_linenr; /* first line specifying delete/new/rename/copy */ struct string_list symlink_changes; /* we have to track symlinks */ /* -- 2.13.2
Re: [PATCH 2/2] apply: handle assertion failure gracefully
Am 25.02.2017 um 11:13 schrieb Vegard Nossum: > For the patches in the added testcases, we were crashing with: > > git-apply: apply.c:3665: check_preimage: Assertion `patch->is_new <= 0' > failed. > diff --git a/t/t4154-apply-git-header.sh b/t/t4154-apply-git-header.sh > index d651af4a2..c440c48ad 100755 > --- a/t/t4154-apply-git-header.sh > +++ b/t/t4154-apply-git-header.sh > @@ -12,4 +12,40 @@ rename new 0 > EOF > ' > > +test_expect_success 'apply deleted file mode / new file mode / wrong mode' ' > + test_must_fail git apply << EOF > +diff --git a/. b/. > +deleted file mode > +new file mode > +EOF > +' -- >8 -- Subject: [PATCH] apply: check git diffs for invalid file modes An empty string as mode specification is accepted silently by git apply, as Vegard Nossum found out using AFL. It's interpreted as zero. Reject such bogus file modes, and only accept ones consisting exclusively of octal digits. Reported-by: Vegard NossumSigned-off-by: Rene Scharfe --- apply.c | 17 - t/t4129-apply-samemode.sh | 16 +++- 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/apply.c b/apply.c index 8a5e44c474..db38bc3cdd 100644 --- a/apply.c +++ b/apply.c @@ -1001,20 +1001,27 @@ static int gitdiff_newname(struct apply_state *state, DIFF_NEW_NAME); } +static int parse_mode_line(const char *line, int linenr, unsigned int *mode) +{ + char *end; + *mode = strtoul(line, , 8); + if (end == line || !isspace(*end)) + return error(_("invalid mode on line %d: %s"), linenr, line); + return 0; +} + static int gitdiff_oldmode(struct apply_state *state, const char *line, struct patch *patch) { - patch->old_mode = strtoul(line, NULL, 8); - return 0; + return parse_mode_line(line, state->linenr, >old_mode); } static int gitdiff_newmode(struct apply_state *state, const char *line, struct patch *patch) { - patch->new_mode = strtoul(line, NULL, 8); - return 0; + return parse_mode_line(line, state->linenr, >new_mode); } static int gitdiff_delete(struct apply_state *state, @@ -1128,7 +1135,7 @@ static int gitdiff_index(struct apply_state *state, memcpy(patch->new_sha1_prefix, line, len); patch->new_sha1_prefix[len] = 0; if (*ptr == ' ') - patch->old_mode = strtoul(ptr+1, NULL, 8); + return gitdiff_oldmode(state, ptr + 1, patch); return 0; } diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh index c268298eaf..5cdd76dfa7 100755 --- a/t/t4129-apply-samemode.sh +++ b/t/t4129-apply-samemode.sh @@ -13,7 +13,9 @@ test_expect_success setup ' echo modified >file && git diff --stat -p >patch-0.txt && chmod +x file && - git diff --stat -p >patch-1.txt + git diff --stat -p >patch-1.txt && + sed "s/^\(new mode \).*/\1/" patch-empty-mode.txt && + sed "s/^\(new mode \).*/\1garbage/" patch-bogus-mode.txt ' test_expect_success FILEMODE 'same mode (no index)' ' @@ -59,4 +61,16 @@ test_expect_success FILEMODE 'mode update (index only)' ' git ls-files -s file | grep "^100755" ' +test_expect_success FILEMODE 'empty mode is rejected' ' + git reset --hard && + test_must_fail git apply patch-empty-mode.txt 2>err && + test_i18ngrep "invalid mode" err +' + +test_expect_success FILEMODE 'bogus mode is rejected' ' + git reset --hard && + test_must_fail git apply patch-bogus-mode.txt 2>err && + test_i18ngrep "invalid mode" err +' + test_done -- 2.13.2
Re: [PATCH 1/2] apply: guard against renames of non-existant empty files
Am 27.02.2017 um 23:18 schrieb René Scharfe: > Am 27.02.2017 um 21:10 schrieb Junio C Hamano: >> René Scharfe <l@web.de> writes: >> >>> Would it make sense to mirror the previously existing condition and >>> check for is_new instead? I.e.: >>> >>> if ((!patch->is_delete && !patch->new_name) || >>> (!patch->is_new&& !patch->old_name)) { >>> >> >> Yes, probably. So let's actually do it! -- >8 -- Subject: [PATCH] apply: check git diffs for missing old filenames 2c93286a (fix "git apply --index ..." not to deref NULL) added a check for git patches missing a +++ line, preventing a segfault. Check for missing --- lines as well, and add a test for each case. Found by Vegard Nossum using AFL. Original-patch-by: Vegard Nossum <vegard.nos...@oracle.com> Signed-off-by: Rene Scharfe <l@web.de> --- apply.c| 3 ++- t/t4133-apply-filenames.sh | 24 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/apply.c b/apply.c index b963d7d8fb..8cd6435c74 100644 --- a/apply.c +++ b/apply.c @@ -1575,7 +1575,8 @@ static int find_header(struct apply_state *state, patch->old_name = xstrdup(patch->def_name); patch->new_name = xstrdup(patch->def_name); } - if (!patch->is_delete && !patch->new_name) { + if ((!patch->new_name && !patch->is_delete) || + (!patch->old_name && !patch->is_new)) { error(_("git diff header lacks filename information " "(line %d)"), state->linenr); return -128; diff --git a/t/t4133-apply-filenames.sh b/t/t4133-apply-filenames.sh index 2ecb4216b7..c5ed3b17c4 100755 --- a/t/t4133-apply-filenames.sh +++ b/t/t4133-apply-filenames.sh @@ -35,4 +35,28 @@ test_expect_success 'apply diff with inconsistent filenames in headers' ' test_i18ngrep "inconsistent old filename" err ' +test_expect_success 'apply diff with new filename missing from headers' ' + cat >missing_new_filename.diff <<-\EOF && + diff --git a/f b/f + index 000..d00491f + --- a/f + @@ -0,0 +1 @@ + +1 + EOF + test_must_fail git apply missing_new_filename.diff 2>err && + test_i18ngrep "lacks filename information" err +' + +test_expect_success 'apply diff with old filename missing from headers' ' + cat >missing_old_filename.diff <<-\EOF && + diff --git a/f b/f + index d00491f..000 + +++ b/f + @@ -1 +0,0 @@ + -1 + EOF + test_must_fail git apply missing_old_filename.diff 2>err && + test_i18ngrep "lacks filename information" err +' + test_done -- 2.13.2
Re: [PATCH 2/2] apply: handle assertion failure gracefully
Am 28.02.2017 um 11:50 schrieb René Scharfe: > Am 27.02.2017 um 23:33 schrieb Junio C Hamano: >> René Scharfe <l@web.de> writes: >> >>> Am 27.02.2017 um 21:04 schrieb Junio C Hamano: >>>> René Scharfe <l@web.de> writes: >>>> >>>>>> diff --git a/apply.c b/apply.c >>>>>> index cbf7cc7f2..9219d2737 100644 >>>>>> --- a/apply.c >>>>>> +++ b/apply.c >>>>>> @@ -3652,7 +3652,6 @@ static int check_preimage(struct apply_state >>>>>> *state, >>>>>> if (!old_name) >>>>>> return 0; >>>>>> >>>>>> -assert(patch->is_new <= 0); >>>>> >>>>> 5c47f4c6 (builtin-apply: accept patch to an empty file) added that >>>>> line. Its intent was to handle diffs that contain an old name even for >>>>> a file that's created. Citing from its commit message: "When we >>>>> cannot be sure by parsing the patch that it is not a creation patch, >>>>> we shouldn't complain when if there is no such a file." Why not stop >>>>> complaining also in case we happen to know for sure that it's a >>>>> creation patch? I.e., why not replace the assert() with: >>>>> >>>>> if (patch->is_new == 1) >>>>> goto is_new; >>>>> >>>>>> previous = previous_patch(state, patch, ); >>>> >>>> When the caller does know is_new is true, old_name must be made/left >>>> NULL. That is the invariant this assert is checking to catch an >>>> error in the calling code. >>> >>> There are some places in apply.c that set ->is_new to 1, but none of >>> them set ->old_name to NULL at the same time. >> >> I thought all of these are flipping ->is_new that used to be -1 >> (unknown) to (now we know it is new), and sets only new_name without >> doing anything to old_name, because they know originally both names >> are set to NULL. >> >>> Having to keep these two members in sync sounds iffy anyway. Perhaps >>> accessors can help, e.g. a setter which frees old_name when is_new is >>> set to 1, or a getter which returns NULL for old_name if is_new is 1. >> >> Definitely, the setter would make it harder to make the mistake. > > When I added setters, apply started to passed NULL to unlink(2) and > rmdir(2) in some of the new tests, which still failed. > > That's because three of the diffs trigger both gitdiff_delete(), which > sets is_delete and old_name, and gitdiff_newfile(), which sets is_new > and new_name. Create and delete equals move, right? Or should we > error out at this point already? > > The last new diff adds a new file that is copied. Sounds impossible. > How about something like this, which forbids combinations that make no > sense. Hope it's not too strict; at least all tests succeed. > > --- > apply.c | 79 > ++--- > 1 file changed, 61 insertions(+), 18 deletions(-) Thought a bit more about it, and as a result here's a simpler approach: -- >8 -- Subject: [PATCH] apply: check git diffs for mutually exclusive header lines A file can either be added, removed, copied, or renamed, but no two of these actions can be done by the same patch. Some of these combinations provoke error messages due to missing file names, and some are only caught by an assertion. Check git patches already as they are parsed and report conflicting lines on sight. Found by Vegard Nossum using AFL. Reported-by: Vegard Nossum <vegard.nos...@oracle.com> Signed-off-by: Rene Scharfe <l@web.de> --- apply.c| 14 ++ apply.h| 1 + t/t4136-apply-check.sh | 18 ++ 3 files changed, 33 insertions(+) diff --git a/apply.c b/apply.c index 8cd6435c74..8a5e44c474 100644 --- a/apply.c +++ b/apply.c @@ -1312,6 +1312,18 @@ static char *git_header_name(struct apply_state *state, } } +static int check_header_line(struct apply_state *state, struct patch *patch) +{ + int extensions = (patch->is_delete == 1) + (patch->is_new == 1) + +(patch->is_rename == 1) + (patch->is_copy == 1); + if (extensions > 1) + return error(_("inconsistent header lines %d and %d"), +state->extension_linenr, state->linenr); + if (extensions && !state->extension_linenr) + state->extension_linenr = state->linenr; + re
Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
Am 16.06.2017 um 21:43 schrieb Junio C Hamano: > Ævar Arnfjörð Bjarmasonwrites: > >> A follow-up to the existing "type" rule added in an earlier >> change. This catches some occurrences that are missed by the previous >> rule. >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> --- > > Hmph, I wonder if the "type" thing is really needed. Over there, > "ptr" is an expression and we can find "free(ptr); ptr = NULL" with > the rule in this patch already, no? Indeed. How about this on top of master? -- >8 -- Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules There are two rules for using FREE_AND_NULL in free.cocci, one for pointer types and one for expressions. Both cause coccinelle to remove empty lines and even newline characters between replacements for some reason; consecutive "free(x);/x=NULL;" sequences end up as multiple FREE_AND_NULL calls on the same time. Remove the type rule, as the expression rule already covers it, and rearrange the lines of the latter to place the addition of FREE_AND_NULL between the removals, which causes coccinelle to leave surrounding whitespace untouched. Signed-off-by: Rene Scharfe --- contrib/coccinelle/free.cocci | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index f2d97e755b..4490069df9 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -11,16 +11,8 @@ expression E; free(E); @@ -type T; -T *ptr; -@@ -- free(ptr); -- ptr = NULL; -+ FREE_AND_NULL(ptr); - -@@ expression E; @@ - free(E); -- E = NULL; + FREE_AND_NULL(E); +- E = NULL; -- 2.13.2
Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()
Am 24.06.2017 um 14:20 schrieb Jeff King: > On Sat, Jun 24, 2017 at 02:12:30PM +0200, René Scharfe wrote: > >>> That's redundant with "subdir_nr". Should we push that logic down into >>> the function, and basically do: >>> >>> if (subdir_nr < 0 || subdir_nr > 256) >>> BUG("object subdir number out of range"); >> >> Hmm. I don't expect many more callers, so do we really need this safety >> check? It's cheap compared to the readdir(3) call, of course. > > To me it's as much about documenting the assumptions as it is about > catching buggy callers. I'd be OK with a comment, too. I didn't catch the off-by-one error above in the first reading. Did you add it intentionally? ;-) In any case, I'm convinced now that we need such a check, but with hexadecimal notation to avoid such mistakes. >> But wouldn't it make more sense to use an unsigned data type to avoid >> the first half? And an unsigned char specifically to only accept valid >> values, leaving the overflow concern to the caller? It warrants a >> separate patch in any case IMHO. > > Using unsigned makes sense. Using "char" because you know it only goes > to 256 is a bit too subtle for my taste. And yes, I'd do it in a > preparatory patch (or follow-on if you prefer). It's subtle on the caller's side, as big numbers would just wrap if the programmer ignored the limits of the type. On the callee's side it's fundamental, though. Anyway, here's a patch on top of the others. -- >8 -- Subject: [PATCH] sha1_file: guard against invalid loose subdirectory numbers Loose object subdirectories have hexadecimal names based on the first byte of the hash of contained objects, thus their numerical representation can range from 0 (0x00) to 255 (0xff). Change the type of the corresponding variable in for_each_file_in_obj_subdir() and associated callback functions to unsigned int and add a range check. Suggested-by: Jeff King <p...@peff.net> Signed-off-by: Rene Scharfe <l@web.de> --- builtin/fsck.c | 2 +- builtin/prune-packed.c | 2 +- builtin/prune.c| 2 +- cache.h| 4 ++-- sha1_file.c| 5 - 5 files changed, 9 insertions(+), 6 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 3a2c27f241..5d113f8bc0 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -536,7 +536,7 @@ static int fsck_cruft(const char *basename, const char *path, void *data) return 0; } -static int fsck_subdir(int nr, const char *path, void *progress) +static int fsck_subdir(unsigned int nr, const char *path, void *progress) { display_progress(progress, nr + 1); return 0; diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c index c026299e78..ac978ad401 100644 --- a/builtin/prune-packed.c +++ b/builtin/prune-packed.c @@ -10,7 +10,7 @@ static const char * const prune_packed_usage[] = { static struct progress *progress; -static int prune_subdir(int nr, const char *path, void *data) +static int prune_subdir(unsigned int nr, const char *path, void *data) { int *opts = data; display_progress(progress, nr + 1); diff --git a/builtin/prune.c b/builtin/prune.c index f0e2bff04c..c378690545 100644 --- a/builtin/prune.c +++ b/builtin/prune.c @@ -68,7 +68,7 @@ static int prune_cruft(const char *basename, const char *path, void *data) return 0; } -static int prune_subdir(int nr, const char *path, void *data) +static int prune_subdir(unsigned int nr, const char *path, void *data) { if (!show_only) rmdir(path); diff --git a/cache.h b/cache.h index 00a017dfcb..04dd2961ad 100644 --- a/cache.h +++ b/cache.h @@ -1819,10 +1819,10 @@ typedef int each_loose_object_fn(const struct object_id *oid, typedef int each_loose_cruft_fn(const char *basename, const char *path, void *data); -typedef int each_loose_subdir_fn(int nr, +typedef int each_loose_subdir_fn(unsigned int nr, const char *path, void *data); -int for_each_file_in_obj_subdir(int subdir_nr, +int for_each_file_in_obj_subdir(unsigned int subdir_nr, struct strbuf *path, each_loose_object_fn obj_cb, each_loose_cruft_fn cruft_cb, diff --git a/sha1_file.c b/sha1_file.c index 98ce85acf9..90b9414170 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3735,7 +3735,7 @@ void assert_sha1_type(const unsigned char *sha1, enum object_type expect) typename(expect)); } -int for_each_file_in_obj_subdir(int subdir_nr, +int for_each_file_in_obj_subdir(unsigned int subdir_nr, struct strbuf *path, each_loose_object_fn obj_cb,
Re: [PATCH v5 2/2] strbuf: change an always NULL/"" strbuf_addftime() param to bool
Am 24.06.2017 um 14:14 schrieb Ævar Arnfjörð Bjarmason: Change the code for deciding what's to be done about %Z to stop passing always either a NULL or "" char * to strbuf_addftime(). Instead pass a boolean int to indicate whether the strftime() %Z format should be suppressed by converting it to an empty string, which is what this code is actually doing. This code grew organically between the changes in 9eafe86d58 ("Merge branch 'rs/strbuf-addftime-zZ'", 2017-06-22). The intent was to use this API in the future to pass a custom leave the door open to pass a custom timezone name to the function (see my [1] and related messages). "leave the door open to pass a" seems redundant. But that's not what this code does now, and this strbuf_addstr() call always being redundant makes it hard to understand the current functionality. So simplify this internal API to match its use, we can always change it in the future if it gets a different use-case. I don't understand the confusion, but of course I'm biased. And I don't like binary parameters in general and would use named flags or two function names in most cases. But that aside I find the description hard to follow (perhaps I should do something about my attention span). Here's an attempt at a commit message that would have be easier to understand for me: strbuf_addstr() allows callers to pass a time zone name for expanding %Z. The only current caller either passes the empty string or NULL, in which case %Z is handed over verbatim to strftime(3). Replace that string parameter with a flag controlling whether to remove %Z from the format specification. This simplifies the code. René
Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()
Am 23.06.2017 um 01:10 schrieb Jeff King: > On Thu, Jun 22, 2017 at 08:19:48PM +0200, René Scharfe wrote: >> @@ -1811,6 +1822,12 @@ typedef int each_loose_cruft_fn(const char *basename, >> typedef int each_loose_subdir_fn(int nr, >> const char *path, >> void *data); >> +int for_each_file_in_obj_subdir(int subdir_nr, >> +struct strbuf *path, >> +each_loose_object_fn obj_cb, >> +each_loose_cruft_fn cruft_cb, >> +each_loose_subdir_fn subdir_cb, >> +void *data); > > Now that this is becoming public, I think we need to document what > should be in "path" here. It is the complete path, including the 2-hex > subdir. > > That's redundant with "subdir_nr". Should we push that logic down into > the function, and basically do: > >if (subdir_nr < 0 || subdir_nr > 256) > BUG("object subdir number out of range"); Hmm. I don't expect many more callers, so do we really need this safety check? It's cheap compared to the readdir(3) call, of course. But wouldn't it make more sense to use an unsigned data type to avoid the first half? And an unsigned char specifically to only accept valid values, leaving the overflow concern to the caller? It warrants a separate patch in any case IMHO. >orig_len = path->len; >strbuf_addf(path, "/%02x", subdir_nr); >baselen = path->len; > >... > >strbuf_setlen(path, orig_len); > > That's one less thing for the caller to worry about, and it's easy to > explain the argument as "the path to the object directory root". > >> +if (!alt->loose_objects_subdir_seen[subdir_nr]) { >> +struct strbuf *buf = alt_scratch_buf(alt); >> +strbuf_addf(buf, "%02x/", subdir_nr); >> +for_each_file_in_obj_subdir(subdir_nr, buf, >> +append_loose_object, >> +NULL, NULL, >> +>loose_objects_cache); > > I think the trailing slash here is superfluous. If you take my > suggestion above, that line goes away. But then the front slash it adds > becomes superfluous. It should probably just do: > >strbuf_complete(path, '/'); >strbuf_addf(path, "%02x", subdir_nr); So how about this then as a follow-up patch? -- >8 -- Subject: [PATCH] sha1_file: let for_each_file_in_obj_subdir() handle subdir names The function for_each_file_in_obj_subdir() takes a object subdirectory number and expects the name of the same subdirectory to be included in the path strbuf. Avoid this redundancy by letting the function append the hexadecimal subdirectory name itself. This makes it a bit easier and safer to use the function -- it becomes impossible to specify different subdirectories in subdir_nr and path. Suggested-by: Jeff King <p...@peff.net> Signed-off-by: Rene Scharfe <l@web.de> --- sha1_file.c | 22 ++ sha1_name.c | 1 - 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5e0ee2b68b..98ce85acf9 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3742,15 +3742,22 @@ int for_each_file_in_obj_subdir(int subdir_nr, each_loose_subdir_fn subdir_cb, void *data) { - size_t baselen = path->len; - DIR *dir = opendir(path->buf); + size_t origlen, baselen; + DIR *dir; struct dirent *de; int r = 0; + origlen = path->len; + strbuf_complete(path, '/'); + strbuf_addf(path, "%02x", subdir_nr); + baselen = path->len; + + dir = opendir(path->buf); if (!dir) { - if (errno == ENOENT) - return 0; - return error_errno("unable to open %s", path->buf); + if (errno != ENOENT) + r = error_errno("unable to open %s", path->buf); + strbuf_setlen(path, origlen); + return r; } while ((de = readdir(dir))) { @@ -3788,6 +3795,8 @@ int for_each_file_in_obj_subdir(int subdir_nr, if (!r && subdir_cb) r = subdir_cb(subdir_nr, path->buf, data); + strbuf_setlen(path, origlen); + return r; } @@ -3797,15 +3806,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf *path, each_loose_subdir_fn subdir_cb, void *data) { - s
Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()
Am 23.06.2017 um 01:10 schrieb Jeff King: > On Thu, Jun 22, 2017 at 08:19:48PM +0200, René Scharfe wrote: > >> Read each loose object subdirectory at most once when looking for unique >> abbreviated hashes. This speeds up commands like "git log --pretty=%h" >> considerably, which previously caused one readdir(3) call for each >> candidate, even for subdirectories that were visited before. > > Is it worth adding a perf test that's heavy on abbreviations? Sure. Here's a simple one. It currently reports for me: Testorigin/master origin/next origin/pu - 4205.1: log with %H 0.44(0.41+0.02) 0.43(0.42+0.01) -2.3% 0.43(0.43+0.00) -2.3% 4205.2: log with %h 1.03(0.60+0.42) 1.04(0.64+0.39) +1.0% 0.57(0.55+0.01) -44.7% 4205.3: log with %T 0.43(0.42+0.00) 0.43(0.42+0.01) +0.0% 0.43(0.40+0.02) +0.0% 4205.4: log with %t 1.05(0.64+0.38) 1.05(0.61+0.42) +0.0% 0.59(0.58+0.00) -43.8% 4205.5: log with %P 0.45(0.42+0.02) 0.43(0.42+0.00) -4.4% 0.43(0.42+0.01) -4.4% 4205.6: log with %p 1.21(0.63+0.57) 1.17(0.68+0.47) -3.3% 0.59(0.58+0.00) -51.2% 4205.7: log with %h-%h-%h 1.05(0.64+0.39) 2.00(0.83+1.15) +90.5% 0.69(0.66+0.02) -34.3% origin/next has fe9e2aefd4 (pretty: recalculate duplicate short hashes), while origin/pu has cc817ca3ef (sha1_name: cache readdir(3) results in find_short_object_filename()). -- >8 -- Subject: [PATCH] p4205: add perf test script for pretty log formats Add simple performance tests for expanded log format placeholders. Suggested-by: Jeff King <p...@peff.net> Signed-off-by: Rene Scharfe <l@web.de> --- t/perf/p4205-log-pretty-formats.sh | 16 1 file changed, 16 insertions(+) create mode 100755 t/perf/p4205-log-pretty-formats.sh diff --git a/t/perf/p4205-log-pretty-formats.sh b/t/perf/p4205-log-pretty-formats.sh new file mode 100755 index 00..7c26f4f337 --- /dev/null +++ b/t/perf/p4205-log-pretty-formats.sh @@ -0,0 +1,16 @@ +#!/bin/sh + +test_description='Tests the performance of various pretty format placeholders' + +. ./perf-lib.sh + +test_perf_default_repo + +for format in %H %h %T %t %P %p %h-%h-%h +do + test_perf "log with $format" " + git log --format=\"$format\" >/dev/null + " +done + +test_done -- 2.13.1
Re: [PATCH] xdiff: trim common tail with -U0 after diff
Am 23.06.2017 um 12:36 schrieb Daniel Hahler: When -U0 is used, trim_common_tail should be called after `xdl_diff`, so that `--indent-heuristic` (and other diff processing) works as expected. It also removes the check for `!(xecfg->flags & XDL_EMIT_FUNCCONTEXT)` added in e0876bca4, which does not appear to be necessary anymore after moving the trimming down. This only adds a test to t4061-diff-indent.sh, but should also have one for normal (i.e. non-experimental) diff mode probably?! Ref: https://github.com/tomtom/quickfixsigns_vim/issues/74#issue-237900460 --- t/t4061-diff-indent.sh | 15 +++ xdiff-interface.c | 7 --- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh index 2affd7a10..df3151393 100755 --- a/t/t4061-diff-indent.sh +++ b/t/t4061-diff-indent.sh @@ -116,6 +116,16 @@ test_expect_success 'prepare' ' 4 EOF + cat <<-\EOF >spaces-compacted-U0-expect && + diff --git a/spaces.txt b/spaces.txt + --- a/spaces.txt + +++ b/spaces.txt + @@ -4,0 +5,3 @@ a + +b + +a + + + EOF + tr "_" " " <<-\EOF >functions-expect && diff --git a/functions.c b/functions.c --- a/functions.c @@ -184,6 +194,11 @@ test_expect_success 'diff: --indent-heuristic with --histogram' ' compare_diff spaces-compacted-expect out-compacted4 ' +test_expect_success 'diff: --indent-heuristic with -U0' ' + git diff -U0 --indent-heuristic old new -- spaces.txt >out-compacted5 && + compare_diff spaces-compacted-U0-expect out-compacted5 +' + test_expect_success 'diff: ugly functions' ' git diff --no-indent-heuristic old new -- functions.c >out && compare_diff functions-expect out The changed test script passes just fine for me even without your change to xdiff-interface.c, which is odd. Do you have another way to demonstrate the unexpected behavior? And can someone replicate the failure? René
Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool
Am 23.06.2017 um 17:23 schrieb Jeff King: On Fri, Jun 23, 2017 at 05:13:38PM +0200, Ævar Arnfjörð Bjarmason wrote: The idea was that eventually the caller might be able to come up with a TZ that is not blank, but is also not what strftime("%Z") would produce. Conceivably that could be done if Git commits carried the "%Z" information (not likely), or if we used a reverse-lookup table (also not likely). This closes the door on that. Since we don't have immediate plans to go that route, I'm OK with this patch. It would be easy enough to re-open the door if we change our minds later. Closes the door on doing that via passing the char * of the prepared custom tz_name to strbuf_addftime(). I have a WIP patch (which may not make it on-list, depending) playing with the idea I proposed in CACBZZX5OQc45fUyDVayE89rkT=+8m5s4efsxcabcy7upme5...@mail.gmail.com which just inserts the custom TZ name based on the offset inside that `if (omit_strftime_tz_name)` branch. OK. I'd assumed that would all happen outside of strbuf_addftime(). But if it happens inside, then I agree a flag is better. Oh, so the interface that was meant to allow better time zone names without having to make strbuf_addftime() even bigger than it already is turns out to be too ugly for its purpose? I'm sorry. :( René
Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool
Am 23.06.2017 um 17:25 schrieb Jeff King: On Fri, Jun 23, 2017 at 05:20:19PM +0200, René Scharfe wrote: diff --git a/strbuf.c b/strbuf.c index be3b9e37b1..81ff3570e2 100644 --- a/strbuf.c +++ b/strbuf.c @@ -786,7 +786,7 @@ char *xstrfmt(const char *fmt, ...) } void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, -int tz_offset, const char *tz_name) +int tz_offset, const int omit_strftime_tz_name) Why const? And as written above, naming the parameter local would make it easier to understand instead of exposing an implementation detail in the interface. I think calling it "local" isn't right. That's a decision the _caller_ is making about whether to pass through %Z. But the actual implementation is more like "should the function fill tzname based on tz?" So some name along those lines would make sense. In which case the caller would then pass "!mode->local" for the flag. We only have a single caller currently, so responsibilities can still be shifted, and it's a bit hard to draw the line. "Here's a format and all time information I have, expand!" is just as viable as "here's a format and most time information, expand, and handle %Z in this particular way when you see it!", I think. René
Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool
Am 23.06.2017 um 16:46 schrieb Ævar Arnfjörð Bjarmason: Change the code for deciding what's to be done about %Z to stop passing always either a NULL or "" char * to strbuf_addftime(). Instead pass a boolean int to indicate whether the strftime() %Z format should be omitted, which is what this code is actually doing. "Omitting" sounds not quite right somehow. We expand %Z to the empty string because that's the best we can do -- which amounts to a removal, but that's not the intent, just an implementation detail. Calling it "handling %Z internally" would be better, I think. This code grew organically between the changes in 9eafe86d58 ("Merge branch 'rs/strbuf-addftime-zZ'", 2017-06-22) yielding an end result that wasn't very readable. Out of context it looked as though the call to strbuf_addstr() might be adding a custom tz_name to the string, but actually tz_name would always be "", so the call to strbuf_addstr() just to add an empty string to the format was pointless. Signed-off-by: Ævar Arnfjörð Bjarmason--- date.c | 2 +- strbuf.c | 5 ++--- strbuf.h | 5 +++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/date.c b/date.c index 1fd6d66375..5f09743bad 100644 --- a/date.c +++ b/date.c @@ -256,7 +256,7 @@ const char *show_date(timestamp_t time, int tz, const struct date_mode *mode) tm->tm_hour, tm->tm_min, tm->tm_sec, tz); else if (mode->type == DATE_STRFTIME) strbuf_addftime(, mode->strftime_fmt, tm, tz, - mode->local ? NULL : ""); + mode->local ? 0 : 1); I don't see how this is more readable -- both look about equally ugly to me. Passing mode->local unchanged would be better. else strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d", weekday_names[tm->tm_wday], diff --git a/strbuf.c b/strbuf.c index be3b9e37b1..81ff3570e2 100644 --- a/strbuf.c +++ b/strbuf.c @@ -786,7 +786,7 @@ char *xstrfmt(const char *fmt, ...) } void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, -int tz_offset, const char *tz_name) +int tz_offset, const int omit_strftime_tz_name) Why const? And as written above, naming the parameter local would make it easier to understand instead of exposing an implementation detail in the interface. { struct strbuf munged_fmt = STRBUF_INIT; size_t hint = 128; @@ -815,8 +815,7 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, fmt++; break; case 'Z': - if (tz_name) { - strbuf_addstr(_fmt, tz_name); + if (omit_strftime_tz_name) { Getting rid of this strbuf_addstr call is nice, but as Peff mentioned in his reply it also reduces the flexibility of the function. While it's unlikely to be needed I'm not convinced that we should already block this path (even though it could be easily reopened). fmt++; break; } diff --git a/strbuf.h b/strbuf.h index 4559035c47..bad698058a 100644 --- a/strbuf.h +++ b/strbuf.h @@ -340,14 +340,15 @@ extern void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap); /** * Add the time specified by `tm`, as formatted by `strftime`. - * `tz_name` is used to expand %Z internally unless it's NULL. * `tz_offset` is in decimal hhmm format, e.g. -600 means six hours west * of Greenwich, and it's used to expand %z internally. However, tokens * with modifiers (e.g. %Ez) are passed to `strftime`. + * `omit_strftime_tz_name` when set, means don't let `strftime` format + * %Z, instead do our own formatting. */ extern void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, int tz_offset, - const char *tz_name); + const int omit_strftime_tz_name); /** * Read a given size of data from a FILE* pointer to the buffer.
Re: [BUG] add_again() off-by-one error in custom format
Am 18.06.2017 um 15:56 schrieb Jeff King: On Sun, Jun 18, 2017 at 02:59:04PM +0200, René Scharfe wrote: @@ -1586,6 +1587,9 @@ extern struct alternate_object_database { struct strbuf scratch; size_t base_len; + uint32_t loose_objects_subdir_bitmap[8]; Is it worth the complexity of having an actual bitmap and not just an array of char? I guess it's not _that_ complex to access the bits, but there are a lot of magic numbers involved. That would be 224 bytes more per alternate_object_database, and we'd gain simpler code. Hmm. We could add some bitmap helper macros, but you're probably right that the first version should use the simplest form for representing small bitmaps that we currently have. I'd be OK with keeping it if we could reduce the number of magic numbers. E.g,. rather than 32 elsewhere use: (sizeof(*loose_objects_subdir_bitmap) * CHAR_BIT) We have a bitsizeof macro for that. and similarly rather than 8 here use 256 / sizeof(*loose_objects_subdir_bitmap) / CHAR_BIT If we're pretending not to know the number of bits in a byte then we need to round up, and we have DIV_ROUND_UP for that. :) There's also already a bitmap data structure in ewah/bitmap.c. It's a little bit overkill, though, because it mallocs and will grow the bitmap as needed. Yes, I feel that's too big a hammer for this purpose. There is another example of a bitmap in shallow_c (just search for "32"). It would benefit from the macros mentioned above. That might make it easier to switch to the native word size (unsigned int) instead of using uint32_t everywhere. But perhaps this one is actually a candidate for using EWAH, depending on the number of refs the code is supposed to handle. +static void read_loose_object_subdir(struct alternate_object_database *alt, +int subdir_nr) I think it's nice to pull this out into a helper function. I do wonder if it should just be reusing for_each_loose_file_in_objdir(). You'd just need to expose the in_obj_subdir() variant publicly. It does do slightly more than we need (for the callbacks it actually builds the filename), but I doubt memcpy()ing a few bytes would be measurable. Good point. The function also copies the common first two hex digits for each entry. But all that extra work is certainly dwarfed by the readdir calls. Yes. You're welcome to micro-optimize that implementation if you want. ;) My knee-jerk reaction was that this would lead to ugliness, but on second look it might actually be doable. Will check, but I don't expect much of a speedup. René
[PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()
Read each loose object subdirectory at most once when looking for unique abbreviated hashes. This speeds up commands like "git log --pretty=%h" considerably, which previously caused one readdir(3) call for each candidate, even for subdirectories that were visited before. The new cache is kept until the program ends and never invalidated. The same is already true for pack indexes. The inherent racy nature of finding unique short hashes makes it still fit for this purpose -- a conflicting new object may be added at any time. Tasks with higher consistency requirements should not use it, though. The cached object names are stored in an oid_array, which is quite compact. The bitmap for remembering which subdir was already read is stored as a char array, with one char per directory -- that's not quite as compact, but really simple and incurs only an overhead equivalent to 11 hashes after all. Suggested-by: Jeff KingHelped-by: Jeff King Signed-off-by: Rene Scharfe --- cache.h | 17 + sha1_file.c | 12 ++-- sha1_name.c | 50 ++ 3 files changed, 53 insertions(+), 26 deletions(-) diff --git a/cache.h b/cache.h index d6ba8a2f11..00a017dfcb 100644 --- a/cache.h +++ b/cache.h @@ -11,6 +11,7 @@ #include "string-list.h" #include "pack-revindex.h" #include "hash.h" +#include "sha1-array.h" #ifndef platform_SHA_CTX /* @@ -1593,6 +1594,16 @@ extern struct alternate_object_database { struct strbuf scratch; size_t base_len; + /* +* Used to store the results of readdir(3) calls when searching +* for unique abbreviated hashes. This cache is never +* invalidated, thus it's racy and not necessarily accurate. +* That's fine for its purpose; don't use it for tasks requiring +* greater accuracy! +*/ + char loose_objects_subdir_seen[256]; + struct oid_array loose_objects_cache; + char path[FLEX_ARRAY]; } *alt_odb_list; extern void prepare_alt_odb(void); @@ -1811,6 +1822,12 @@ typedef int each_loose_cruft_fn(const char *basename, typedef int each_loose_subdir_fn(int nr, const char *path, void *data); +int for_each_file_in_obj_subdir(int subdir_nr, + struct strbuf *path, + each_loose_object_fn obj_cb, + each_loose_cruft_fn cruft_cb, + each_loose_subdir_fn subdir_cb, + void *data); int for_each_loose_file_in_objdir(const char *path, each_loose_object_fn obj_cb, each_loose_cruft_fn cruft_cb, diff --git a/sha1_file.c b/sha1_file.c index 59a4ed2ed3..5e0ee2b68b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3735,12 +3735,12 @@ void assert_sha1_type(const unsigned char *sha1, enum object_type expect) typename(expect)); } -static int for_each_file_in_obj_subdir(int subdir_nr, - struct strbuf *path, - each_loose_object_fn obj_cb, - each_loose_cruft_fn cruft_cb, - each_loose_subdir_fn subdir_cb, - void *data) +int for_each_file_in_obj_subdir(int subdir_nr, + struct strbuf *path, + each_loose_object_fn obj_cb, + each_loose_cruft_fn cruft_cb, + each_loose_subdir_fn subdir_cb, + void *data) { size_t baselen = path->len; DIR *dir = opendir(path->buf); diff --git a/sha1_name.c b/sha1_name.c index 5126853bb5..ccb5144d0d 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -77,10 +77,19 @@ static void update_candidates(struct disambiguate_state *ds, const struct object /* otherwise, current can be discarded and candidate is still good */ } +static int append_loose_object(const struct object_id *oid, const char *path, + void *data) +{ + oid_array_append(data, oid); + return 0; +} + +static int match_sha(unsigned, const unsigned char *, const unsigned char *); + static void find_short_object_filename(struct disambiguate_state *ds) { + int subdir_nr = ds->bin_pfx.hash[0]; struct alternate_object_database *alt; - char hex[GIT_MAX_HEXSZ]; static struct alternate_object_database *fakeent; if (!fakeent) { @@ -95,29 +104,30 @@ static void find_short_object_filename(struct disambiguate_state *ds) } fakeent->next = alt_odb_list; - xsnprintf(hex, sizeof(hex), "%.2s", ds->hex_pfx); for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) { - struct
Re: [BUG] add_again() off-by-one error in custom format
Am 18.06.2017 um 13:49 schrieb Jeff King: On Sun, Jun 18, 2017 at 12:58:37PM +0200, René Scharfe wrote: An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times a factor of 1.5 from alloc_nr). How many loose objects do we have to expect? 30 MB for a million of them sounds not too bad, but can there realistically be orders of magnitude more? Good point. We gc by default at 6000 loose objects, and lots of thing will suffer poor performance by the time you hit a million. So a little extra space is probably not worth worrying about. So here's a patch for caching loose object hashes in an oid_array without a way to invalidate or release the cache. It uses a single oid_array for simplicity, but reads each subdirectory individually and remembers that fact using a bitmap. I like the direction of this very much. @@ -1586,6 +1587,9 @@ extern struct alternate_object_database { struct strbuf scratch; size_t base_len; + uint32_t loose_objects_subdir_bitmap[8]; Is it worth the complexity of having an actual bitmap and not just an array of char? I guess it's not _that_ complex to access the bits, but there are a lot of magic numbers involved. That would be 224 bytes more per alternate_object_database, and we'd gain simpler code. Hmm. We could add some bitmap helper macros, but you're probably right that the first version should use the simplest form for representing small bitmaps that we currently have. + struct oid_array loose_objects_cache; We should probably insert a comment here warning that the cache may go out of date during the process run, and should only be used when that's an acceptable tradeoff. Then we need to offer a way for opting out. Through a global variable? (I'd rather reduce their number, but don't see how allow programs to nicely toggle this cache otherwise.) +static void read_loose_object_subdir(struct alternate_object_database *alt, +int subdir_nr) I think it's nice to pull this out into a helper function. I do wonder if it should just be reusing for_each_loose_file_in_objdir(). You'd just need to expose the in_obj_subdir() variant publicly. It does do slightly more than we need (for the callbacks it actually builds the filename), but I doubt memcpy()ing a few bytes would be measurable. Good point. The function also copies the common first two hex digits for each entry. But all that extra work is certainly dwarfed by the readdir calls. René
Re: [BUG] add_again() off-by-one error in custom format
Am 15.06.2017 um 15:25 schrieb Jeff King: > On Thu, Jun 15, 2017 at 01:33:34PM +0200, René Scharfe wrote: >> Can we trust object directory time stamps for cache invalidation? > > I think it would work on POSIX-ish systems, since loose object changes > always involve new files, not modifications of existing ones. I don't > know if there are systems that don't update directory mtimes even for > newly added files. > > That would still be a stat() per request. I'd be curious how the timing > compares to the opendir/readdir that happens now. Modification times wouldn't be exact, as you mentioned above: An object could be added just after checking the time stamp. So perhaps we don't need that, or a time-based invalidation suffices, e.g. we discard the cache after five minutes or so? Anyway, here's a patch for stat-based invalidation, on top of the other one. Array removal is really slow (hope I didn't sneak a bug in there, but my confidence in this code isn't very high). No locking is done; parallel threads removing and adding entries could make a mess, but that's not an issue for log. Timings for "time git log --pretty=%h >/dev/null" in my git repository with 5094 loose objects on Debian: master first patch this patch real0m1.065s 0m0.581s 0m0.633s user0m0.648s 0m0.564s 0m0.580s sys 0m0.412s 0m0.016s 0m0.052s And on mingw with 227 loose objects: master first patch this patch real0m1.756s 0m0.546s 0m1.659s user0m0.000s 0m0.000s 0m0.000s sys 0m0.000s 0m0.000s 0m0.000s So at least for Windows it would be really nice if we could avoid calling stat.. --- cache.h | 1 + sha1_name.c | 32 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 8c6748907b..386b09710d 100644 --- a/cache.h +++ b/cache.h @@ -1589,6 +1589,7 @@ extern struct alternate_object_database { uint32_t loose_objects_subdir_bitmap[8]; struct oid_array loose_objects_cache; + time_t loose_objects_subdir_mtime[256]; char path[FLEX_ARRAY]; } *alt_odb_list; diff --git a/sha1_name.c b/sha1_name.c index ae6a5c3abe..baecb10454 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -77,6 +77,24 @@ static void update_candidates(struct disambiguate_state *ds, const struct object /* otherwise, current can be discarded and candidate is still good */ } +static void remove_subdir_entries(struct oid_array *array, int subdir_nr) +{ + struct object_id oid; + int start, end; + + memset(, 0, sizeof(oid)); + oid.hash[0] = subdir_nr; + start = oid_array_lookup(array, ); + if (start < 0) + start = -1 - start; + end = start; + while (end < array->nr && array->oid[end].hash[0] == subdir_nr) + end++; + memmove(array->oid + start, array->oid + end, + (array->nr - end) * sizeof(array->oid[0])); + array->nr -= end - start; +} + static void read_loose_object_subdir(struct alternate_object_database *alt, int subdir_nr) { @@ -86,15 +104,21 @@ static void read_loose_object_subdir(struct alternate_object_database *alt, struct dirent *de; size_t bitmap_index = subdir_nr / 32; uint32_t bitmap_mask = 1 << (subdir_nr % 32); - - if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask) - return; - alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask; + struct stat st; buf = alt_scratch_buf(alt); strbuf_addf(buf, "%02x/", subdir_nr); xsnprintf(hex, sizeof(hex), "%02x", subdir_nr); + stat(buf->buf, ); + if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask) { + if (alt->loose_objects_subdir_mtime[subdir_nr] == st.st_mtime) + return; + remove_subdir_entries(>loose_objects_cache, subdir_nr); + } + alt->loose_objects_subdir_mtime[subdir_nr] = st.st_mtime; + alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask; + dir = opendir(buf->buf); if (!dir) return; -- 2.13.0
Re: [BUG] add_again() off-by-one error in custom format
Am 15.06.2017 um 15:25 schrieb Jeff King: > On Thu, Jun 15, 2017 at 01:33:34PM +0200, René Scharfe wrote: >>> I'm not really sure how, though, short of caching the directory >>> contents. That opens up questions of whether and when to invalidate the >>> cache. If the cache were _just_ about short hashes, it might be OK to >>> just assume that it remains valid through the length of the program (so >>> worst case, a simultaneous write might mean that we generate a sha1 >>> which just became ambiguous, but that's generally going to be racy >>> anyway). >>> >>> The other downside of course is that we'd spend RAM on it. We could >>> bound the size of the cache, I suppose. >> >> You mean like an in-memory pack index for loose objects? In order to >> avoid the readdir call in sha1_name.c::find_short_object_filename()? >> We'd only need to keep the hashes of found objects. An oid_array >> would be quite compact. > > Yes, that's what I was thinking of. An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times a factor of 1.5 from alloc_nr). How many loose objects do we have to expect? 30 MB for a million of them sounds not too bad, but can there realistically be orders of magnitude more? >> Non-racy writes inside a process should not be ignored (write, read >> later) -- e.g. checkout does something like that. > > Ideally, yes. Though thinking on this more, it's racy even today, > because we rely on the in-memory packed_git list. We usually re-check the > directory only when we look for an object and it's missing. So any new > packs which have been added while the process runs will be missed when > doing short-sha1 lookups (and actually, find_short_packed_object does > not even seem to do the usual retry-on-miss). > > A normal process like "checkout" isn't writing new packs, but a > simultaneous repack could be migrating its new objects to a pack behind > its back. (It also _can_ write packs, but only for large blobs). > > Given that we are talking about finding "unique" abbreviations here, and > that those abbreviations can become invalidated immediately anyway as > more objects are added (even by the same process), I'm not sure we need > to strive for absolute accuracy. Agreed. And processes that add objects and use them later probably use full hashes (at least checkout does). So here's a patch for caching loose object hashes in an oid_array without a way to invalidate or release the cache. It uses a single oid_array for simplicity, but reads each subdirectory individually and remembers that fact using a bitmap. --- cache.h | 4 sha1_name.c | 69 +++-- 2 files changed, 53 insertions(+), 20 deletions(-) diff --git a/cache.h b/cache.h index 4d92aae0e8..8c6748907b 100644 --- a/cache.h +++ b/cache.h @@ -11,6 +11,7 @@ #include "string-list.h" #include "pack-revindex.h" #include "hash.h" +#include "sha1-array.h" #ifndef platform_SHA_CTX /* @@ -1586,6 +1587,9 @@ extern struct alternate_object_database { struct strbuf scratch; size_t base_len; + uint32_t loose_objects_subdir_bitmap[8]; + struct oid_array loose_objects_cache; + char path[FLEX_ARRAY]; } *alt_odb_list; extern void prepare_alt_odb(void); diff --git a/sha1_name.c b/sha1_name.c index 5126853bb5..ae6a5c3abe 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -77,10 +77,46 @@ static void update_candidates(struct disambiguate_state *ds, const struct object /* otherwise, current can be discarded and candidate is still good */ } +static void read_loose_object_subdir(struct alternate_object_database *alt, +int subdir_nr) +{ + struct strbuf *buf; + char hex[GIT_MAX_HEXSZ]; + DIR *dir; + struct dirent *de; + size_t bitmap_index = subdir_nr / 32; + uint32_t bitmap_mask = 1 << (subdir_nr % 32); + + if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask) + return; + alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask; + + buf = alt_scratch_buf(alt); + strbuf_addf(buf, "%02x/", subdir_nr); + xsnprintf(hex, sizeof(hex), "%02x", subdir_nr); + + dir = opendir(buf->buf); + if (!dir) + return; + + while ((de = readdir(dir)) != NULL) { + struct object_id oid; + + if (strlen(de->d_name) != GIT_SHA1_HEXSZ - 2) + continue; + memcpy(hex + 2, de->d_name, GIT_SHA1_HEXSZ - 2); + if (!get_oid_hex(hex, )) + oid_array_append(>loose_objects_cache, ); + } + closedir(dir); +}
Re: [PATCH 2/2] date: use localtime() for "-local" time formats
Am 15.06.2017 um 15:52 schrieb Jeff King: But for the special case of the "-local" formats, we can just skip the adjustment and use localtime() instead of gmtime(). This makes --date=format-local:%Z work correctly, showing the local timezone instead of an empty string. Documentation/rev-list-options.txt should be updated to mention that %Z is passed to strftime in the local case, no? The new test checks the result for "UTC", our default test-lib value for $TZ. Using something like EST5 might be more interesting, but the actual zone string is system-dependent (for instance, on my system it expands to just EST). Hopefully "UTC" is vanilla enough that every system treats it the same. Signed-off-by: Jeff King--- I don't have a Windows system to test this on, but from the output Dscho provided earlier, I believe this should pass. The first patch applies with some fuzz on master of Git for Windows, the second one applies cleanly. A "typedef unsigned long timestamp_t;" is required to compile it; such a fixup won't be needed for long, I guess. t0006 succeeds. René
Re: [PATCH] checkout: don't write merge results into the object database
Am 15.06.2017 um 15:57 schrieb Jeff King: > On Thu, Jun 15, 2017 at 01:33:42PM +0200, René Scharfe wrote: > >> Merge results need to be written to the worktree, of course, but we >> don't necessarily need object entries for them, especially if they >> contain conflict markers. Use pretend_sha1_file() to create fake >> blobs to pass to make_cache_entry() and checkout_entry() instead. > > Conceptually this makes sense, although I have a comment below. > > Out of curiosity, did this come up when looking into the cherry-pick > segfault/error from a few hours ago? No, it came up in the discussion of Dscho's memory leak plug series [1]. >> @@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct >> checkout *state) >> * (it also writes the merge result to the object database even >> * when it may contain conflicts). >> */ >> -if (write_sha1_file(result_buf.ptr, result_buf.size, >> -blob_type, oid.hash)) >> +if (pretend_sha1_file(result_buf.ptr, result_buf.size, >> + OBJ_BLOB, oid.hash)) >> die(_("Unable to add merge result for '%s'"), path); >> free(result_buf.ptr); > > I wondered if pretend_sha1_file() makes a copy of the buffer, and indeed > it does. So this is correct. > > But that raises an interesting question: how big are these objects, and > is it a good idea to store them in RAM? Obviously they're in RAM > already, but I'm not sure if it's one-at-a-time. We could be bumping the > peak RAM used if there's a large number of these entries. Touching the > on-disk odb does feel hacky, but it also means they behave like other > objects. > > If it is a concern, I think it could be solved by "unpretending" after > our call to checkout_entry completes. That would need a new call in > sha1_file.c, but it should be easy to write. Good point; we'd accumulate fake entries that we'll never need again. The patch should clean them up. Alternatively we could finally address the NEEDSWORK comment and provide a way to checkout a file without faking an index entry.. René [1] https://public-inbox.org/git/2704e145927c851c4163a68cfdfd5ada48fff21d.1493906085.git.johannes.schinde...@gmx.de/
[PATCH v3] strbuf: let strbuf_addftime handle %z and %Z itself
There is no portable way to pass timezone information to strftime. Add parameters for timezone offset and name to strbuf_addftime and let it handle the timezone-related format specifiers %z and %Z internally. Callers can opt out for %Z by passing NULL as timezone name. %z is always handled internally -- this helps on Windows, where strftime would expand it to a timezone name (same as %Z), in violation of POSIX. Modifiers are not handled, e.g. %Ez is still passed to strftime. Use an empty string as timezone name in show_date (the only current caller) for now because we only have the timezone offset in non-local mode. POSIX allows %Z to resolve to an empty string in case of missing information. Helped-by: Ulrich MuellerHelped-by: Jeff King Signed-off-by: Rene Scharfe --- Changes from v3: - Updated developer documentation in strbuf.h. - Added short note to user documentation. Documentation/rev-list-options.txt | 3 ++- date.c | 2 +- strbuf.c | 41 ++ strbuf.h | 10 -- t/t0006-date.sh| 6 ++ 5 files changed, 54 insertions(+), 8 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a46f70c2b1..34ae2553f1 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -768,7 +768,8 @@ timezone value. 1970). As with `--raw`, this is always in UTC and therefore `-local` has no effect. + -`--date=format:...` feeds the format `...` to your system `strftime`. +`--date=format:...` feeds the format `...` to your system `strftime`, +except for %z and %Z, which are handled internally. Use `--date=format:%c` to show the date in your system locale's preferred format. See the `strftime` manual for a complete list of format placeholders. When using `-local`, the correct syntax is diff --git a/date.c b/date.c index 63fa99685e..5580577334 100644 --- a/date.c +++ b/date.c @@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const struct date_mode *mode) month_names[tm->tm_mon], tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec, tz); else if (mode->type == DATE_STRFTIME) - strbuf_addftime(, mode->strftime_fmt, tm); + strbuf_addftime(, mode->strftime_fmt, tm, tz, ""); else strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d", weekday_names[tm->tm_wday], diff --git a/strbuf.c b/strbuf.c index 00457940cf..be3b9e37b1 100644 --- a/strbuf.c +++ b/strbuf.c @@ -785,14 +785,48 @@ char *xstrfmt(const char *fmt, ...) return ret; } -void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, +int tz_offset, const char *tz_name) { + struct strbuf munged_fmt = STRBUF_INIT; size_t hint = 128; size_t len; if (!*fmt) return; + /* +* There is no portable way to pass timezone information to +* strftime, so we handle %z and %Z here. +*/ + for (;;) { + const char *percent = strchrnul(fmt, '%'); + strbuf_add(_fmt, fmt, percent - fmt); + if (!*percent) + break; + fmt = percent + 1; + switch (*fmt) { + case '%': + strbuf_addstr(_fmt, "%%"); + fmt++; + break; + case 'z': + strbuf_addf(_fmt, "%+05d", tz_offset); + fmt++; + break; + case 'Z': + if (tz_name) { + strbuf_addstr(_fmt, tz_name); + fmt++; + break; + } + /* FALLTHROUGH */ + default: + strbuf_addch(_fmt, '%'); + } + } + fmt = munged_fmt.buf; + strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm); @@ -804,17 +838,16 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) * output contains at least one character, and then drop the extra * character before returning. */ - struct strbuf munged_fmt = STRBUF_INIT; - strbuf_addf(_fmt, "%s ", fmt); + strbuf_addch(_fmt, ' '); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len,
Re: [PATCH v2] strbuf: let strbuf_addftime handle %z and %Z itself
Am 15.06.2017 um 13:27 schrieb Ulrich Mueller: On Thu, 15 Jun 2017, René Scharfe wrote: Callers can opt out for %Z by passing NULL as timezone name. %z is always handled internally -- this helps on Windows, where strftime would expand it to a timezone name (same as %Z), in violation of POSIX. Modifiers are not handled, e.g. %Ez is still passed to strftime. POSIX would also allow other things, like a field width: http://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html $ date '+%8z' +200 (But I believe that's not very useful, and supporting it might require duplicating much of strftime's code.) Windows doesn't support that (unsurprisingly), but it accepts %#z, which does the same as %z. Let's wait for someone to request support for modifiers and just document the behavior for now. Changes from v1: - Always handle %z internally. Minor nitpick: Shouldn't the comment in strbuf.h be updated to reflect that change? + * Add the time specified by `tm`, as formatted by `strftime`. `tz_offset` + * and `tz_name` are used to expand %z and %Z internally, unless `tz_name` + * is NULL. `tz_offset` is in decimal hhmm format, e.g. -600 means six + * hours west of Greenwich. Yes, it should. Thanks for paying attention! :) René
[PATCH] checkout: don't write merge results into the object database
Merge results need to be written to the worktree, of course, but we don't necessarily need object entries for them, especially if they contain conflict markers. Use pretend_sha1_file() to create fake blobs to pass to make_cache_entry() and checkout_entry() instead. Signed-off-by: Rene Scharfe--- builtin/checkout.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 1624eed7e7..f51c15af9f 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -215,7 +215,7 @@ static int checkout_merged(int pos, const struct checkout *state) /* * NEEDSWORK: -* There is absolutely no reason to write this as a blob object +* There is absolutely no reason to build a fake blob object * and create a phony cache entry. This hack is primarily to get * to the write_entry() machinery that massages the contents to * work-tree format and writes out which only allows it for a @@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct checkout *state) * (it also writes the merge result to the object database even * when it may contain conflicts). */ - if (write_sha1_file(result_buf.ptr, result_buf.size, - blob_type, oid.hash)) + if (pretend_sha1_file(result_buf.ptr, result_buf.size, + OBJ_BLOB, oid.hash)) die(_("Unable to add merge result for '%s'"), path); free(result_buf.ptr); ce = make_cache_entry(mode, oid.hash, path, 2, 0); -- 2.13.0
Re: [BUG] add_again() off-by-one error in custom format
Am 15.06.2017 um 07:56 schrieb Jeff King: One interesting thing is that the cost of finding short hashes very much depends on your loose object setup. I timed: git log --format=%H >/dev/null versus git log --format=%h >/dev/null on git.git. It went from about 400ms to about 800ms. But then I noticed I had a lot of loose object directories, and ran "git gc --prune=now". Afterwards, my timings were more like 380ms and 460ms. The difference is that in the "before" case, we actually opened each directory and ran getdents(). But after gc, the directories are gone totally and open() fails. We also have to do a linear walk through the objects in each directory, since the contents are sorted. Do you mean "unsorted"? So I wonder if it is worth trying to optimize the short-sha1 computation in the first place. Double-%h aside, that would make _everything_ faster, including --oneline. Right. I'm not really sure how, though, short of caching the directory contents. That opens up questions of whether and when to invalidate the cache. If the cache were _just_ about short hashes, it might be OK to just assume that it remains valid through the length of the program (so worst case, a simultaneous write might mean that we generate a sha1 which just became ambiguous, but that's generally going to be racy anyway). The other downside of course is that we'd spend RAM on it. We could bound the size of the cache, I suppose. You mean like an in-memory pack index for loose objects? In order to avoid the readdir call in sha1_name.c::find_short_object_filename()? We'd only need to keep the hashes of found objects. An oid_array would be quite compact. Non-racy writes inside a process should not be ignored (write, read later) -- e.g. checkout does something like that. Can we trust object directory time stamps for cache invalidation? René
Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)
Am 15.06.2017 um 07:42 schrieb Jeff King: On Thu, Jun 15, 2017 at 01:03:29AM +0200, René Scharfe wrote: But there's more. strftime on Windows doesn't support common POSIX- defined tokens like %F (%Y-%m-%d) and %T (%H:%M:%S). We could handle them as well. Do we want that? At least we'd have to update the added test that uses them.. Here's the full list of tokens in POSIX [1], but not supported by Windows [2]: %C, %D, %F, %G, %R, %T, %V, %e, %g, %h, %n, %r, %t, %u plus the modifiers %E and %O. I don't have a real opinion on that. The point of adding strftime was always to give the user access to whatever their system supports. In particular "%c" which we cannot emulate ourselves. If people want support for those other things on platforms that don't have it, I have no real objection. But I also don't know that it's worth spending time on if nobody is asking for it. Agreed; let's make the tests more focused (i.e. not exercise %F and %T needlessly). René
[PATCH v2] strbuf: let strbuf_addftime handle %z and %Z itself
There is no portable way to pass timezone information to strftime. Add parameters for timezone offset and name to strbuf_addftime and let it handle the timezone-related format specifiers %z and %Z internally. Callers can opt out for %Z by passing NULL as timezone name. %z is always handled internally -- this helps on Windows, where strftime would expand it to a timezone name (same as %Z), in violation of POSIX. Modifiers are not handled, e.g. %Ez is still passed to strftime. Use an empty string as timezone name in show_date (the only current caller) for now because we only have the timezone offset in non-local mode. POSIX allows %Z to resolve to an empty string in case of missing information. Helped-by: Ulrich MuellerHelped-by: Jeff King Signed-off-by: Rene Scharfe --- Changes from v1: - Always handle %z internally. - Move tests from t6006 to t0006, as that's a more appropriate place. - Changed tests to only use %%, %z and %Z to avoid incompatibilities. - Tested on mingw (applies there with patch and some fuzz). date.c | 2 +- strbuf.c| 41 + strbuf.h| 11 --- t/t0006-date.sh | 6 ++ 4 files changed, 52 insertions(+), 8 deletions(-) diff --git a/date.c b/date.c index 63fa99685e..5580577334 100644 --- a/date.c +++ b/date.c @@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const struct date_mode *mode) month_names[tm->tm_mon], tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec, tz); else if (mode->type == DATE_STRFTIME) - strbuf_addftime(, mode->strftime_fmt, tm); + strbuf_addftime(, mode->strftime_fmt, tm, tz, ""); else strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d", weekday_names[tm->tm_wday], diff --git a/strbuf.c b/strbuf.c index 00457940cf..be3b9e37b1 100644 --- a/strbuf.c +++ b/strbuf.c @@ -785,14 +785,48 @@ char *xstrfmt(const char *fmt, ...) return ret; } -void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, +int tz_offset, const char *tz_name) { + struct strbuf munged_fmt = STRBUF_INIT; size_t hint = 128; size_t len; if (!*fmt) return; + /* +* There is no portable way to pass timezone information to +* strftime, so we handle %z and %Z here. +*/ + for (;;) { + const char *percent = strchrnul(fmt, '%'); + strbuf_add(_fmt, fmt, percent - fmt); + if (!*percent) + break; + fmt = percent + 1; + switch (*fmt) { + case '%': + strbuf_addstr(_fmt, "%%"); + fmt++; + break; + case 'z': + strbuf_addf(_fmt, "%+05d", tz_offset); + fmt++; + break; + case 'Z': + if (tz_name) { + strbuf_addstr(_fmt, tz_name); + fmt++; + break; + } + /* FALLTHROUGH */ + default: + strbuf_addch(_fmt, '%'); + } + } + fmt = munged_fmt.buf; + strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm); @@ -804,17 +838,16 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) * output contains at least one character, and then drop the extra * character before returning. */ - struct strbuf munged_fmt = STRBUF_INIT; - strbuf_addf(_fmt, "%s ", fmt); + strbuf_addch(_fmt, ' '); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len, munged_fmt.buf, tm); } - strbuf_release(_fmt); len--; /* drop munged space */ } + strbuf_release(_fmt); strbuf_setlen(sb, sb->len + len); } diff --git a/strbuf.h b/strbuf.h index 80047b1bb7..39d5836abd 100644 --- a/strbuf.h +++ b/strbuf.h @@ -339,9 +339,14 @@ __attribute__((format (printf,2,0))) extern void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap); /** - * Add the time specified by `tm`, as formatted by `strftime`. - */ -extern void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm); + * Add the time specified by `tm`, as formatted by `strftime`. `tz_offset` + * and
Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)
Am 14.06.2017 um 23:04 schrieb Johannes Schindelin: > On Wed, 14 Jun 2017, René Scharfe wrote: > >> Does someone actually expect %z to show time zone names instead of >> offsets on Windows? > > Not me ;-) > > I cannot speak for anyone else, as I lack that information, though. Before the patch %z would always expand to + on Linux and to the name of the local time zone on Windows, no matter which offset was actually given. So it was broken in either case (even though it got at least some aspects right by accident for some commits). Based on that I'd think handling %z internally should be OK. But there's more. strftime on Windows doesn't support common POSIX- defined tokens like %F (%Y-%m-%d) and %T (%H:%M:%S). We could handle them as well. Do we want that? At least we'd have to update the added test that uses them.. Here's the full list of tokens in POSIX [1], but not supported by Windows [2]: %C, %D, %F, %G, %R, %T, %V, %e, %g, %h, %n, %r, %t, %u plus the modifiers %E and %O. René [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html [2] https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx
Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)
Am 14.06.2017 um 13:10 schrieb Jeff King: On Wed, Jun 14, 2017 at 12:57:06PM +0200, Johannes Schindelin wrote: But even then, it fails in t0006 on Windows with this error: -- snip -- ++ eval 'diff -u "$@" ' +++ diff -u expect actual --- expect 2017-06-14 10:53:40.126136900 + +++ actual 2017-06-14 10:53:40.171146800 + @@ -1 +1 @@ -146600 +0200 -> 2016-06-15 14:13:20 + (UTC) +146600 +0200 -> 2016-06-15 14:13:20 UTC (UTC) Ugh, I was worried about that some systems might display timezones differently (that's why I _didn't_ check %Z in the EST5 case). But I must admit this was not an incompatibility I was expecting. It looks like your system strftime() turns %z into "UTC". POSIX says: %z Replaced by the offset from UTC in the ISO 8601:2000 standard format (+hhmm or -hhmm), or by no characters if no timezone is determinable. So it seems like the mingw strftime is violating POSIX. I don't see an easy solution beyond marking this as !MINGW. Though if we wanted a partial test, we could test %z and %Z separately. Hmm. The patches currently either let strftime handle both %z and %Z (in the local case) or handle both internally. Perhaps we need a third option, namely to handle %z internally in all cases for systems whose implementation violates POSIX? Nah, it would be easier to always handle %z internally. Any downsides? Does someone actually expect %z to show time zone names instead of offsets on Windows? René
Re: [BUG] add_again() off-by-one error in custom format
Am 13.06.2017 um 23:20 schrieb Junio C Hamano: > René Scharfe <l@web.de> writes: > >> The difference is about the same as the one between: >> >> $ time git log --format="" >/dev/null >> >> real0m0.463s >> user0m0.448s >> sys 0m0.012s >> >> and: >> >> $ time git log --format="%h" >/dev/null >> >> real0m1.062s >> user0m0.636s >> sys 0m0.416s >> >> With caching duplicates are basically free and without it short >> hashes have to be looked up again. Other placeholders may reduce >> the relative slowdown, depending on how expensive they are. > > I think the real question is how likely people use more than one > occurrence of the same thing in their custom format, and how deeply > they care that --format='%h %h' costs more than --format='%h'. The > cost won't of course be double (because the main traversal costs > without any output), but it would be rather unreasonable to expect > that --format='%h %h %h %h %h' to cost the same as --format='%h'; > after all, Git is doing more for them ;-) The answer to the first half is obviously "very likely" -- otherwise this bug wouldn't have been found, right? :) Regarding the question of how bad a 50% slowdown for a second %h would be: No idea. If ran interactively it may not even be noticeable because the user can read the first few lines in less while the rest is prepared in the background. We don't have a perf test for formats with duplicate short hashes, so we don't promise anything, right? :) René -- >8 -- Subject: [PATCH] pretty: recalculate duplicate short hashes b9c6232138 (--format=pretty: avoid calculating expensive expansions twice) optimized adding short hashes multiple times by using the fact that the output strbuf was only ever simply appended to and copying the added string from the previous run. That prerequisite is no longer given; we now have modfiers like %< and %+ that can cause the cache to lose track of the correct offsets. Remove it. Reported-by: Michael Giuffrida <michae...@chromium.org> Signed-off-by: Rene Scharfe <l@web.de> --- I'm sending this out in the hope that there might be a simple way to fix it after all, like Gábor's patch does for %+. %< and %> seem to be the only other problematic modifiers for now -- I'm actually surprised that %w seems to be OK. pretty.c | 32 strbuf.c | 7 --- strbuf.h | 6 -- 3 files changed, 45 deletions(-) diff --git a/pretty.c b/pretty.c index 09701bd2ff..cc099dfdd1 100644 --- a/pretty.c +++ b/pretty.c @@ -783,29 +783,9 @@ struct format_commit_context { size_t body_off; /* The following ones are relative to the result struct strbuf. */ - struct chunk abbrev_commit_hash; - struct chunk abbrev_tree_hash; - struct chunk abbrev_parent_hashes; size_t wrap_start; }; -static int add_again(struct strbuf *sb, struct chunk *chunk) -{ - if (chunk->len) { - strbuf_adddup(sb, chunk->off, chunk->len); - return 1; - } - - /* -* We haven't seen this chunk before. Our caller is surely -* going to add it the hard way now. Remember the most likely -* start of the to-be-added chunk: the current end of the -* struct strbuf. -*/ - chunk->off = sb->len; - return 0; -} - static void parse_commit_header(struct format_commit_context *context) { const char *msg = context->message; @@ -1147,24 +1127,16 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ return 1; case 'h': /* abbreviated commit hash */ strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_COMMIT)); - if (add_again(sb, >abbrev_commit_hash)) { - strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET)); - return 1; - } strbuf_add_unique_abbrev(sb, commit->object.oid.hash, c->pretty_ctx->abbrev); strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET)); - c->abbrev_commit_hash.len = sb->len - c->abbrev_commit_hash.off; return 1; case 'T': /* tree hash */ strbuf_addstr(sb, oid_to_hex(>tree->object.oid)); return 1; case 't': /* abbreviated tree hash */ - if (add_again(sb, >abbrev_tree_hash)) - return 1; strbuf_add_unique_abbrev(sb, commit->tree->object.oid.hash, c->pretty_ctx->abbrev); -
Re: [BUG] add_again() off-by-one error in custom format
Am 14.06.2017 um 00:24 schrieb SZEDER Gábor: [sorry for double post, forgot the mailing list...] To throw in a fourth option, this one adjusts the expansions' cached offsets when the magic makes it necessary. It's not necessary for '%-', because it only makes a difference when the expansion is empty, and in that case - add_again() doesn't consider it cached, - and even if it did, the offset of a zero-length string wouldn't matter. pretty.c | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) There are other modifiers, e.g. try the format '%h%>(20)%h' -- its output is obviously wrong (contains control characters), even with your patch. We'd have to update the offsets for each placeholder that changes the output of others. I don't think that approach scales, alas. René
Re: [BUG] add_again() off-by-one error in custom format
Am 13.06.2017 um 20:29 schrieb Junio C Hamano: René Scharfe <l@web.de> writes: Indeed, a very nice bug report! I think the call to format_commit_one() needs to be taught to pass 'magic' through, and then add_again() mechanism needs to be told not to cache when magic is in effect, or something like that. Perhaps something along this line...? pretty.c | 64 ++-- 1 file changed, 38 insertions(+), 26 deletions(-) That looks quite big. You even invent a way to classify magic. Hmph, by "a way to classify", do you mean the enum? That thing was there from before, just moved. Oh, yes, sorry. I didn't even get that far into the patch. (I'll better go to bed after hitting send..) Also I think we do not have to change add_again() at all, because chunk->off tolerates being given a garbage value as long as chunk->len stays 0, and add_again() does not change chunk->len at all. Which cuts the diffstat down to pretty.c | 40 +--- 1 file changed, 25 insertions(+), 15 deletions(-) Does the caching feature justify the added complexity? That's a good question. I thought about your second alternative while adding the "don't cache" thing, but if we can measure and find out that we are not gaining much from caching, certainly that sounds much simpler. The difference is about the same as the one between: $ time git log --format="" >/dev/null real0m0.463s user0m0.448s sys 0m0.012s and: $ time git log --format="%h" >/dev/null real0m1.062s user0m0.636s sys 0m0.416s With caching duplicates are basically free and without it short hashes have to be looked up again. Other placeholders may reduce the relative slowdown, depending on how expensive they are. Forgot a third option, probably because it's not a particularly good idea: Replacing the caching in pretty.c with a short static cache in find_unique_abbrev_r(). René
Re: [BUG] add_again() off-by-one error in custom format
Am 13.06.2017 um 00:49 schrieb Junio C Hamano: Michael Giuffridawrites: For the abbreviated commit hash placeholder ('h'), pretty.c uses add_again() to cache the result of the expansion, and then uses that result in future expansions. This causes problems when the expansion includes whitespace: $ git log -n 1 --pretty='format:newline:%+h%nno_newline:%h' newline: a0b1c2d no_newline: a0b1c2 The second expansion of the hash added an unwanted newline and removed the final character. It seemingly used the cached expansion *starting from the inserted newline*. The expected output is: $ git log -n 1 --pretty='format:newline:%+h%nno_newline:%h' newline: a0b1c2d no_newline:a0b1c2d Nicely explained. The add_again() mechanism caches an earlier result by remembering the offset and the length in the strbuf the formatted string is being accumulated to, but because %+x (and probably %-x) magic placeholders futz with the result of format_commit_one() in place, the cache goes out of sync, of course. Indeed, a very nice bug report! I think the call to format_commit_one() needs to be taught to pass 'magic' through, and then add_again() mechanism needs to be told not to cache when magic is in effect, or something like that. Perhaps something along this line...? pretty.c | 64 ++-- 1 file changed, 38 insertions(+), 26 deletions(-) That looks quite big. You even invent a way to classify magic. Does the caching feature justify the added complexity? Alternatives: - Don't cache anymore, now that we have placeholders that change output on top of the original append-only ones. Yields a net code reduction. Duplicated %h, %t and %p will have to be resolved at each occurrence. - Provide some space for the cache instead of reusing the output, like a strbuf for each placeholder. Adds a memory allocation to each first occurrence of %h, %t and %p. Such a cache could be reused for multiple commits by truncating it instead of releasing; not sure how to pass it in nicely for it to survive a sequence of calls, though. René
Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself
Am 12.06.2017 um 21:02 schrieb Ævar Arnfjörð Bjarmason: I only ever use the time offset info to quickly make a mental note of "oh +0200, this guy's in Europe", or "oh -0400 America East". Having any info at all for %Z would allow me to easily replace that already buggy mapping that exists in my head right now with something that's at least a bit more accurate, e.g. I remember that +0900 is Japan, but I can't now offhand recall what timezones India is at. Now, obviously we can't know whether to translate e.g. -0400 to New York or Santiago, but for the purposes of human readable output I thought it would be more useful to guess than show nothing at all. So for that purpose the most useful output of %Z *for me* would be just showing a string with the 3 most notable cities/areas, weighted for showing locations on different continents, e.g.: + = Iceland, West Africa, Ittoqqortoormiit +0100 = London, Lisbon, Kinshasa ... +0900 = Tokyo, Seul, Manokwari -0900 = San Francisco, Vancouver, Tijuana -0600 = Denver, Managua, Calgary Then I could run: git log --date=format-local:"%Y[...](%Z)" And get output like: commit 826c06412e (HEAD -> master, origin/master, origin/HEAD) Author: Junio C HamanoDate: Fri Jun 2 15:07:36 2017 +0900 (San Francisco, Vancouver, Tijuana etc.) Fifth batch for 2.14 [...] commit 30d005c020 Author: Jeff King Date: Fri May 19 08:59:34 2017 -0400 (New York, Havana, Santiago etc.) diff: use blob path for blob/file diffs Which gives me a pretty good idea of where the people who are making my colleges / collaborators who are making commits all over the world are located, for the purposes of reinforcing the abstract numeric mapping with a best-guess at what the location might be, or at least something that's close to the actual location. Half the time this would be off by a zone in areas that use daylight saving time, or you'd need to determine when DST starts and ends, but since that depends on the exact time zone it will be tricky. You could use military time zones, which are nice and easy to convert: Alpha (UTC+1) to Mike (UTC+12) (Juliet is skipped), November (UTC-1) to Yankee (UTC-12), and Zulu time zone (UTC+0). Downside: Most civilians don't use them. :) Also there are no names for zones that have an offset of a fraction of an hour. I'd definitely use a feature like that, and could hack it up on top of the current patches if there wasn't vehement opposition to it. Maybe the above examples change someone's mind, I can't think of a case where someone's relying on %Z now for date-local, or at least cases where something like the above wouldn't be more useful in practice than just an empty string, but if nobody agrees so be it :) Personally I don't have a use for time information; dates alone would suffice for me -- so I certainly won't hold you back. But I don't see how it could be done. René
Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself
Am 12.06.2017 um 18:16 schrieb Ævar Arnfjörð Bjarmason: On Mon, Jun 12, 2017 at 5:12 PM, Junio C Hamano <gits...@pobox.com> wrote: René Scharfe <l@web.de> writes: Am 07.06.2017 um 10:17 schrieb Jeff King: On Sat, Jun 03, 2017 at 12:40:34PM +0200, René Scharfe wrote: Duplicates strbuf_expand to a certain extent, but not too badly, I think. Leaves the door open for letting strftime handle the local case. I guess you'd plan to do that like this in the caller: if (date->local) tz_name = NULL; else tz_name = ""; and then your strftime() doesn't do any %z expansion when tz_name is NULL. Yes, or you could look up a time zone name somewhere else -- except we don't have a way to do that, at least for now. Is that only "for now"? I have a feeling that it is fundamentally impossible with the data we record. When GMTOFF 9:00 is the only thing we have for a timestamp, can we tell if we should label it as JST (aka Asia/Tokyo) or KST (aka Asia/Seoul)? We could track the time zone on commit, e.g. in a new header. I doubt that the need for showing time zone names will be strong enough to go to that length, though. It's obviously not perfect for all the reasons mentioned in this thread, but we already have a timezone->offset mapping in the timezone_names variable in date.c, a good enough solution might be to simply reverse that lookup when formatting %Z Of course we can never know if you were in Tokyo or Seul from the info in the commit object, but we don't need to, it's enough that we just emit JST for +0900 and anyone reading the output has at least some idea what +0900 maps to. I suspect that there will be different opinions, for cultural and political reasons. We could also simply replace "%Z" with the empty string, as the the POSIX strftime() documentation allows for: http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html ("Replaced by the timezone name or abbreviation, or by no bytes if no timezone information exists."). Yes, that's what the patch does. René
Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself
Am 07.06.2017 um 10:17 schrieb Jeff King: On Sat, Jun 03, 2017 at 12:40:34PM +0200, René Scharfe wrote: Duplicates strbuf_expand to a certain extent, but not too badly, I think. Leaves the door open for letting strftime handle the local case. I guess you'd plan to do that like this in the caller: if (date->local) tz_name = NULL; else tz_name = ""; and then your strftime() doesn't do any %z expansion when tz_name is NULL. Yes, or you could look up a time zone name somewhere else -- except we don't have a way to do that, at least for now. I was thinking that we would need to have it take the actual time_t, and then it would be able to do the tzset/localtime dance itself. But since I don't think we're planning to do that (if anything we'd just handle the normal localtime() case), the complication it would add to the interface isn't worth it. A caller that really needs to do that can, and pass the result as a string. Not pretty, but at least it's a possibility. René
Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself
Am 03.06.2017 um 15:13 schrieb Ulrich Mueller: On Sat, 3 Jun 2017, René Scharfe wrote: + case 'Z': + strbuf_addstr(_fmt, tz_name); Is it guaranteed that tz_name cannot contain a percent sign itself? Currently yes, because the only caller passes an empty string. The fact that tz_name is subject to expansion by strftime could be mentioned explicitly in strbuf.h. I'm not sure if that's a desirable property, but it allows callers to expand %z internally and %Z using strftime. René
[PATCH] strbuf: let strbuf_addftime handle %z and %Z itself
There is no portable way to pass timezone information to strftime. Add parameters for timezone offset and name to strbuf_addftime and let it handle the timezone-related format specifiers %z and %Z internally. Callers can opt out by passing NULL as timezone name. Use an empty string as timezone name in show_date (the only current caller) for now because we only have the timezone offset in non-local mode. POSIX allows %Z to resolve to nothing in case of missing info. Helped-by: Ulrich MuellerHelped-by: Jeff King Signed-off-by: Rene Scharfe --- Duplicates strbuf_expand to a certain extent, but not too badly, I think. Leaves the door open for letting strftime handle the local case. date.c | 2 +- strbuf.c | 42 ++ strbuf.h | 11 --- t/t6006-rev-list-format.sh | 12 4 files changed, 59 insertions(+), 8 deletions(-) diff --git a/date.c b/date.c index 63fa99685e..5580577334 100644 --- a/date.c +++ b/date.c @@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const struct date_mode *mode) month_names[tm->tm_mon], tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec, tz); else if (mode->type == DATE_STRFTIME) - strbuf_addftime(, mode->strftime_fmt, tm); + strbuf_addftime(, mode->strftime_fmt, tm, tz, ""); else strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d", weekday_names[tm->tm_wday], diff --git a/strbuf.c b/strbuf.c index 00457940cf..b880107370 100644 --- a/strbuf.c +++ b/strbuf.c @@ -785,14 +785,47 @@ char *xstrfmt(const char *fmt, ...) return ret; } -void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm, +int tz_offset, const char *tz_name) { + struct strbuf munged_fmt = STRBUF_INIT; size_t hint = 128; size_t len; if (!*fmt) return; + /* +* There is no portable way to pass timezone information to +* strftime, so we handle %z and %Z here. +*/ + if (tz_name) { + for (;;) { + const char *percent = strchrnul(fmt, '%'); + strbuf_add(_fmt, fmt, percent - fmt); + if (!*percent) + break; + fmt = percent + 1; + switch (*fmt) { + case '%': + strbuf_addstr(_fmt, "%%"); + fmt++; + break; + case 'z': + strbuf_addf(_fmt, "%+05d", tz_offset); + fmt++; + break; + case 'Z': + strbuf_addstr(_fmt, tz_name); + fmt++; + break; + default: + strbuf_addch(_fmt, '%'); + } + } + fmt = munged_fmt.buf; + } + strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm); @@ -804,17 +837,18 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) * output contains at least one character, and then drop the extra * character before returning. */ - struct strbuf munged_fmt = STRBUF_INIT; - strbuf_addf(_fmt, "%s ", fmt); + if (fmt != munged_fmt.buf) + strbuf_addstr(_fmt, fmt); + strbuf_addch(_fmt, ' '); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb->buf + sb->len, sb->alloc - sb->len, munged_fmt.buf, tm); } - strbuf_release(_fmt); len--; /* drop munged space */ } + strbuf_release(_fmt); strbuf_setlen(sb, sb->len + len); } diff --git a/strbuf.h b/strbuf.h index 80047b1bb7..39d5836abd 100644 --- a/strbuf.h +++ b/strbuf.h @@ -339,9 +339,14 @@ __attribute__((format (printf,2,0))) extern void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap); /** - * Add the time specified by `tm`, as formatted by `strftime`. - */ -extern void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm); + * Add the time specified by `tm`, as formatted by `strftime`. `tz_offset` + * and `tz_name` are used to expand %z and %Z internally, unless `tz_name` + * is NULL. `tz_offset` is in
Re: git-2.13.0: log --date=format:%z not working
Am 02.06.2017 um 05:08 schrieb Jeff King: In theory the solution is: 1. Start using localtime() instead of gmtime() with an adjustment when we are converting to the local timezone (i.e., format-local). We should be able to do this portably. This is easy to do, and it's better than handling %z ourselves, because it makes %Z work, too. 2. When showing the author's timezone, do some trickery to set the program's timezone, then use localtime(), then restore the program timezone. I couldn't get this to work reliably. And anyway, we'd still have nothing to put in %Z since we don't have a timezone name at all in the git objects. We just have "+0400" or whatever. So I don't see a portable way to make (2) work. We could create a strftime wrapper that also takes a time zone offset, with platform-specific implementations. Is it worth the effort? What reliability issues did you run into? But it seems a shame that %Z does not work for case (1) with René's patch. I guess we could do (1) for the local cases and then handle "%z" ourselves otherwise. That sounds even _more_ confusing, but it at least gets the most cases right. If we do handle "%z" ourselves (either always or for just the one case), what should the matching %Z say? Right now (and I think with René's patch) it says GMT, which is actively misleading. We should probably replace it with the same text as "%z". That's not quite what the user wanted, but at least it's accurate. On Linux "%z %Z" is expanded to "+0200 CEST" for me, while on Windows I get "Mitteleurop▒ische Sommerzeit Mitteleurop▒ische Sommerzeit". (That "▒" is probably supposed to be an "ä".) POSIX requires +hhmm or -hhmm format for %z, and for %Z is to be "Replaced by the timezone name or abbreviation". I'd say "GMT+0200" etc. is a nice enough timezone name, i.e. having %Z resolve to the same as %z plus a literal prefix of "GMT" should at least not be wrong. Alternatively we could have a lookup table mapping a few typical offsets to timezone names, but e.g. handling daylight saving times would probably be too hard (when did that part of the world switch in the given year? north or south of the equator?).. As far as the patch itself goes, I'm disappointed to lose the automatic "%" handling for all of the other callers. But I suspect the boilerplate involved in any solution that lets callers choose whether or not to use it would end up being longer than just handling it in each caller. Actually I felt uneasy when you added that forced %% handling because it put a policy into an otherwise neutral interpreter function. I just had no practical argument against it -- until now. I'd rather see strbuf_expand also lose the hard-coded percent sign, but again I don't have an actual user for such a flexibility (yet). Perhaps we should add a fully neutral strbuf_expand_core (or whatever), make strbuf_expand a wrapper with hard-coded % and %% handling and use the core function in the strftime wrapper. Except that the function is not easily stackable. Hmm.. René
Re: git-2.13.0: log --date=format:%z not working
Am 27.05.2017 um 23:46 schrieb Jeff King: > On Sat, May 27, 2017 at 06:57:08PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> There's another test which breaks if we just s/gmtime/localtime/g. As >> far as I can tell to make the non-local case work we'd need to do a >> whole dance where we set the TZ variable to e.g. UTC$offset, then call >> strftime(), then call it again. Maybe there's some way to just specify >> the tz offset, but I didn't find any in a quick skimming of time.h. > > There isn't. Right. We could handle %z internally, though. %Z would be harder (left as an exercise for readers..). First we'd have to undo 0a0416a3 (strbuf_expand: convert "%%" to "%"), though, in order to give full control back to strbuf_expand callbacks. 2-pack patch: --- >8 --- builtin/cat-file.c | 5 + convert.c | 1 + daemon.c | 3 +++ date.c | 2 +- ll-merge.c | 5 +++-- pretty.c | 3 +++ strbuf.c | 39 ++- strbuf.h | 11 +-- t/t6006-rev-list-format.sh | 12 9 files changed, 63 insertions(+), 18 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 1890d7a639..9cf2e4291d 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -252,6 +252,11 @@ static size_t expand_format(struct strbuf *sb, const char *start, void *data) { const char *end; + if (*start == '%') { + strbuf_addch(sb, '%'); + return 1; + } + if (*start != '(') return 0; end = strchr(start + 1, ')'); diff --git a/convert.c b/convert.c index 8d652bf27c..8336fff694 100644 --- a/convert.c +++ b/convert.c @@ -399,6 +399,7 @@ static int filter_buffer_or_fd(int in, int out, void *data) struct strbuf path = STRBUF_INIT; struct strbuf_expand_dict_entry dict[] = { { "f", NULL, }, + { "%", "%" }, { NULL, NULL, }, }; diff --git a/daemon.c b/daemon.c index ac7181a483..191fac2716 100644 --- a/daemon.c +++ b/daemon.c @@ -148,6 +148,9 @@ static size_t expand_path(struct strbuf *sb, const char *placeholder, void *ctx) case 'D': strbuf_addstr(sb, context->directory); return 1; + case '%': + strbuf_addch(sb, '%'); + return 1; } return 0; } diff --git a/date.c b/date.c index 63fa99685e..d16a88eea5 100644 --- a/date.c +++ b/date.c @@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const struct date_mode *mode) month_names[tm->tm_mon], tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec, tz); else if (mode->type == DATE_STRFTIME) - strbuf_addftime(, mode->strftime_fmt, tm); + strbuf_addftime(, mode->strftime_fmt, tm, tz); else strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d", weekday_names[tm->tm_wday], diff --git a/ll-merge.c b/ll-merge.c index ac0d4a5d78..e6202c7397 100644 --- a/ll-merge.c +++ b/ll-merge.c @@ -172,7 +172,7 @@ static int ll_ext_merge(const struct ll_merge_driver *fn, { char temp[4][50]; struct strbuf cmd = STRBUF_INIT; - struct strbuf_expand_dict_entry dict[6]; + struct strbuf_expand_dict_entry dict[7]; struct strbuf path_sq = STRBUF_INIT; const char *args[] = { NULL, NULL }; int status, fd, i; @@ -185,7 +185,8 @@ static int ll_ext_merge(const struct ll_merge_driver *fn, dict[2].placeholder = "B"; dict[2].value = temp[2]; dict[3].placeholder = "L"; dict[3].value = temp[3]; dict[4].placeholder = "P"; dict[4].value = path_sq.buf; - dict[5].placeholder = NULL; dict[5].value = NULL; + dict[5].placeholder = "%"; dict[5].value = "%"; + dict[6].placeholder = NULL; dict[6].value = NULL; if (fn->cmdline == NULL) die("custom merge driver %s lacks command line.", fn->name); diff --git a/pretty.c b/pretty.c index 587d48371b..56872bfa25 100644 --- a/pretty.c +++ b/pretty.c @@ -1428,6 +1428,9 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */ } magic = NO_MAGIC; switch (placeholder[0]) { + case '%': + strbuf_addch(sb, '%'); + return 1; case '-': magic = DEL_LF_BEFORE_EMPTY; break; diff --git a/strbuf.c b/strbuf.c index 00457940cf..206dff6037 100644 --- a/strbuf.c +++ b/strbuf.c @@ -309,12 +309,6 @@ void strbuf_expand(struct strbuf *sb, const char *format, expand_fn_t fn, break; format = percent + 1; - if (*format == '%') { - strbuf_addch(sb, '%'); - format++; - continue; - } -
Re: [PATCH v4 00/10] The final building block for a faster rebase -i
Am 26.05.2017 um 05:15 schrieb Liam Beguin: I tried to time the execution on an interactive rebase (on Linux) but I did not notice a significant change in speed. Do we have a way to measure performance / speed changes between version? Well, there's performance test script p3404-rebase-interactive.sh. You could run it e.g. like this: $ (cd t/perf && ./run origin/master HEAD ./p3404*.sh) This would compare the performance of master with the current branch you're on. The results of p3404 are quite noisy for me on master, though (saw 15% difference between runs without any code changes), so take them with a bag of salt. There's more info on performance tests in general in t/perf/README. René
Re: [PATCH] mingw: simplify PATH handling
Am 20.05.2017 um 19:00 schrieb Johannes Sixt: Am 20.05.2017 um 17:29 schrieb René Scharfe: -static char *path_lookup(const char *cmd, char **path, int exe_only) +static char *path_lookup(const char *cmd, int exe_only) { +const char *path; char *prog = NULL; int len = strlen(cmd); int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe"); if (strchr(cmd, '/') || strchr(cmd, '\\')) -prog = xstrdup(cmd); +return xstrdup(cmd); -while (!prog && *path) -prog = lookup_prog(*path++, cmd, isexe, exe_only); +path = mingw_getenv("PATH"); +if (!path) +return NULL; + +for (; !prog && *path; path++) { +const char *sep = strchrnul(path, ';'); +if (sep == path) +continue; +prog = lookup_prog(path, sep - path, cmd, isexe, exe_only); +path = sep; +} The loop termination does not work here. When the final PATH component is investigated, sep points to the NUL. This pointer is assigned to path, which is incremented and now points one past NUL. Then the loop condition (*path) accesses the char behind NUL. Ugh, yes. Thanks for catching! Cause: Last minute/hour edit, had used strspn() before. return prog; } @@ -1569,13 +1527,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen } if (getenv("GIT_STRACE_COMMANDS")) { -char **path = get_path_split(); -char *p = path_lookup("strace.exe", path, 1); +char *p = path_lookup("strace.exe", 1); if (!p) { -free_path_split(path); return error("strace not found!"); } -free_path_split(path); if (xutftowcs_path(wcmd, p) < 0) { free(p); return -1; Upstream does not have this hunk. Right, it's in next, but not yet in master. And there are other differences, so it's a bad time to do that cleanup. :( René
[PATCH v2] mingw: simplify PATH handling
On Windows the environment variable PATH contains a semicolon-separated list of directories to search for, in order, when looking for the location of a binary to run. get_path_split() parses it and returns an array of string copies, which is iterated by path_lookup(), which in turn passes each entry to lookup_prog(). Change lookup_prog() to take the directory name as a length-limited string instead of as a NUL-terminated one and parse PATH directly in path_lookup(). This avoids memory allocations, simplifying the code. Helped-by: Johannes SixtSigned-off-by: Rene Scharfe --- Rebased against Junio's master, fixed string overrun. Can hold and resubmit in a few months if it gets in the way right now. compat/mingw.c | 91 +++--- 1 file changed, 23 insertions(+), 68 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index 3fbfda5978..c6134f7c81 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -941,64 +941,14 @@ static const char *parse_interpreter(const char *cmd) } /* - * Splits the PATH into parts. - */ -static char **get_path_split(void) -{ - char *p, **path, *envpath = mingw_getenv("PATH"); - int i, n = 0; - - if (!envpath || !*envpath) - return NULL; - - envpath = xstrdup(envpath); - p = envpath; - while (p) { - char *dir = p; - p = strchr(p, ';'); - if (p) *p++ = '\0'; - if (*dir) { /* not earlier, catches series of ; */ - ++n; - } - } - if (!n) - return NULL; - - ALLOC_ARRAY(path, n + 1); - p = envpath; - i = 0; - do { - if (*p) - path[i++] = xstrdup(p); - p = p+strlen(p)+1; - } while (i < n); - path[i] = NULL; - - free(envpath); - - return path; -} - -static void free_path_split(char **path) -{ - char **p = path; - - if (!path) - return; - - while (*p) - free(*p++); - free(path); -} - -/* * exe_only means that we only want to detect .exe files, but not scripts * (which do not have an extension) */ -static char *lookup_prog(const char *dir, const char *cmd, int isexe, int exe_only) +static char *lookup_prog(const char *dir, int dirlen, const char *cmd, +int isexe, int exe_only) { char path[MAX_PATH]; - snprintf(path, sizeof(path), "%s/%s.exe", dir, cmd); + snprintf(path, sizeof(path), "%.*s\\%s.exe", dirlen, dir, cmd); if (!isexe && access(path, F_OK) == 0) return xstrdup(path); @@ -1013,17 +963,29 @@ static char *lookup_prog(const char *dir, const char *cmd, int isexe, int exe_on * Determines the absolute path of cmd using the split path in path. * If cmd contains a slash or backslash, no lookup is performed. */ -static char *path_lookup(const char *cmd, char **path, int exe_only) +static char *path_lookup(const char *cmd, int exe_only) { + const char *path; char *prog = NULL; int len = strlen(cmd); int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe"); if (strchr(cmd, '/') || strchr(cmd, '\\')) - prog = xstrdup(cmd); + return xstrdup(cmd); + + path = mingw_getenv("PATH"); + if (!path) + return NULL; - while (!prog && *path) - prog = lookup_prog(*path++, cmd, isexe, exe_only); + while (!prog) { + const char *sep = strchrnul(path, ';'); + int dirlen = sep - path; + if (dirlen) + prog = lookup_prog(path, dirlen, cmd, isexe, exe_only); + if (!*sep) + break; + path = sep + 1; + } return prog; } @@ -1190,8 +1152,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv, int fhin, int fhout, int fherr) { pid_t pid; - char **path = get_path_split(); - char *prog = path_lookup(cmd, path, 0); + char *prog = path_lookup(cmd, 0); if (!prog) { errno = ENOENT; @@ -1202,7 +1163,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv, if (interpr) { const char *argv0 = argv[0]; - char *iprog = path_lookup(interpr, path, 1); + char *iprog = path_lookup(interpr, 1); argv[0] = prog; if (!iprog) { errno = ENOENT; @@ -1220,21 +1181,18 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv, fhin, fhout, fherr); free(prog); } - free_path_split(path); return pid; } static int
[PATCH] mingw: simplify PATH handling
On Windows the environment variable PATH contains a semicolon-separated list of directories to search for, in order, when looking for the location of a binary to run. get_path_split() parses it and returns an array of string copies, which is iterated by path_lookup(), which in turn passes each entry to lookup_prog(). Change lookup_prog() to take the directory name as a length-limited string instead of as a NUL-terminated one and parse PATH directly in path_lookup(). This avoids memory allocations, simplifying the code. Signed-off-by: Rene Scharfe--- compat/mingw.c | 96 ++ 1 file changed, 22 insertions(+), 74 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index 5113071bc7..7bc61d4066 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -1154,67 +1154,15 @@ static const char *parse_interpreter(const char *cmd) } /* - * Splits the PATH into parts. - */ -static char **get_path_split(void) -{ - char *p, **path, *envpath = mingw_getenv("PATH"); - int i, n = 0; - - if (!envpath || !*envpath) - return NULL; - - envpath = xstrdup(envpath); - p = envpath; - while (p) { - char *dir = p; - p = strchr(p, ';'); - if (p) *p++ = '\0'; - if (*dir) { /* not earlier, catches series of ; */ - ++n; - } - } - if (!n) { - free(envpath); - return NULL; - } - - ALLOC_ARRAY(path, n + 1); - p = envpath; - i = 0; - do { - if (*p) - path[i++] = xstrdup(p); - p = p+strlen(p)+1; - } while (i < n); - path[i] = NULL; - - free(envpath); - - return path; -} - -static void free_path_split(char **path) -{ - char **p = path; - - if (!path) - return; - - while (*p) - free(*p++); - free(path); -} - -/* * exe_only means that we only want to detect .exe files, but not scripts * (which do not have an extension) */ -static char *lookup_prog(const char *dir, const char *cmd, int isexe, int exe_only) +static char *lookup_prog(const char *dir, int dirlen, const char *cmd, +int isexe, int exe_only) { char path[MAX_PATH]; wchar_t wpath[MAX_PATH]; - snprintf(path, sizeof(path), "%s\\%s.exe", dir, cmd); + snprintf(path, sizeof(path), "%.*s\\%s.exe", dirlen, dir, cmd); if (xutftowcs_path(wpath, path) < 0) return NULL; @@ -1235,17 +1183,27 @@ static char *lookup_prog(const char *dir, const char *cmd, int isexe, int exe_on * Determines the absolute path of cmd using the split path in path. * If cmd contains a slash or backslash, no lookup is performed. */ -static char *path_lookup(const char *cmd, char **path, int exe_only) +static char *path_lookup(const char *cmd, int exe_only) { + const char *path; char *prog = NULL; int len = strlen(cmd); int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe"); if (strchr(cmd, '/') || strchr(cmd, '\\')) - prog = xstrdup(cmd); + return xstrdup(cmd); - while (!prog && *path) - prog = lookup_prog(*path++, cmd, isexe, exe_only); + path = mingw_getenv("PATH"); + if (!path) + return NULL; + + for (; !prog && *path; path++) { + const char *sep = strchrnul(path, ';'); + if (sep == path) + continue; + prog = lookup_prog(path, sep - path, cmd, isexe, exe_only); + path = sep; + } return prog; } @@ -1569,13 +1527,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen } if (getenv("GIT_STRACE_COMMANDS")) { - char **path = get_path_split(); - char *p = path_lookup("strace.exe", path, 1); + char *p = path_lookup("strace.exe", 1); if (!p) { - free_path_split(path); return error("strace not found!"); } - free_path_split(path); if (xutftowcs_path(wcmd, p) < 0) { free(p); return -1; @@ -1634,8 +1589,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv, int fhin, int fhout, int fherr) { pid_t pid; - char **path = get_path_split(); - char *prog = path_lookup(cmd, path, 0); + char *prog = path_lookup(cmd, 0); if (!prog) { errno = ENOENT; @@ -1646,7 +1600,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, char **deltaenv, if (interpr) { const char *argv0 = argv[0]; - char *iprog =
[PATCH 5/5] p0004: don't error out if test repo is too small
Repositories with less than 4000 entries are always handled using a single thread, causing test-lazy-init-name-hash --multi to error out. Don't abort the whole test script in that case, but simply skip the multi-threaded performance check. We can still use it to compare the single-threaded speed of different versions in that case. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 3c2135a185..8de5a98cfc 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -8,10 +8,13 @@ test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' test-lazy-init-name-hash --dump --single >out.single && - test-lazy-init-name-hash --dump --multi >out.multi && - sort sorted.single && - sort sorted.multi && - test_cmp sorted.single sorted.multi + if test-lazy-init-name-hash --dump --multi >out.multi + then + test_set_prereq REPO_BIG_ENOUGH_FOR_MULTI && + sort sorted.single && + sort sorted.multi && + test_cmp sorted.single sorted.multi + fi ' test_expect_success 'calibrate' ' @@ -46,7 +49,7 @@ test_perf "single-threaded, $desc" " test-lazy-init-name-hash --single --count=$count " -test_perf "multi-threaded, $desc" " +test_perf REPO_BIG_ENOUGH_FOR_MULTI "multi-threaded, $desc" " test-lazy-init-name-hash --multi --count=$count " -- 2.12.2
[PATCH 4/5] p0004: don't abort if multi-threaded is too slow
If the single-threaded variant beats the multi-threaded one then we may have a performance bug, but that doesn't justify aborting the test. Drop that check; we can compare the results for --single and --multi using the actual performance tests. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 4 1 file changed, 4 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index d30c32f97b..3c2135a185 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -14,10 +14,6 @@ test_expect_success 'verify both methods build the same hashmaps' ' test_cmp sorted.single sorted.multi ' -test_expect_success 'multithreaded should be faster' ' - test-lazy-init-name-hash --perf >out.perf -' - test_expect_success 'calibrate' ' entries=$(wc -l
[PATCH 3/5] p0004: use test_perf
The perf test suite (more specifically: t/perf/aggregate.perl) requires each test script to write test results into a file, otherwise it aborts when aggregating. Add actual performance tests with test_perf to allow p0004 to be run together with other perf scripts. Calibrate the value for the parameter --count based on the size of the test repository, in order to get meaningful results with smaller repos yet still be able to finish the script against huge ones without having to wait for hours. Signed-off-by: Rene Scharfe--- The numbers are just guesses; I didn't actually test all ranges. t/perf/p0004-lazy-init-name-hash.sh | 36 1 file changed, 36 insertions(+) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 576bdc3c4e..d30c32f97b 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -18,4 +18,40 @@ test_expect_success 'multithreaded should be faster' ' test-lazy-init-name-hash --perf >out.perf ' +test_expect_success 'calibrate' ' + entries=$(wc -l
[PATCH 2/5] p0004: avoid using pipes
The return code of commands on the producing end of a pipe is ignored. Evaluate the outcome of test-lazy-init-name-hash by calling sort separately. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 716c951553..576bdc3c4e 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -7,9 +7,11 @@ test_perf_large_repo test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' - test-lazy-init-name-hash --dump --single | sort >out.single && - test-lazy-init-name-hash --dump --multi | sort >out.multi && - test_cmp out.single out.multi + test-lazy-init-name-hash --dump --single >out.single && + test-lazy-init-name-hash --dump --multi >out.multi && + sort sorted.single && + sort sorted.multi && + test_cmp sorted.single sorted.multi ' test_expect_success 'multithreaded should be faster' ' -- 2.12.2
[PATCH 1/5] p0004: simplify calls of test-lazy-init-name-hash
The test library puts helpers into $PATH, so we can simply call them without specifying their location. The suffix $X is also not necessary because .exe files on Windows can be started without specifying their extension, and on other platforms it's empty anyway. Signed-off-by: Rene Scharfe--- t/perf/p0004-lazy-init-name-hash.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/perf/p0004-lazy-init-name-hash.sh b/t/perf/p0004-lazy-init-name-hash.sh index 5afa8c8df3..716c951553 100755 --- a/t/perf/p0004-lazy-init-name-hash.sh +++ b/t/perf/p0004-lazy-init-name-hash.sh @@ -7,13 +7,13 @@ test_perf_large_repo test_checkout_worktree test_expect_success 'verify both methods build the same hashmaps' ' - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --single | sort >out.single && - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --multi | sort >out.multi && + test-lazy-init-name-hash --dump --single | sort >out.single && + test-lazy-init-name-hash --dump --multi | sort >out.multi && test_cmp out.single out.multi ' test_expect_success 'multithreaded should be faster' ' - $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --perf >out.perf + test-lazy-init-name-hash --perf >out.perf ' test_done -- 2.12.2
[PATCH 0/5] p0004: support being called by t/perf/run
p0004-lazy-init-name-hash.sh errors out if the test repo is too small, and doesn't generate any perf test results even if it finishes successfully. That prevents t/perf/run from running the whole test suite. This series tries to address these issues. p0004: simplify calls of test-lazy-init-name-hash p0004: avoid using pipes p0004: use test_perf p0004: don't abort if multi-threaded is too slow p0004: don't error out if test repo is too small t/perf/p0004-lazy-init-name-hash.sh | 47 + 1 file changed, 42 insertions(+), 5 deletions(-) -- 2.12.2
Re: [PATCH v3 12/25] split_commit_in_progress(): fix memory leak
Am 04.05.2017 um 12:59 schrieb Johannes Schindelin: Hi René, On Wed, 3 May 2017, René Scharfe wrote: Am 02.05.2017 um 18:02 schrieb Johannes Schindelin: Reported via Coverity. Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de> --- wt-status.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/wt-status.c b/wt-status.c index 0a6e16dbe0f..1f3f6bcb980 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1088,8 +1088,13 @@ static int split_commit_in_progress(struct wt_status *s) char *rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head"); if (!head || !orig_head || !rebase_amend || !rebase_orig_head || - !s->branch || strcmp(s->branch, "HEAD")) + !s->branch || strcmp(s->branch, "HEAD")) { + free(head); + free(orig_head); + free(rebase_amend); + free(rebase_orig_head); return split_in_progress; + } if (!strcmp(rebase_amend, rebase_orig_head)) { if (strcmp(head, rebase_amend)) The return line could be replaced by "; else" to achieve the same result, without duplicating the free calls. True. It is much harder to explain it that way, though: the context looks like this: static int split_commit_in_progress(struct wt_status *s) { int split_in_progress = 0; char *head = read_line_from_git_path("HEAD"); char *orig_head = read_line_from_git_path("ORIG_HEAD"); char *rebase_amend = read_line_from_git_path("rebase-merge/amend"); char *rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head"); if (!head || !orig_head || !rebase_amend || !rebase_orig_head || - !s->branch || strcmp(s->branch, "HEAD")) + !s->branch || strcmp(s->branch, "HEAD")) { + free(head); + free(orig_head); + free(rebase_amend); + free(rebase_orig_head); return split_in_progress; + } if (!strcmp(rebase_amend, rebase_orig_head)) { if (strcmp(head, rebase_amend)) split_in_progress = 1; } else if (strcmp(orig_head, rebase_orig_head)) { split_in_progress = 1; } if (!s->amend && !s->nowarn && !s->workdir_dirty) split_in_progress = 0; free(head); free(orig_head); free(rebase_amend); free(rebase_orig_head); return split_in_progress; } So as you see, if we make the change you suggested, the next if() is hit which possibly sets split_in_progress = 0. The thing is: this variable has been initialized to 0 in the beginning! So this conditional assignment would be a noop, unless any of the code paths before this conditional modified split_in_progress. Right. It could have been used as a quick two-line fix. After you fully wrapped your head around this code, I am sure you agree that the code is unnecessarily confusing to begin with: why do we go out of our way to allocate and read all those strings, compare them to figure out whether there is a split in progress, just to look at bits in the wt_status struct (that we had available from the beginning) to potentially undo all of that. The only function with a higher cyclomatic complexity in this file is wt_longstatus_print(), which spans more than 100 lines. Amazing. What's worse, I cannot even find a logical explanation why the code is so confusing, as it has been added as it is today in commit 2d1ccebae41 (status: better advices when splitting a commit (during rebase -i), 2012-06-10). Repeatedly I get the impression that defects or shortcomings tend to cluster. It pays to take a good look at the code surrounding a find of an automatic checker for other possible improvements. So I think this function really wants to be fixed more fully (although I feel bad for inserting such a non-trivial fix into a patch series otherwise populated by trivial memory/resource leak plugs): It could be done in two steps. Doing it in one is probably OK as well. -- snipsnap -- Subject: [PATCH] squash! split_commit_in_progress(): fix memory leak split_commit_in_progress(): simplify & fix memory leak This function did a whole lot of unnecessary work, such as reading in four files just to figure out that, oh, hey, we do not need to look at them after all because the HEAD is not detached. Simplify the entire function to return early when possible, to read in the files only when necessary, and to release the allocated memory always (there was a leak, reported via Coverity, where we failed to release the allocated strings if the HEAD is not detached). Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de> --- wt-status.c | 39