Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe
Am 08.08.2017 um 16:49 schrieb Johannes Schindelin:
> Hi René,
> 
> On Tue, 8 Aug 2017, René Scharfe wrote:
> 
>> OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
>> That's the minimum acceptable value according to POSIX.  In t4062 we use
>> 4096 repetitions in the test "-G matches", though, causing it to fail.
>>
>> Do the same as the test "-S --pickaxe-regex" in the same file and search
>> for a single zero instead.  That still suffices to trigger the buffer
>> overrun in older versions (checked with b7d36ffca02^ and --valgrind on
>> Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.
> 
> I am afraid not. The 4096 is precisely the page size required to trigger
> the bug on Windows against which this regression test tries to safeguard.

Checked with b7d36ffca02^ on MinGW now as well and found that it
segfaults with the proposed change ten out of ten times.

You get different results?  How is that possible?  The search string is
NUL-terminated in each case, while the point of the test is that the
file contents isn't, right?

> Maybe simply disable the test on OpenBSD instead? Or guard the {4096}
> behind the MINGW prereq.

It's easy to build a long search string with two repetitions or by using
a longer string as the base, if necessary.  But first we need to find out
why regexec() doesn't overflow in your case.  My build uses the version
from compat/.  Why would it stop before reaching a NUL?  That sounds like
a different and serious bug.

Thanks,
René


[PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe
OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
That's the minimum acceptable value according to POSIX.  In t4062 we use
4096 repetitions in the test "-G matches", though, causing it to fail.

Do the same as the test "-S --pickaxe-regex" in the same file and search
for a single zero instead.  That still suffices to trigger the buffer
overrun in older versions (checked with b7d36ffca02^ and --valgrind on
Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.

Original-patch-by: David Coppa 
Signed-off-by: Rene Scharfe 
---
 t/t4062-diff-pickaxe.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t4062-diff-pickaxe.sh b/t/t4062-diff-pickaxe.sh
index 7c4903f497..c16e5af6fa 100755
--- a/t/t4062-diff-pickaxe.sh
+++ b/t/t4062-diff-pickaxe.sh
@@ -15,7 +15,7 @@ test_expect_success setup '
git commit -m "A 4k file"
 '
 test_expect_success '-G matches' '
-   git diff --name-only -G "^0{4096}$" HEAD^ >out &&
+   git diff --name-only -G0 HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
 '
 
-- 
2.14.0


Re: [PATCH v4 4/4] add: modify already added files when --chmod is given

2017-08-07 Thread René Scharfe
Am 14.09.2016 um 23:07 schrieb Thomas Gummerer:
> When the chmod option was added to git add, it was hooked up to the diff
> machinery, meaning that it only works when the version in the index
> differs from the version on disk.
> 
> As the option was supposed to mirror the chmod option in update-index,
> which always changes the mode in the index, regardless of the status of
> the file, make sure the option behaves the same way in git add.
> 
> Signed-off-by: Thomas Gummerer 

Sorry for replying almost a year late, hopefully you're still interested.

> ---
>   builtin/add.c  | 47 ---
>   builtin/checkout.c |  2 +-
>   builtin/commit.c   |  2 +-
>   cache.h| 10 +-
>   read-cache.c   | 14 ++
>   t/t3700-add.sh | 50 ++
>   6 files changed, 91 insertions(+), 34 deletions(-)
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index b1dddb4..595a0b2 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -26,10 +26,25 @@ static int patch_interactive, add_interactive, 
> edit_interactive;
>   static int take_worktree_changes;
>   
>   struct update_callback_data {
> - int flags, force_mode;
> + int flags;
>   int add_errors;
>   };
>   
> +static void chmod_pathspec(struct pathspec *pathspec, int force_mode)

"int force_mode" looks like a binary (or perhaps ternary) flag, but
actually it is a character and can only have the values '-' or '+'.
In builtin/update-index.c it's called "char flip" and we probably should
define it like this here as well.

> +{
> + int i;
> + 
> + for (i = 0; i < active_nr; i++) {
> + struct cache_entry *ce = active_cache[i];
> +
> + if (pathspec && !ce_path_match(ce, pathspec, NULL))
> + continue;
> +
> + if (chmod_cache_entry(ce, force_mode) < 0)
> + fprintf(stderr, "cannot chmod '%s'", ce->name);

This error message is missing a newline.  In builtin/update-index.c we
also show the attempted change (-x or +x); perhaps we want to do that
here as well.

Currently chmod_cache_entry() can only fail if ce is not a regular
file or it's other parameter is neither '-' nor '+'.  We rule out the
latter already in the argument parsing code.  The former can happen if
we add a symlink, either explicitly or because it's in a directory
we're specified.

I wonder if we even need to report anything, or under which conditions.
If you have a file named dir/file and a symlink named dir/symlink then
the interesting cases are:

git add --chmod=.. dir/symlink
git add --chmod=.. dir/file dir/symlink
git add --chmod=.. dir

Warning about each case may be the most cautious thing to do, but
documenting that --chmod has no effect on symlinks and keeping silent
might be less annoying, especially in the last case.  What do you
think?

> @@ -342,13 +354,8 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>   if (!show_only && ignore_missing)
>   die(_("Option --ignore-missing can only be used together with 
> --dry-run"));
>   
> - if (!chmod_arg)
> - force_mode = 0;
> - else if (!strcmp(chmod_arg, "-x"))
> - force_mode = 0666;
> - else if (!strcmp(chmod_arg, "+x"))
> - force_mode = 0777;
> - else
> + if (chmod_arg && ((chmod_arg[0] != '-' && chmod_arg[0] != '+') ||
> +   chmod_arg[1] != 'x' || chmod_arg[2]))
>   die(_("--chmod param '%s' must be either -x or +x"), chmod_arg);

That's the argument parsing code mentioned above.  The strcmp-based
checks look nicer to me btw.  How about this?

if (chmod_arg && strcmp(chmod_arg, "-x") && strcmp(chmod_arg, "+x"))

But that's just nitpicking.

René


[PATCH] t3700: fix broken test under !POSIXPERM

2017-08-07 Thread René Scharfe
76e368c378 (t3700: fix broken test under !SANITY) explains that the test
'git add --chmod=[+-]x changes index with already added file' can fail
if xfoo3 is still present as a symlink from a previous test and deletes
it with rm(1).  That still leaves it present in the index, which causes
the test to fail if POSIXPERM is not defined.  Get rid of it by calling
"git reset --hard" instead, as 76e368c378 already mentioned in passing.

Signed-off-by: Rene Scharfe 
---
 t/t3700-add.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index f3a4b4a913..4111fa3b7a 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -355,7 +355,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x 
with symlinks' '
 '
 
 test_expect_success 'git add --chmod=[+-]x changes index with already added 
file' '
-   rm -f foo3 xfoo3 &&
+   git reset --hard &&
echo foo >foo3 &&
git add foo3 &&
git add --chmod=+x foo3 &&
-- 
2.14.0


[PATCH] test-path-utils: handle const parameter of basename and dirname

2017-08-07 Thread René Scharfe
The parameter to basename(3) and dirname(3) traditionally had the type
"char *", but on OpenBSD it's been "const char *" for years.  That
causes (at least) Clang to throw an incompatible-pointer-types warning
for test-path-utils, where we try to pass around pointers to these
functions.

Avoid this warning (which is fatal in DEVELOPER mode) by ignoring the
promise of OpenBSD's implementations to keep input strings unmodified
and enclosing them in POSIX-compatible wrappers.

Signed-off-by: Rene Scharfe 
---
 t/helper/test-path-utils.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 1ebe0f750c..2b3c5092a1 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -38,6 +38,20 @@ struct test_data {
const char *alternative; /* output: ... or this.  */
 };
 
+/*
+ * Compatibility wrappers for OpenBSD, whose basename(3) and dirname(3)
+ * have const parameters.
+ */
+static char *posix_basename(char *path)
+{
+   return basename(path);
+}
+
+static char *posix_dirname(char *path)
+{
+   return dirname(path);
+}
+
 static int test_function(struct test_data *data, char *(*func)(char *input),
const char *funcname)
 {
@@ -251,10 +265,10 @@ int cmd_main(int argc, const char **argv)
}
 
if (argc == 2 && !strcmp(argv[1], "basename"))
-   return test_function(basename_data, basename, argv[1]);
+   return test_function(basename_data, posix_basename, argv[1]);
 
if (argc == 2 && !strcmp(argv[1], "dirname"))
-   return test_function(dirname_data, dirname, argv[1]);
+   return test_function(dirname_data, posix_dirname, argv[1]);
 
fprintf(stderr, "%s: unknown function name: %s\n", argv[0],
argv[1] ? argv[1] : "(there was none)");
-- 
2.14.0



[PATCH] t0001: skip test with restrictive permissions if getpwd(3) respects them

2017-08-07 Thread René Scharfe
The sub-test "init in long base path" in t0001 checks the ability to
handle long base paths with restrictive permissions (--x).  On OpenBSD
getcwd(3) fails in that case even for short paths.  Check the two
aspects separately by trying to use a long base path both with and
without execute-only permissions.  Only attempt the former if we know
that getcwd(3) doesn't care.

Original-patch-by: David Coppa <dco...@openbsd.org>
Reported-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
Signed-off-by: René Scharfe <l@web.de>
---
 t/t0001-init.sh | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index c4814d248f..86c1a51654 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -315,18 +315,44 @@ test_expect_success 'init with separate gitdir' '
test_path_is_dir realgitdir/refs
 '
 
-test_expect_success 'init in long base path' '
+test_lazy_prereq GETCWD_IGNORES_PERMS '
+   base=GETCWD_TEST_BASE_DIR &&
+   mkdir -p $base/dir &&
+   chmod 100 $base ||
+   error "bug in test script: cannot prepare $base"
+
+   (cd $base/dir && /bin/pwd -P)
+   status=$?
+
+   chmod 700 $base &&
+   rm -rf $base ||
+   error "bug in test script: cannot clean $base"
+   return $status
+'
+
+check_long_base_path () {
# exceed initial buffer size of strbuf_getcwd()
component=123456789abcdef &&
test_when_finished "chmod 0700 $component; rm -rf $component" &&
p31=$component/$component &&
p127=$p31/$p31/$p31/$p31 &&
mkdir -p $p127 &&
-   chmod 0111 $component &&
+   if test $# = 1
+   then
+   chmod $1 $component
+   fi &&
(
cd $p127 &&
git init newdir
)
+}
+
+test_expect_success 'init in long base path' '
+   check_long_base_path
+'
+
+test_expect_success GETCWD_IGNORES_PERMS 'init in long restricted base path' '
+   check_long_base_path 0111
 '
 
 test_expect_success 're-init on .git file' '
-- 
2.14.0


Re: [PATCH] tests: don't give unportable ">" to "test" built-in, use -gt

2017-08-07 Thread René Scharfe
Am 07.08.2017 um 03:18 schrieb brian m. carlson:
> On Sun, Aug 06, 2017 at 11:38:50PM +, Ævar Arnfjörð Bjarmason wrote:
>> Change an argument to test_line_count (which'll ultimately be turned
>> into a "test" expression) to use "-gt" instead of ">" for an
>> arithmetic test.
>>
>> This broken on e.g. OpenBSD as of v2.13.0 with my commit
>> ac3f5a3468 ("ref-filter: add --no-contains option to
>> tag/branch/for-each-ref", 2017-03-24).
>>
>> Upstream just worked around it by patching git and didn't tell us
>> about it, I discovered this when reading various Git packaging
>> implementations: https://github.com/openbsd/ports/commit/7e48bf88a20
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>
>> David, it would be great to get a quick bug report to
>> git@vger.kernel.org if you end up having to monkeypatch something
>> we've done. We won't bite, promise :)
>>
>> As shown in that linked Github commit OpenBSD has another recent
>> workaround in turning on DIR_HAS_BSD_GROUP_SEMANTICS and skipping a
>> related test, maybe René can make more sense of that?
> 
> I've confirmed using the NetBSD 7.1 man pages that NetBSD will also want
> DIR_HAS_BSD_GROUP_SEMANTICS.  MirBSD will also, according to its man
> pages.
> 
> As I understand it, the only consequence of not setting this flag on BSD
> systems is that some directories will be setgid, which, while ugly and
> useless, should have no negative effect.

Right; specifically it's for newly created subdirectories of shared
directories, which we want to be owned by the same group as their
parent. That's the default on BSDs, and we have to set the setgid bit to
turn on that semantic only on other systems, e.g. on Linux.

81a24b52c1 (Do not use GUID on dir in git init --shared=all on FreeBSD)
introduced DIR_HAS_BSD_GROUP_SEMANTICS with the rationale that setting
setgid on directories may not even be allowed for normal users.  I can't
reproduce this (t1301 succeeds for me on FreeBSD 10.3 even without that
build flag), but apparently at least in some configurations it's not just
a cosmetic issue.

The skipped test 'init in long base path' in t0001 is a different kettle
of fish.  getcwd(3) on OpenBSD respects permissions on the parent
directories up to root.  E.g. after "chmod 711 /home" normal users would
get EACCES when they'd call getcwd(3) in their homes there, while e.g.
on Linux and FreeBSD they'd successfully get their current working dir.

René


Re: Fwd: New Defects reported by Coverity Scan for git

2017-07-20 Thread René Scharfe
Am 18.07.2017 um 19:23 schrieb Junio C Hamano:
> Stefan Beller  writes:
> 
>> I looked at this report for a while. My current understanding:
>> * its detection was triggered by including rs/move-array,
>>f331ab9d4c (use MOVE_ARRAY, 2017-07-15)
>> * But it is harmless, because the scan logic does not understand
>>how ALLOC_GROW works. It assumes that
>>done_pbase_paths_alloc can be larger
>>than done_pbase_paths_num + 1, while done_pbase_paths
>>is NULL, such that the memory allocation is not triggered.
>>If that were the case, then we have 2 subsequent dereferences
>>of a NULL pointer right after that. But by inspecting the use
>>of _alloc and _num the initial assumption does not seem possible.
> 
> Yes, it does appear that way.  ALLOC_GROW() calls REALLOC_ARRAY()
> which safely can realloc NULL to specified size via xrealloc().

MOVE_ARRAY is passing its pointer arguments to memmove(); all it adds
is a check for (done_pbase_paths_num - pos - 1) being zero.  I don't
understand how that change can make it more likely for one of the
pointers to be NULL.

I guess the first message ('Comparing "done_pbase_paths" to null
implies that "done_pbase_paths" might be null.') has to be understood
as an explanation of how the checker arrived at the second one?

We could remove that NULL check -- it's effectively just a shortcut.
But how would that improve safety?  Well, if the array is unallocated
(NULL) and _num is greater than zero we'd get a segfault without it,
and thus would notice it.  That check currently papers over such a
hypothetical bug.  Makes sense?

-- >8 --
Subject: [PATCH] pack-objects: remove unnecessary NULL check

If done_pbase_paths is NULL then done_pbase_paths_num must be zero and
done_pbase_path_pos() returns -1 without accessing the array, so the
check is not necessary.

If the invariant was violated then the check would make sure we keep
on going and allocate the necessary amount of memory in the next
ALLOC_GROW call.  That sounds nice, but all array entries except for
one would contain garbage data.

If the invariant was violated without the check we'd get a segfault in
done_pbase_path_pos(), i.e. an observable crash, alerting us of the
presence of a bug.

Currently there is no such bug: Only the functions check_pbase_path()
and cleanup_preferred_base() change pointer and counter, and both make
sure to keep them in sync.  Get rid of the check anyway to allow us to
see if later changes introduce such a defect, and to simplify the code.

Detected by Coverity Scan.

Signed-off-by: Rene Scharfe 
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e730b415bf..c753e9237a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1289,7 +1289,7 @@ static int done_pbase_path_pos(unsigned hash)
 
 static int check_pbase_path(unsigned hash)
 {
-   int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash);
+   int pos = done_pbase_path_pos(hash);
if (0 <= pos)
return 1;
pos = -pos - 1;
-- 
2.13.3


Re: [PATCH] dir: support platforms that require aligned reads

2017-07-16 Thread René Scharfe
Am 16.07.2017 um 16:04 schrieb Jeff King:
> On Sun, Jul 16, 2017 at 02:17:37PM +0200, René Scharfe wrote:
> 
>> -static void stat_data_from_disk(struct stat_data *to, const struct 
>> stat_data *from)
>> +static void stat_data_from_disk(struct stat_data *to, const unsigned char 
>> *data)
>>   {
>> -to->sd_ctime.sec  = get_be32(>sd_ctime.sec);
>> -to->sd_ctime.nsec = get_be32(>sd_ctime.nsec);
>> -to->sd_mtime.sec  = get_be32(>sd_mtime.sec);
>> -to->sd_mtime.nsec = get_be32(>sd_mtime.nsec);
>> -to->sd_dev= get_be32(>sd_dev);
>> -to->sd_ino= get_be32(>sd_ino);
>> -to->sd_uid= get_be32(>sd_uid);
>> -to->sd_gid= get_be32(>sd_gid);
>> -to->sd_size   = get_be32(>sd_size);
>> +memcpy(to, data, sizeof(*to));
>> +to->sd_ctime.sec  = ntohl(to->sd_ctime.sec);
>> +to->sd_ctime.nsec = ntohl(to->sd_ctime.nsec);
>> +to->sd_mtime.sec  = ntohl(to->sd_mtime.sec);
>> +to->sd_mtime.nsec = ntohl(to->sd_mtime.nsec);
>> +to->sd_dev= ntohl(to->sd_dev);
>> +to->sd_ino= ntohl(to->sd_ino);
>> +to->sd_uid= ntohl(to->sd_uid);
>> +to->sd_gid= ntohl(to->sd_gid);
>> +to->sd_size   = ntohl(to->sd_size);
>>   }
> 
> Hmm. I would have written this to pull the bytes directly out of the
> array, like:
> 
>to->sd_ctime.sec  = get_be32(data); data += 4;
>to->sd_ctime.nsec = get_be32(data); data += 4;
> 
> etc. Or even a helper to do the advancing like:
> 
>to->sd_ctime.sec = parse_be32();
> 
> That reduces assumptions about padding in "struct stat_data". But
> looking more at this code, and reading your comment:
> 
>> Side note: The OS name is not enough for determining the layout of
>> struct ondisk_untracked_cache.  Different platforms can have different
>> int sizes and padding.  Adding the machine type could help, but that
>> would be a breaking change.  At that point we would be better off
>> defining a machine-independent format, no?
> 
> it looks like assumptions about struct layout are pervasive and part of
> the on-disk format. Yuck. :(

Assuming that there is no padding probably even works for the platforms
the code currently supports (basically x86), but I don't know about
others.  We'd need to change the writing side as well to match, though.
Which is probably a good idea, but I tried to keep the patch small and
its impact low.  Cross-machine usability is currently explicitly not
supported -- not sure why, though.

René


[PATCH] dir: support platforms that require aligned reads

2017-07-16 Thread René Scharfe
The untracked cache is stored on disk by concatenating its memory
structures without any padding.  Consequently some of the structs are
not aligned at a particular boundary when the whole extension is read
back in one go.  That's only OK on platforms without strict alignment
requirements, or for byte-aligned data like strings or hash values.

Decode struct ondisk_untracked_cache carefully from the extension
blob by using explicit pointer arithmetic with offsets, avoiding
alignment issues.  Use char pointers for passing stat_data objects to
stat_data_from_disk(), and use memcpy(3) in that function to  get the
contents into a properly aligned struct, then perform the byte-order
adjustment in place there.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
Side note: The OS name is not enough for determining the layout of
struct ondisk_untracked_cache.  Different platforms can have
different int sizes and padding.  Adding the machine type could
help, but that would be a breaking change.  At that point we would
be better off defining a machine-independent format, no?

 dir.c | 50 +++---
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/dir.c b/dir.c
index ae6f5c9636..1c55dc3e36 100644
--- a/dir.c
+++ b/dir.c
@@ -2398,7 +2398,8 @@ struct ondisk_untracked_cache {
char exclude_per_dir[FLEX_ARRAY];
 };
 
-#define ouc_size(len) (offsetof(struct ondisk_untracked_cache, 
exclude_per_dir) + len + 1)
+#define ouc_offset(x) offsetof(struct ondisk_untracked_cache, x)
+#define ouc_size(len) (ouc_offset(exclude_per_dir) + len + 1)
 
 struct write_data {
int index; /* number of written untracked_cache_dir */
@@ -2560,17 +2561,18 @@ struct read_data {
const unsigned char *end;
 };
 
-static void stat_data_from_disk(struct stat_data *to, const struct stat_data 
*from)
+static void stat_data_from_disk(struct stat_data *to, const unsigned char 
*data)
 {
-   to->sd_ctime.sec  = get_be32(>sd_ctime.sec);
-   to->sd_ctime.nsec = get_be32(>sd_ctime.nsec);
-   to->sd_mtime.sec  = get_be32(>sd_mtime.sec);
-   to->sd_mtime.nsec = get_be32(>sd_mtime.nsec);
-   to->sd_dev= get_be32(>sd_dev);
-   to->sd_ino= get_be32(>sd_ino);
-   to->sd_uid= get_be32(>sd_uid);
-   to->sd_gid= get_be32(>sd_gid);
-   to->sd_size   = get_be32(>sd_size);
+   memcpy(to, data, sizeof(*to));
+   to->sd_ctime.sec  = ntohl(to->sd_ctime.sec);
+   to->sd_ctime.nsec = ntohl(to->sd_ctime.nsec);
+   to->sd_mtime.sec  = ntohl(to->sd_mtime.sec);
+   to->sd_mtime.nsec = ntohl(to->sd_mtime.nsec);
+   to->sd_dev= ntohl(to->sd_dev);
+   to->sd_ino= ntohl(to->sd_ino);
+   to->sd_uid= ntohl(to->sd_uid);
+   to->sd_gid= ntohl(to->sd_gid);
+   to->sd_size   = ntohl(to->sd_size);
 }
 
 static int read_one_dir(struct untracked_cache_dir **untracked_,
@@ -2645,7 +2647,7 @@ static void read_stat(size_t pos, void *cb)
rd->data = rd->end + 1;
return;
}
-   stat_data_from_disk(>stat_data, (struct stat_data *)rd->data);
+   stat_data_from_disk(>stat_data, rd->data);
rd->data += sizeof(struct stat_data);
ud->valid = 1;
 }
@@ -2663,22 +2665,22 @@ static void read_sha1(size_t pos, void *cb)
 }
 
 static void load_sha1_stat(struct sha1_stat *sha1_stat,
-  const struct stat_data *stat,
+  const unsigned char *data,
   const unsigned char *sha1)
 {
-   stat_data_from_disk(_stat->stat, stat);
+   stat_data_from_disk(_stat->stat, data);
hashcpy(sha1_stat->sha1, sha1);
sha1_stat->valid = 1;
 }
 
 struct untracked_cache *read_untracked_extension(const void *data, unsigned 
long sz)
 {
-   const struct ondisk_untracked_cache *ouc;
struct untracked_cache *uc;
struct read_data rd;
const unsigned char *next = data, *end = (const unsigned char *)data + 
sz;
const char *ident;
int ident_len, len;
+   const char *exclude_per_dir;
 
if (sz <= 1 || end[-1] != '\0')
return NULL;
@@ -2690,21 +2692,23 @@ struct untracked_cache *read_untracked_extension(const 
void *data, unsigned long
ident = (const char *)next;
next += ident_len;
 
-   ouc = (const struct ondisk_untracked_cache *)next;
if (next + ouc_size(0) > end)
return NULL;
 
uc = xcalloc(1, sizeof(*uc));
strbuf_init(>ident, ident_len);
strbuf_add(>ident, ident, ident_len);
-   load_sha1_stat(>ss_info_exclude, >info_exclude_stat,
-  ouc->info_exclude_sha1);
-   load_sha1_stat(>ss_excludes_file, >excludes_file_stat,
-  ouc->excludes_file_sha1);
-   uc->dir_flags = get_be32(>dir_flags);
-   uc->exclude_per_dir = xstrdup(ouc->exclude_per_dir);
+   

Re: [PATCH] ls-files: don't try to prune an empty index

2017-07-16 Thread René Scharfe

Am 16.07.2017 um 13:15 schrieb René Scharfe:

Am 16.07.2017 um 13:08 schrieb Jeff King:

On Sun, Jul 16, 2017 at 01:06:45PM +0200, René Scharfe wrote:


diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state 
*istate,

   int pos;
   unsigned int first, last;

-if (!prefix)
+if (!prefix || !istate->cache_nr)
   return;
   pos = index_name_pos(istate, prefix, prefixlen);
   if (pos < 0)


"git am" complained that this does not apply to its blobs. Did you
hand-edit?


I didn't, but perhaps I messed up the order of patches?  MOVE_ARRAY
patch 2 touches the same file, but I wouldn't expect the two changes to
conflict.  So not sure what's going on.


For some reason there's an extra space before the tab on each of the
context lines. MUA issue or cut-and-paste, maybe?


That's possible.  Will resend.


... pressed Send to fast.  Thanks for reporting the broken patch!

I use the extension Toggle Word Wrap with Thunderbird, but it wraps by
default with no way to change that, so I forgot toggling this time.
Grr, I've had enough of this!  Went past the warranty warning and set
mailnews.wraplength=0 now.

René


Re: [PATCH] ls-files: don't try to prune an empty index

2017-07-16 Thread René Scharfe

Am 16.07.2017 um 13:08 schrieb Jeff King:

On Sun, Jul 16, 2017 at 01:06:45PM +0200, René Scharfe wrote:


diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate,
int pos;
unsigned int first, last;

-   if (!prefix)
+   if (!prefix || !istate->cache_nr)
return;
pos = index_name_pos(istate, prefix, prefixlen);
if (pos < 0)


"git am" complained that this does not apply to its blobs. Did you
hand-edit?


I didn't, but perhaps I messed up the order of patches?  MOVE_ARRAY
patch 2 touches the same file, but I wouldn't expect the two changes to
conflict.  So not sure what's going on.


For some reason there's an extra space before the tab on each of the
context lines. MUA issue or cut-and-paste, maybe?


That's possible.  Will resend.


[PATCH (resend)] ls-files: don't try to prune an empty index

2017-07-16 Thread René Scharfe
Exit early when asked to prune an index that contains no entries to
begin with.  This avoids pointer arithmetic on istate->cache, which is
possibly NULL in that case.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
Messed up whitespace when sending this the first time.

 builtin/ls-files.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate,
int pos;
unsigned int first, last;
 
-   if (!prefix)
+   if (!prefix || !istate->cache_nr)
return;
pos = index_name_pos(istate, prefix, prefixlen);
if (pos < 0)
-- 
2.13.3


Re: [PATCH] t: handle EOF in test_copy_bytes()

2017-07-16 Thread René Scharfe

Am 16.07.2017 um 12:47 schrieb Jeff King:

On Sun, Jul 16, 2017 at 06:45:32AM -0400, Jeff King wrote:


I was playing with SANITIZE=undefined after René's patches to see how
far we had left to go. I forgot to turn off sha1dc, which causes most
programs to die due to the unaligned loads. That means git-archive in
t5000 generates no output, triggering the bug. :)


And I was pleased to see that after setting OPENSSL_SHA1, the answer to
"how far" is "we are there". Yay.


True with GCC, but not with Clang 3.9.  A patch for alignment issues in
dir.c is coming up..

René


Re: [PATCH] ls-files: don't try to prune an empty index

2017-07-16 Thread René Scharfe

Am 16.07.2017 um 12:41 schrieb Jeff King:

On Sat, Jul 15, 2017 at 10:11:20PM +0200, René Scharfe wrote:


Exit early when asked to prune an index that contains no
entries to begin with.  This avoids pointer arithmetic on
istate->cache, which is possibly NULL in that case.

Found with Clang's UBSan.


Makes sense. We could use MOVE_ARRAY() here, but this is a sensible
short-circuit to the whole function.

Looks like a good solution.


diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate,
int pos;
unsigned int first, last;

-   if (!prefix)
+   if (!prefix || !istate->cache_nr)
return;
pos = index_name_pos(istate, prefix, prefixlen);
if (pos < 0)


"git am" complained that this does not apply to its blobs. Did you
hand-edit?


I didn't, but perhaps I messed up the order of patches?  MOVE_ARRAY
patch 2 touches the same file, but I wouldn't expect the two changes to
conflict.  So not sure what's going on.

René


Re: [PATCH 5/5] Makefile: disable unaligned loads with UBSan

2017-07-16 Thread René Scharfe

Am 16.07.2017 um 12:17 schrieb Jeff King:

On Sat, Jul 15, 2017 at 07:18:56PM +0200, René Scharfe wrote:


-- >8 --
Subject: [PATCH] Makefile: allow combining UBSan with other sanitizers

Multiple sanitizers can be specified as a comma-separated list.  Set
the flag NO_UNALIGNED_LOADS even if UndefinedBehaviorSanitizer is not
the only sanitizer to build with.


Nice, I didn't know you could do comma-separation here. ;)

I always just built the various sanitizers separately since I was
testing whether each one worked. But if we can get UBSan to build
cleanly, I agree that "make SANITIZE=address,undefined test" would be a
nice thing to run periodically.


There are *some* restrictions: You can only use one of address, memory
and thread, as mentioned here:

http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

Combining ASan and UBSan works fine.

René


Re: [PATCH 3/2] apply: use COPY_ARRAY and MOVE_ARRAY in update_image()

2017-07-15 Thread René Scharfe

Am 16.07.2017 um 02:31 schrieb Ramsay Jones:



On 15/07/17 21:20, René Scharfe wrote:

Simplify the code by using the helper macros COPY_ARRAY and MOVE_ARRAY,
which also makes them more robust in the case we copy or move no lines,
as they allow using NULL points in that case, while memcpy(3) and
memmove(3) don't.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe <l@web.de>
---
I don't know why the rules in contrib/coccinelle/array.cocci didn't
match. :-?

  apply.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..40707ca50c 100644
--- a/apply.c
+++ b/apply.c
@@ -2809,13 +2809,10 @@ static void update_image(struct apply_state *state,
img->line_allocated = img->line;
}
if (preimage_limit != postimage->nr)
-   memmove(img->line + applied_pos + postimage->nr,
-   img->line + applied_pos + preimage_limit,
-   (img->nr - (applied_pos + preimage_limit)) *
-   sizeof(*img->line));
-   memcpy(img->line + applied_pos,
-  postimage->line,
-  postimage->nr * sizeof(*img->line));
+   MOVE_ARRAY(img->line + applied_pos + postimage->nr,
+  img->line + applied_pos + preimage_limit,
+  img->nr - (applied_pos + preimage_limit));
+   COPY_ARRAY(img->line + applied_pos, postimage->line, postimage->nr);


My patch looks like:

-   memcpy(img->line + applied_pos,
-  postimage->line,
-  postimage->nr * sizeof(*img->line));
+   if (postimage->line && postimage->nr)
+   memcpy(img->line + applied_pos,
+  postimage->line,
+  postimage->nr * sizeof(*img->line));

... which I think I prefer (slightly).


Can postimage->line be NULL when postimage->nr is bigger than 0?  What
would that mean?  The only ways to arrive at that point that I an come
up with are bugs (we accidentally set ->line to NULL, or we forgot to
clean ->line).  We'd better notice them early by getting a nice
shrieking segfault.  Adding an assert would work as well.

René


Re: [PATCH] ls-files: don't try to prune an empty index

2017-07-15 Thread René Scharfe

Am 16.07.2017 um 02:28 schrieb Ramsay Jones:



On 15/07/17 21:11, René Scharfe wrote:

Exit early when asked to prune an index that contains no
entries to begin with.  This avoids pointer arithmetic on
istate->cache, which is possibly NULL in that case.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe <l@web.de>
---
  builtin/ls-files.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate,
  int pos;
  unsigned int first, last;

-if (!prefix)
+if (!prefix || !istate->cache_nr)
  return;
  pos = index_name_pos(istate, prefix, prefixlen);
  if (pos < 0)


My patch looked like:

-   if (!prefix)
+   if (!prefix || !istate->cache || istate->cache_nr == 0)

... which is probably a bit 'belt-n-braces'. ;-)


Not checking for !istate->cache at this point is a good thing, I think.
If we have entries, then ->cache must not be NULL, and if it is we'd get
a segfault, notifying us that we have a bug.  We could add an assert to
state this requirement explicitly, but that would be the topic of a
different patch.

René


[PATCH 3/2] apply: use COPY_ARRAY and MOVE_ARRAY in update_image()

2017-07-15 Thread René Scharfe
Simplify the code by using the helper macros COPY_ARRAY and MOVE_ARRAY,
which also makes them more robust in the case we copy or move no lines,
as they allow using NULL points in that case, while memcpy(3) and
memmove(3) don't.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
I don't know why the rules in contrib/coccinelle/array.cocci didn't
match. :-?

 apply.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..40707ca50c 100644
--- a/apply.c
+++ b/apply.c
@@ -2809,13 +2809,10 @@ static void update_image(struct apply_state *state,
img->line_allocated = img->line;
}
if (preimage_limit != postimage->nr)
-   memmove(img->line + applied_pos + postimage->nr,
-   img->line + applied_pos + preimage_limit,
-   (img->nr - (applied_pos + preimage_limit)) *
-   sizeof(*img->line));
-   memcpy(img->line + applied_pos,
-  postimage->line,
-  postimage->nr * sizeof(*img->line));
+   MOVE_ARRAY(img->line + applied_pos + postimage->nr,
+  img->line + applied_pos + preimage_limit,
+  img->nr - (applied_pos + preimage_limit));
+   COPY_ARRAY(img->line + applied_pos, postimage->line, postimage->nr);
if (!state->allow_overlap)
for (i = 0; i < postimage->nr; i++)
img->line[applied_pos + i].flag |= LINE_PATCHED;
-- 
2.13.3


[PATCH] ls-files: don't try to prune an empty index

2017-07-15 Thread René Scharfe

Exit early when asked to prune an index that contains no
entries to begin with.  This avoids pointer arithmetic on
istate->cache, which is possibly NULL in that case.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
 builtin/ls-files.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..adf572da68 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -362,7 +362,7 @@ static void prune_index(struct index_state *istate,
int pos;
unsigned int first, last;

-   if (!prefix)
+   if (!prefix || !istate->cache_nr)
return;
pos = index_name_pos(istate, prefix, prefixlen);
if (pos < 0)
--
2.13.3


[PATCH 2/2] use MOVE_ARRAY

2017-07-15 Thread René Scharfe
Simplify the code for moving members inside of an array and make it more
robust by using the helper macro MOVE_ARRAY.  It calculates the size
based on the specified number of elements for us and supports NULL
pointers when that number is zero.  Raw memmove(3) calls with NULL can
cause the compiler to (over-eagerly) optimize out later NULL checks.

This patch was generated with contrib/coccinelle/array.cocci and spatch
(Coccinelle).

Signed-off-by: Rene Scharfe 
---
Downside: The one in builtin/ls-files.c papers over a report by UBSan
for the case of istate->cache being NULL.  Technically we still try to
add pos to NULL if pruning an empty index, but we won't see it anymore.
Will send a separate patch.

 builtin/ls-files.c | 3 +--
 builtin/merge.c| 2 +-
 builtin/pack-objects.c | 5 ++---
 cache-tree.c   | 5 ++---
 commit.c   | 5 ++---
 notes-merge.c  | 3 +--
 read-cache.c   | 5 ++---
 reflog-walk.c  | 7 +++
 string-list.c  | 8 +++-
 9 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a0029..dc4a6aa3d9 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -378,8 +378,7 @@ static void prune_index(struct index_state *istate,
}
last = next;
}
-   memmove(istate->cache, istate->cache + pos,
-   (last - pos) * sizeof(struct cache_entry *));
+   MOVE_ARRAY(istate->cache, istate->cache + pos, last - pos);
istate->cache_nr = last - pos;
 }
 
diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb45..d5797b8fe7 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -537,7 +537,7 @@ static void parse_branch_merge_options(char *bmo)
die(_("Bad branch.%s.mergeoptions string: %s"), branch,
split_cmdline_strerror(argc));
REALLOC_ARRAY(argv, argc + 2);
-   memmove(argv + 1, argv, sizeof(*argv) * (argc + 1));
+   MOVE_ARRAY(argv + 1, argv, argc + 1);
argc++;
argv[0] = "branch.*.mergeoptions";
parse_options(argc, argv, NULL, builtin_merge_options,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f4a8441fe9..e730b415bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1298,9 +1298,8 @@ static int check_pbase_path(unsigned hash)
   done_pbase_paths_alloc);
done_pbase_paths_num++;
if (pos < done_pbase_paths_num)
-   memmove(done_pbase_paths + pos + 1,
-   done_pbase_paths + pos,
-   (done_pbase_paths_num - pos - 1) * sizeof(unsigned));
+   MOVE_ARRAY(done_pbase_paths + pos + 1, done_pbase_paths + pos,
+  done_pbase_paths_num - pos - 1);
done_pbase_paths[pos] = hash;
return 0;
 }
diff --git a/cache-tree.c b/cache-tree.c
index ec23d8c03d..2440d1dc89 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -131,9 +131,8 @@ static int do_invalidate_path(struct cache_tree *it, const 
char *path)
 * move 4 and 5 up one place (2 entries)
 * 2 = 6 - 3 - 1 = subtree_nr - pos - 1
 */
-   memmove(it->down+pos, it->down+pos+1,
-   sizeof(struct cache_tree_sub *) *
-   (it->subtree_nr - pos - 1));
+   MOVE_ARRAY(it->down + pos, it->down + pos + 1,
+  it->subtree_nr - pos - 1);
it->subtree_nr--;
}
return 1;
diff --git a/commit.c b/commit.c
index cbfd689939..d3150d6270 100644
--- a/commit.c
+++ b/commit.c
@@ -223,9 +223,8 @@ int unregister_shallow(const struct object_id *oid)
if (pos < 0)
return -1;
if (pos + 1 < commit_graft_nr)
-   memmove(commit_graft + pos, commit_graft + pos + 1,
-   sizeof(struct commit_graft *)
-   * (commit_graft_nr - pos - 1));
+   MOVE_ARRAY(commit_graft + pos, commit_graft + pos + 1,
+  commit_graft_nr - pos - 1);
commit_graft_nr--;
return 0;
 }
diff --git a/notes-merge.c b/notes-merge.c
index 70e3fbeefb..c12b354f10 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -99,8 +99,7 @@ static struct notes_merge_pair *find_notes_merge_pair_pos(
else {
*occupied = 0;
if (insert_new && i < len) {
-   memmove(list + i + 1, list + i,
-   (len - i) * sizeof(struct notes_merge_pair));
+   MOVE_ARRAY(list + i + 1, list + i, len - i);
memset(list + i, 0, sizeof(struct notes_merge_pair));
}
}
diff --git a/read-cache.c b/read-cache.c
index 2121b6e7bb..acfb028f48 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -515,9 

[PATCH 1/2] add MOVE_ARRAY

2017-07-15 Thread René Scharfe
Similar to COPY_ARRAY (introduced in 60566cbb58), add a safe and
convenient helper for moving potentially overlapping ranges of array
entries.  It infers the element size, multiplies automatically and
safely to get the size in bytes, does a basic type safety check by
comparing element sizes and unlike memmove(3) it supports NULL
pointers iff 0 elements are to be moved.

Also add a semantic patch to demonstrate the helper's intended usage.

Signed-off-by: Rene Scharfe 
---
 contrib/coccinelle/array.cocci | 17 +
 git-compat-util.h  |  8 
 2 files changed, 25 insertions(+)

diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 4ba98b7eaf..c61d1ca8dc 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -27,6 +27,23 @@ expression n;
 
 @@
 type T;
+T *dst;
+T *src;
+expression n;
+@@
+(
+- memmove(dst, src, (n) * sizeof(*dst));
++ MOVE_ARRAY(dst, src, n);
+|
+- memmove(dst, src, (n) * sizeof(*src));
++ MOVE_ARRAY(dst, src, n);
+|
+- memmove(dst, src, (n) * sizeof(T));
++ MOVE_ARRAY(dst, src, n);
+)
+
+@@
+type T;
 T *ptr;
 expression n;
 @@
diff --git a/git-compat-util.h b/git-compat-util.h
index 047172d173..159f82154a 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -825,6 +825,14 @@ static inline void copy_array(void *dst, const void *src, 
size_t n, size_t size)
memcpy(dst, src, st_mult(size, n));
 }
 
+#define MOVE_ARRAY(dst, src, n) move_array((dst), (src), (n), sizeof(*(dst)) + 
\
+   BUILD_ASSERT_OR_ZERO(sizeof(*(dst)) == sizeof(*(src
+static inline void move_array(void *dst, const void *src, size_t n, size_t 
size)
+{
+   if (n)
+   memmove(dst, src, st_mult(size, n));
+}
+
 /*
  * These functions help you allocate structs with flex arrays, and copy
  * the data directly into the array. For example, if you had:
-- 
2.13.3


[PATCH 2/1] bswap: convert get_be16, get_be32 and put_be32 to inline functions

2017-07-15 Thread René Scharfe
Simplify the implementation and allow callers to use expressions with
side-effects by turning the macros get_be16, get_be32 and put_be32 into
inline functions.

Signed-off-by: Rene Scharfe 
---
All these redundant casts started to bother me, so I tried to come up
with nice and clean inline functions.  Successfully?  You tell me.
They are longer, but less cluttered.  Would it punish -O0 builds?  Is
it all worth it?

 compat/bswap.h | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 4582c1107a..7d063e9e40 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -162,19 +162,29 @@ static inline uint64_t git_bswap64(uint64_t x)
 
 #else
 
-#define get_be16(p)( \
-   (*((unsigned char *)(p) + 0) << 8) | \
-   (*((unsigned char *)(p) + 1) << 0) )
-#define get_be32(p)( \
-   ((uint32_t)*((unsigned char *)(p) + 0) << 24) | \
-   ((uint32_t)*((unsigned char *)(p) + 1) << 16) | \
-   ((uint32_t)*((unsigned char *)(p) + 2) <<  8) | \
-   ((uint32_t)*((unsigned char *)(p) + 3) <<  0) )
-#define put_be32(p, v) do { \
-   unsigned int __v = (v); \
-   *((unsigned char *)(p) + 0) = __v >> 24; \
-   *((unsigned char *)(p) + 1) = __v >> 16; \
-   *((unsigned char *)(p) + 2) = __v >>  8; \
-   *((unsigned char *)(p) + 3) = __v >>  0; } while (0)
+static inline uint16_t get_be16(const void *ptr)
+{
+   const unsigned char *p = ptr;
+   return  (uint16_t)p[0] << 8 |
+   (uint16_t)p[1] << 0;
+}
+
+static inline uint32_t get_be32(const void *ptr)
+{
+   const unsigned char *p = ptr;
+   return  (uint32_t)p[0] << 24 |
+   (uint32_t)p[1] << 16 |
+   (uint32_t)p[2] <<  8 |
+   (uint32_t)p[3] <<  0;
+}
+
+static inline void put_be32(void *ptr, uint32_t value)
+{
+   unsigned char *p = ptr;
+   p[0] = value >> 24;
+   p[1] = value >> 16;
+   p[2] = value >>  8;
+   p[3] = value >>  0;
+}
 
 #endif
-- 
2.13.3


[PATCH] bswap: convert to unsigned before shifting in get_be32

2017-07-15 Thread René Scharfe
The pointer p is dereferenced and we get an unsigned char.  Before
shifting it's automatically promoted to int.  Left-shifting a signed
32-bit value bigger than 127 by 24 places is undefined.  Explicitly
convert to a 32-bit unsigned type to avoid undefined behaviour if
the highest bit is set.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
 compat/bswap.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index d47c003544..4582c1107a 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -166,10 +166,10 @@ static inline uint64_t git_bswap64(uint64_t x)
(*((unsigned char *)(p) + 0) << 8) | \
(*((unsigned char *)(p) + 1) << 0) )
 #define get_be32(p)( \
-   (*((unsigned char *)(p) + 0) << 24) | \
-   (*((unsigned char *)(p) + 1) << 16) | \
-   (*((unsigned char *)(p) + 2) <<  8) | \
-   (*((unsigned char *)(p) + 3) <<  0) )
+   ((uint32_t)*((unsigned char *)(p) + 0) << 24) | \
+   ((uint32_t)*((unsigned char *)(p) + 1) << 16) | \
+   ((uint32_t)*((unsigned char *)(p) + 2) <<  8) | \
+   ((uint32_t)*((unsigned char *)(p) + 3) <<  0) )
 #define put_be32(p, v) do { \
unsigned int __v = (v); \
*((unsigned char *)(p) + 0) = __v >> 24; \
-- 
2.13.3


Re: [PATCH v3 06/20] builtin/receive-pack: convert portions to struct object_id

2017-07-15 Thread René Scharfe
Am 31.03.2017 um 03:39 schrieb brian m. carlson:
> @@ -1081,10 +1081,10 @@ static const char *update(struct command *cmd, struct 
> shallow_info *si)
>   return "hook declined";
>   }
>   
> - if (is_null_sha1(new_sha1)) {
> + if (is_null_oid(new_oid)) {
>   struct strbuf err = STRBUF_INIT;
> - if (!parse_object(old_sha1)) {
> - old_sha1 = NULL;
> + if (!parse_object(old_oid->hash)) {
> + old_oid = NULL;

So old_oid can become NULL...

>   if (ref_exists(name)) {
>   rp_warning("Allowing deletion of corrupt ref.");
>   } else {
> @@ -1094,7 +1094,7 @@ static const char *update(struct command *cmd, struct 
> shallow_info *si)
>   }
>   if (ref_transaction_delete(transaction,
>  namespaced_name,
> -old_sha1,
> +old_oid->hash,

... and here we dereference it.

-- >8 --
Subject: [PATCH] receive-pack: don't access hash of NULL object_id pointer

We set old_oid to NULL if we found out that it's a corrupt reference.
In that case don't try to access the hash member and pass NULL to
ref_transaction_delete() instead.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
That's the last bug of this kind which "make SANITIZE=undefined test"
turned up.

 builtin/receive-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index cabdc55e09..946cf55138 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1138,7 +1138,7 @@ static const char *update(struct command *cmd, struct 
shallow_info *si)
}
if (ref_transaction_delete(transaction,
   namespaced_name,
-  old_oid->hash,
+  old_oid ? old_oid->hash : NULL,
   0, "push", )) {
rp_error("%s", err.buf);
strbuf_release();
-- 
2.13.3


Re: [PATCH 04/33] notes: make get_note return pointer to struct object_id

2017-07-15 Thread René Scharfe
Am 30.05.2017 um 19:30 schrieb Brandon Williams:
> @@ -392,7 +392,7 @@ static int add(int argc, const char **argv, const char 
> *prefix)
>   const char *object_ref;
>   struct notes_tree *t;
>   unsigned char object[20], new_note[20];
> - const unsigned char *note;
> + const struct object_id *note;
>   struct note_data d = { 0, 0, NULL, STRBUF_INIT };
>   struct option options[] = {
>   { OPTION_CALLBACK, 'm', "message", , N_("message"),

In between here, note can be set to NULL...

> @@ -453,7 +453,7 @@ static int add(int argc, const char **argv, const char 
> *prefix)
>   sha1_to_hex(object));
>   }
>   
> - prepare_note_data(object, , note);
> + prepare_note_data(object, , note->hash);

... which we then dereference here.

> @@ -598,13 +598,13 @@ static int append_edit(int argc, const char **argv, 
> const char *prefix)
>   t = init_notes_check(argv[0], NOTES_INIT_WRITABLE);
>   note = get_note(t, object);
>   
> - prepare_note_data(object, , edit ? note : NULL);
> + prepare_note_data(object, , edit && note ? note->hash : NULL);

Here a NULL check was added; we need a similar one above as well.

-- >8 --
Subject: [PATCH] notes: don't access hash of NULL object_id pointer

Check if note is NULL, as we already do for different purposes a few
lines above, and pass a NULL pointer to prepare_note_data() in that
case instead of trying to access the hash member.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
The third parameter of prepare_note_data() could easily be turned into
an object_id pointer (and it should), but this patch is meant to be a
minimal fix.

 builtin/notes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 77573cf1ea..4303848e04 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -456,7 +456,7 @@ static int add(int argc, const char **argv, const char 
*prefix)
oid_to_hex());
}
 
-   prepare_note_data(, , note->hash);
+   prepare_note_data(, , note ? note->hash : NULL);
if (d.buf.len || allow_empty) {
write_note_data(, new_note.hash);
if (add_note(t, , _note, combine_notes_overwrite))
-- 
2.13.3


Re: [PATCH 30/33] tree-diff: convert diff_tree_paths to struct object_id

2017-07-15 Thread René Scharfe
Am 30.05.2017 um 19:31 schrieb Brandon Williams:
> @@ -273,21 +274,20 @@ static struct combine_diff_path *emit_path(struct 
> combine_diff_path *p,
>   }
>   
>   if (recurse) {
> - const unsigned char **parents_sha1;
> + const struct object_id **parents_oid;
>   
> - FAST_ARRAY_ALLOC(parents_sha1, nparent);
> + FAST_ARRAY_ALLOC(parents_oid, nparent);
>   for (i = 0; i < nparent; ++i) {
>   /* same rule as in emitthis */
>   int tpi_valid = tp && !(tp[i].entry.mode & 
> S_IFXMIN_NEQ);
>   
> - parents_sha1[i] = tpi_valid ? tp[i].entry.oid->hash
> - : NULL;
> + parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL;
>   }

So elements of parents_oid can be NULL...

>   
>   strbuf_add(base, path, pathlen);
>   strbuf_addch(base, '/');
> - p = ll_diff_tree_paths(p, sha1, parents_sha1, nparent, base, 
> opt);
> - FAST_ARRAY_FREE(parents_sha1, nparent);
> + p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt);

... and we pass that array to ll_diff_tree_paths()...

> + FAST_ARRAY_FREE(parents_oid, nparent);
>   }
>   
>   strbuf_setlen(base, old_baselen);
> @@ -312,7 +312,7 @@ static void skip_uninteresting(struct tree_desc *t, 
> struct strbuf *base,
>   
>   
>   /*
> - * generate paths for combined diff D(sha1,parents_sha1[])
> + * generate paths for combined diff D(sha1,parents_oid[])
>*
>* Resulting paths are appended to combine_diff_path linked list, and also, 
> are
>* emitted on the go via opt->pathchange() callback, so it is possible to
> @@ -404,8 +404,8 @@ static inline void update_tp_entries(struct tree_desc 
> *tp, int nparent)
>   }
>   
>   static struct combine_diff_path *ll_diff_tree_paths(
> - struct combine_diff_path *p, const unsigned char *sha1,
> - const unsigned char **parents_sha1, int nparent,
> + struct combine_diff_path *p, const struct object_id *oid,
> + const struct object_id **parents_oid, int nparent,
>   struct strbuf *base, struct diff_options *opt)
>   {
>   struct tree_desc t, *tp;
> @@ -422,8 +422,8 @@ static struct combine_diff_path *ll_diff_tree_paths(
>*   diff_tree_oid(parent, commit) )
>*/
>   for (i = 0; i < nparent; ++i)
> - tptree[i] = fill_tree_descriptor([i], parents_sha1[i]);
> - ttree = fill_tree_descriptor(, sha1);
> + tptree[i] = fill_tree_descriptor([i], parents_oid[i]->hash);

... and here we are in that function, dereferencing a pointer that may
be NULL.

> + ttree = fill_tree_descriptor(, oid->hash);

oid here can also be NULL, but I think that's not as easy to see.

-- >8 --
Subject: [PATCH] tree-diff: don't access hash of NULL object_id pointer

The object_id pointers can be NULL for invalid entries.  Don't try to
dereference them and pass NULL along to fill_tree_descriptor() instead,
which handles them just fine.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe 
---
fill_tree_descriptor() can easily be converted to object_id, by the
way, which would get us rid of the extra check introduced here, but
this patch is meant as a minimal fix.

 tree-diff.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index bd6d65a409..2357f72899 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -421,8 +421,9 @@ static struct combine_diff_path *ll_diff_tree_paths(
 *   diff_tree_oid(parent, commit) )
 */
for (i = 0; i < nparent; ++i)
-   tptree[i] = fill_tree_descriptor([i], parents_oid[i]->hash);
-   ttree = fill_tree_descriptor(, oid->hash);
+   tptree[i] = fill_tree_descriptor([i],
+   parents_oid[i] ? parents_oid[i]->hash : NULL);
+   ttree = fill_tree_descriptor(, oid ? oid->hash : NULL);
 
/* Enable recursion indefinitely */
opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
-- 
2.13.3


Re: [PATCH 5/5] Makefile: disable unaligned loads with UBSan

2017-07-15 Thread René Scharfe
Am 10.07.2017 um 15:24 schrieb Jeff King:
> The undefined behavior sanitizer complains about unaligned
> loads, even if they're OK for a particular platform in
> practice. It's possible that they _are_ a problem, of
> course, but since it's a known tradeoff the UBSan errors are
> just noise.
> 
> Let's quiet it automatically by building with
> NO_UNALIGNED_LOADS when SANITIZE=undefined is in use.
> 
> Signed-off-by: Jeff King 
> ---
>   Makefile | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index cc03b3c95..3c341b2a6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1015,6 +1015,9 @@ endif
>   ifdef SANITIZE
>   BASIC_CFLAGS += -fsanitize=$(SANITIZE) -fno-sanitize-recover=$(SANITIZE)
>   BASIC_CFLAGS += -fno-omit-frame-pointer
> +ifeq ($(SANITIZE),undefined)
> +BASIC_CFLAGS += -DNO_UNALIGNED_LOADS
> +endif
>   endif
>   
>   ifndef sysconfdir

Nice, but let's be even nicer!

-- >8 --
Subject: [PATCH] Makefile: allow combining UBSan with other sanitizers

Multiple sanitizers can be specified as a comma-separated list.  Set
the flag NO_UNALIGNED_LOADS even if UndefinedBehaviorSanitizer is not
the only sanitizer to build with.

Signed-off-by: Rene Scharfe 
---
 Makefile | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index ba4359ef8d..9b98535a04 100644
--- a/Makefile
+++ b/Makefile
@@ -1022,10 +1022,15 @@ ifdef DEVELOPER
 CFLAGS += $(DEVELOPER_CFLAGS)
 endif
 
+comma := ,
+empty :=
+space := $(empty) $(empty)
+
 ifdef SANITIZE
+SANITIZERS := $(foreach flag,$(subst $(comma),$(space),$(SANITIZE)),$(flag))
 BASIC_CFLAGS += -fsanitize=$(SANITIZE) -fno-sanitize-recover=$(SANITIZE)
 BASIC_CFLAGS += -fno-omit-frame-pointer
-ifeq ($(SANITIZE),undefined)
+ifneq ($(filter undefined,$(SANITIZERS)),)
 BASIC_CFLAGS += -DNO_UNALIGNED_LOADS
 endif
 endif
-- 
2.13.3


Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()

2017-07-12 Thread René Scharfe

Am 10.07.2017 um 04:10 schrieb Junio C Hamano:

Jeff King  writes:


... And you could even drop origlen by
replacing it with "baselen - 3" at the end. But somehow doing the
computation on the fly actually seems more complicated to me (from the
perspective of a reader who is trying to make sure all is correct).


True enough.  I personally do not mind any of the three variants
(including the original) and would rather not spend too much time
micro-optimizing this codepath, lest the next clever person starts
to turn the loop in for_each_loose_file_in_objdir() to a nested loop
that runs 16 times each to avoid copying the same leading byte again
and again ;-)


*evil grin*

Anyway, I agree that this patch is not worth spending much time on.  I
still think the result is cleaner (perhaps due to my strbuf_addf
allergy), though it's longer and perhaps adding a twelfth variable makes
the function too complicated.  The magic chomping variant is a clever
compromise -- but needs a comment to explain itself.  So if you are
indifferent please just drop it.

Thanks,
René


Re: [PATCH] use DIV_ROUND_UP

2017-07-10 Thread René Scharfe

Am 10.07.2017 um 09:27 schrieb Jeff King:

On Sat, Jul 08, 2017 at 12:35:35PM +0200, René Scharfe wrote:

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 2dc9c82ecf..06c479f70a 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -210,8 +210,8 @@ size_t ewah_add(struct ewah_bitmap *self, eword_t word)
  void ewah_set(struct ewah_bitmap *self, size_t i)
  {
const size_t dist =
-   (i + BITS_IN_EWORD) / BITS_IN_EWORD -
-   (self->bit_size + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+   DIV_ROUND_UP(i + 1, BITS_IN_EWORD) -
+   DIV_ROUND_UP(self->bit_size, BITS_IN_EWORD);


...this first one is a bit trickier. Our "n" in the first one is now
"i+1".  But that's because the original was implicitly canceling the
"-1" and "+1" terms.

So I think it's a true mechanical conversion, but I have to admit the
original is confusing. Without digging I suspect it's correct, though,
just because a simple bug here would mean that our ewah bitmaps totally
don't work. So it's probably not worth spending time on.


A few lines below there is "self->bit_size = i + 1", so the code
calculates how many words the old and the new value are apart (or by how
many words the bitmap needs to be extended), which becomes easier to see
with the patch.

René


Re: [PATCH] use DIV_ROUND_UP

2017-07-09 Thread René Scharfe
Am 09.07.2017 um 15:25 schrieb Martin Ågren:
> On 8 July 2017 at 12:35, René Scharfe <l@web.de> wrote:
>> Convert code that divides and rounds up to use DIV_ROUND_UP to make the
>> intent clearer and reduce the number of magic constants.
> ...
>> diff --git a/sha1_name.c b/sha1_name.c
>> index e7f7b12ceb..8c513dbff6 100644
>> --- a/sha1_name.c
>> +++ b/sha1_name.c
>> @@ -492,7 +492,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
>> *sha1, int len)
>>   * together we need to divide by 2; but we also want to 
>> round
>>   * odd numbers up, hence adding one before dividing.
>>   */
>> -   len = (len + 1) / 2;
>> +   len = DIV_ROUND_UP(len, 2);
> 
> Since the addition is now an implementation detail of DIV_ROUND_UP,
> should the comment be adjusted, maybe simply by removing ", hence
> adding one before dividing"?
> 
> Or perhaps even better, "... divide by 2; but since len might be odd,
> we need to make sure we round up as we divide". My thinking being,
> we're not actually rounding odd numbers up (presumably to even
> numbers), but we're rounding the result of the division up (to the
> smallest larger integer).

Good point; perhaps just squash this in?

---
 sha1_name.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 8c513dbff6..74fcb6d788 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -489,8 +489,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
*sha1, int len)
 * We now know we have on the order of 2^len objects, which
 * expects a collision at 2^(len/2). But we also care about hex
 * chars, not bits, and there are 4 bits per hex. So all
-* together we need to divide by 2; but we also want to round
-* odd numbers up, hence adding one before dividing.
+* together we need to divide by 2 and round up.
 */
len = DIV_ROUND_UP(len, 2);
/*
-- 
2.13.2


Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()

2017-07-09 Thread René Scharfe

Am 09.07.2017 um 13:00 schrieb Jeff King:

On Sat, Jul 08, 2017 at 10:59:06AM +0200, René Scharfe wrote:


Add the slash between loose object subdirectory and file name just once
outside the loop instead of overwriting it with each readdir call.
Redefine baselen as the length with that slash, and add dirlen for the
length without it.  The result is slightly less wasteful and can use the
the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.


This patch looks correct to me.

I'm a little lukewarm on it overall, though. I'd be shocked if the
efficiency change is measurable. What I really care about is whether the
result is easier to read or not.

On the plus side, this moves an invariant out of the loop. On the minus
side, it has to introduce an extra variable for "length we add on to"
versus "dir length to pass to the subdir_cb". That's not rocket science,
but it does slightly complicate things (though I note we already have
"origlen", so this is bumping us from 2 to 3 length variables, not 1 to
2).


Admittedly this is more of an OCD thing.  Overwriting a character 200
times or parsing a format string don't very long compared to the rest
of the function, but it's a bit itchy.

René


Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-09 Thread René Scharfe

Am 09.07.2017 um 14:37 schrieb Andreas Schwab:

On Jul 09 2017, René Scharfe <l@web.de> wrote:


[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/memchr.html


You are using an old revision.


OK, the final draft of C11 [3] says "The implementation shall behave as
if it reads the characters sequentially and stops as soon as a matching
character is found."

I wonder when we can begin to target C99 in git's source, though. :)

René


[3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-09 Thread René Scharfe

Am 09.07.2017 um 00:29 schrieb Junio C Hamano:

René Scharfe <l@web.de> writes:


Am 08.07.2017 um 13:08 schrieb Ramsay Jones:

On 08/07/17 09:58, René Scharfe wrote:

Avoid running over the end of another -- a C string whose length we
don't know -- by using strcmp(3) instead of memcmp(3) for comparing it
with another C string.


I had to read this twice, along with the patch text, before this
made any sense. ;-) The missing information being that 'another'
was the name of the string variable that we were potentially
'running over the end of'.


Yeah, sorry, encasing that unusual variable name in quotes would
probably have helped.


What makes it even more confusing is that the variable with the
problematic name is referred to as "it" in the last part of the
description--- the second occurrence of 'another' is actually not
referring to that variable but yet another string that is being
compared with it ;-)


Perhaps like this instead?

We don't know the length of the C string "another".  It could be
shorter than "name", which we compare it to using memchr(3).  Call
strcmp(3) instead to avoid running over the end of the former, and
get rid of a strlen(3) call as a bonus.

René


Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-09 Thread René Scharfe

Am 08.07.2017 um 15:38 schrieb Andreas Schwab:

On Jul 08 2017, René Scharfe <l@web.de> wrote:


Am 08.07.2017 um 13:08 schrieb Andreas Schwab:

On Jul 08 2017, René Scharfe <l@web.de> wrote:


Avoid running over the end of another -- a C string whose length we
don't know -- by using strcmp(3) instead of memcmp(3) for comparing it
with another C string.


That's not a good justification for the change, since memcmp never reads
past the differing characters.


Interesting.  Where does that guarantee come from?


Sorry, I misremembered.  It's only memchr that has this restriction.


Hmm, I can't get ASan to complain about memchr reading beyond the end of
a C string, but I don't know why.  Glibc reads full words [1], and I
don't see how the standard [2] forbids reading past the found byte.

René


[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memchr.c
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/memchr.html


Re: 0 bytes/s vs. ∞ bytes/s

2017-07-08 Thread René Scharfe
Am 07.07.2017 um 17:57 schrieb 積丹尼 Dan Jacobson:
> Receiving objects: 100% (1003/1003), 1.15 MiB | 0 bytes/s, done.
> Receiving objects: 100% (1861/1861), 11.74 MiB | 4.58 MiB/s, done.
> Receiving objects: 100% (474/474), 160.72 KiB | 0 bytes/s, done.
> Receiving objects: 100% (7190/7190), 26.02 MiB | 6.53 MiB/s, done.
> 
> If the connection is too fast to calculate, please report
> ∞ bytes/s or
> inf bytes/s or
> ? bytes/s or
> anything but 0 bytes/s, which means nothing (transmitted.)

I don't know your actual transfer rate, but I would guess it's closer to
zero than to infinity. :)

How about this, though:

-- >8 --
Subject: [PATCH] progress: show overall rate in last update

The values in struct throughput are only updated every 0.5 seconds.  If
we're all done before that time span then the final update will show a
rate of 0 bytes/s, which is misleading if some bytes had been handled.
Remember the start time and show the total throughput instead.

And avoid division by zero by enforcing a minimum time span value of 1
(unit: 1/1024th of a second).  That makes the resulting rate an
underestimation, but it's closer to the actual value than the currently
shown 0 bytes/s.

Reported-by: 積丹尼 Dan Jacobson 
Signed-off-by: Rene Scharfe 
---
 progress.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/progress.c b/progress.c
index 29378caa05..73e36d4a42 100644
--- a/progress.c
+++ b/progress.c
@@ -36,6 +36,7 @@ struct progress {
unsigned delay;
unsigned delayed_percent_treshold;
struct throughput *throughput;
+   uint64_t start_ns;
 };
 
 static volatile sig_atomic_t progress_update;
@@ -221,6 +222,7 @@ struct progress *start_progress_delay(const char *title, 
unsigned total,
progress->delayed_percent_treshold = percent_treshold;
progress->delay = delay;
progress->throughput = NULL;
+   progress->start_ns = getnanotime();
set_progress_signal();
return progress;
 }
@@ -247,8 +249,10 @@ void stop_progress_msg(struct progress **p_progress, const 
char *msg)
struct throughput *tp = progress->throughput;
 
if (tp) {
-   unsigned int rate = !tp->avg_misecs ? 0 :
-   tp->avg_bytes / tp->avg_misecs;
+   uint64_t now_ns = getnanotime();
+   unsigned int misecs, rate;
+   misecs = ((now_ns - progress->start_ns) * 4398) >> 32;
+   rate = tp->curr_total / (misecs ? misecs : 1);
throughput_string(>display, tp->curr_total, rate);
}
progress_update = 1;
-- 
2.13.2


Re: [PATCH] urlmatch: use hex2chr() in append_normalized_escapes()

2017-07-08 Thread René Scharfe

Am 08.07.2017 um 16:28 schrieb Kyle J. McKay:

On Jul 8, 2017, at 01:59, René Scharfe wrote:


Simplify the code by using hex2chr() to convert and check for invalid
characters at the same time instead of doing that sequentially with
one table lookup for each.


I think that comment may be a bit misleading as the changes are just
switching from one set of inlines to another.  Essentially the same
sequential check takes place in the hex2chr inlined function which is
being used to replace the "one table lookup for each".  An optimizing
compiler will likely eliminate any difference between the before and
after patch versions.  Nothing immediately comes to mind as an alternate
comment though, so I'm not proposing any changes to the comment.


Right, the table lookups for isxdigit and hexval are not duplicated when
compiling with -O2.

René


Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-08 Thread René Scharfe

Am 08.07.2017 um 13:08 schrieb Ramsay Jones:

On 08/07/17 09:58, René Scharfe wrote:

Avoid running over the end of another -- a C string whose length we
don't know -- by using strcmp(3) instead of memcmp(3) for comparing it
with another C string.


I had to read this twice, along with the patch text, before this
made any sense. ;-) The missing information being that 'another'
was the name of the string variable that we were potentially
'running over the end of'.


Yeah, sorry, encasing that unusual variable name in quotes would
probably have helped.

René


Re: [PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-08 Thread René Scharfe

Am 08.07.2017 um 13:08 schrieb Andreas Schwab:

On Jul 08 2017, René Scharfe <l@web.de> wrote:


Avoid running over the end of another -- a C string whose length we
don't know -- by using strcmp(3) instead of memcmp(3) for comparing it
with another C string.


That's not a good justification for the change, since memcmp never reads
past the differing characters.


Interesting.  Where does that guarantee come from?

ASan reports an overflow with the following test program for me on
Debian testing x64:

#include 

int main(int argc, char **argv)
{
char a[32] = "1234567890123456789012345678901";
char b[2] = "a";
return memcmp(a, b, 32);
}


[PATCH] wt-status: use separate variable for result of shorten_unambiguous_ref

2017-07-08 Thread René Scharfe
Store the pointer to the string allocated by shorten_unambiguous_ref in
a dedicated variable, short_base, and keep base unchanged.  A non-const
variable is more appropriate for such an object.  It avoids having to
cast const away on free and stops redefining the meaning of base, making
the code slightly clearer.

Signed-off-by: Rene Scharfe 
---
 wt-status.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 8d2fb35b08..77c27c5113 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1763,6 +1763,7 @@ static void wt_shortstatus_print_tracking(struct 
wt_status *s)
const char *branch_color_remote = color(WT_STATUS_REMOTE_BRANCH, s);
 
const char *base;
+   char *short_base;
const char *branch_name;
int num_ours, num_theirs;
int upstream_is_gone = 0;
@@ -1797,10 +1798,10 @@ static void wt_shortstatus_print_tracking(struct 
wt_status *s)
upstream_is_gone = 1;
}
 
-   base = shorten_unambiguous_ref(base, 0);
+   short_base = shorten_unambiguous_ref(base, 0);
color_fprintf(s->fp, header_color, "...");
-   color_fprintf(s->fp, branch_color_remote, "%s", base);
-   free((char *)base);
+   color_fprintf(s->fp, branch_color_remote, "%s", short_base);
+   free(short_base);
 
if (!upstream_is_gone && !num_ours && !num_theirs)
goto conclude;
-- 
2.13.2



[PATCH] use DIV_ROUND_UP

2017-07-08 Thread René Scharfe
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.

Signed-off-by: Rene Scharfe 
---
 builtin/gc.c   | 2 +-
 builtin/grep.c | 2 +-
 builtin/log.c  | 2 +-
 builtin/receive-pack.c | 2 +-
 diff.c | 2 +-
 ewah/ewah_bitmap.c | 4 ++--
 imap-send.c| 2 +-
 sha1_name.c| 2 +-
 shallow.c  | 8 
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index bd91f136fe..2ba50a2873 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -149,7 +149,7 @@ static int too_many_loose_objects(void)
if (!dir)
return 0;
 
-   auto_threshold = (gc_auto_threshold + 255) / 256;
+   auto_threshold = DIV_ROUND_UP(gc_auto_threshold, 256);
while ((ent = readdir(dir)) != NULL) {
if (strspn(ent->d_name, "0123456789abcdef") != 38 ||
ent->d_name[38] != '\0')
diff --git a/builtin/grep.c b/builtin/grep.c
index fa351c49f4..0d6e669732 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -543,7 +543,7 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
 * submodule process has its own thread pool.
 */
argv_array_pushf(_options, "--threads=%d",
-(num_threads + 1) / 2);
+DIV_ROUND_UP(num_threads, 2));
 
/* Add Pathspecs */
argv_array_push(_options, "--");
diff --git a/builtin/log.c b/builtin/log.c
index 8ca1de9894..c6362cf92e 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1308,7 +1308,7 @@ static struct commit *get_base_commit(const char 
*base_commit,
 
if (rev_nr % 2)
rev[i] = rev[2 * i];
-   rev_nr = (rev_nr + 1) / 2;
+   rev_nr = DIV_ROUND_UP(rev_nr, 2);
}
 
if (!in_merge_bases(base, rev[0]))
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 71c0c768db..cabdc55e09 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1806,7 +1806,7 @@ static const char *unpack_with_sideband(struct 
shallow_info *si)
 static void prepare_shallow_update(struct command *commands,
   struct shallow_info *si)
 {
-   int i, j, k, bitmap_size = (si->ref->nr + 31) / 32;
+   int i, j, k, bitmap_size = DIV_ROUND_UP(si->ref->nr, 32);
 
ALLOC_ARRAY(si->used_shallow, si->shallow->nr);
assign_shallow_commits_to_refs(si, si->used_shallow, NULL);
diff --git a/diff.c b/diff.c
index 00b4c86698..85e714f6c6 100644
--- a/diff.c
+++ b/diff.c
@@ -2095,7 +2095,7 @@ static void show_dirstat_by_line(struct diffstat_t *data, 
struct diff_options *o
 * bytes per "line".
 * This is stupid and ugly, but very cheap...
 */
-   damage = (damage + 63) / 64;
+   damage = DIV_ROUND_UP(damage, 64);
ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
dir.files[dir.nr].name = file->name;
dir.files[dir.nr].changed = damage;
diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 2dc9c82ecf..06c479f70a 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -210,8 +210,8 @@ size_t ewah_add(struct ewah_bitmap *self, eword_t word)
 void ewah_set(struct ewah_bitmap *self, size_t i)
 {
const size_t dist =
-   (i + BITS_IN_EWORD) / BITS_IN_EWORD -
-   (self->bit_size + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+   DIV_ROUND_UP(i + 1, BITS_IN_EWORD) -
+   DIV_ROUND_UP(self->bit_size, BITS_IN_EWORD);
 
assert(i >= self->bit_size);
 
diff --git a/imap-send.c b/imap-send.c
index 351e84aea1..b2d0b849bb 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -861,7 +861,7 @@ static char hexchar(unsigned int b)
return b < 10 ? '0' + b : 'a' + (b - 10);
 }
 
-#define ENCODED_SIZE(n) (4*((n+2)/3))
+#define ENCODED_SIZE(n) (4 * DIV_ROUND_UP((n), 3))
 static char *cram(const char *challenge_64, const char *user, const char *pass)
 {
int i, resp_len, encoded_len, decoded_len;
diff --git a/sha1_name.c b/sha1_name.c
index e7f7b12ceb..8c513dbff6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -492,7 +492,7 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
*sha1, int len)
 * together we need to divide by 2; but we also want to round
 * odd numbers up, hence adding one before dividing.
 */
-   len = (len + 1) / 2;
+   len = DIV_ROUND_UP(len, 2);
/*
 * For very small repos, we stick with our regular fallback.
 */
diff --git a/shallow.c b/shallow.c
index ef7ca78993..54359d5490 100644
--- a/shallow.c
+++ b/shallow.c
@@ -443,7 +443,7 @@ struct paint_info {
 
 static uint32_t *paint_alloc(struct paint_info *info)
 {
-   

[PATCH] urlmatch: use hex2chr() in append_normalized_escapes()

2017-07-08 Thread René Scharfe
Simplify the code by using hex2chr() to convert and check for invalid
characters at the same time instead of doing that sequentially with
one table lookup for each.

Signed-off-by: Rene Scharfe 
---
 urlmatch.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/urlmatch.c b/urlmatch.c
index 4bbde924e8..3e42bd7504 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -42,12 +42,12 @@ static int append_normalized_escapes(struct strbuf *buf,
 
from_len--;
if (ch == '%') {
-   if (from_len < 2 ||
-   !isxdigit(from[0]) ||
-   !isxdigit(from[1]))
+   if (from_len < 2)
return 0;
-   ch = hexval(*from++) << 4;
-   ch |= hexval(*from++);
+   ch = hex2chr(from);
+   if (ch < 0)
+   return 0;
+   from += 2;
from_len -= 2;
was_esc = 1;
}
-- 
2.13.2


[PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()

2017-07-08 Thread René Scharfe
Add the slash between loose object subdirectory and file name just once
outside the loop instead of overwriting it with each readdir call.
Redefine baselen as the length with that slash, and add dirlen for the
length without it.  The result is slightly less wasteful and can use the
the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.

Signed-off-by: Rene Scharfe 
---
 sha1_file.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5862386cd0..6a9deb9e68 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3749,7 +3749,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
each_loose_subdir_fn subdir_cb,
void *data)
 {
-   size_t origlen, baselen;
+   size_t origlen, dirlen, baselen;
DIR *dir;
struct dirent *de;
int r = 0;
@@ -3760,7 +3760,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
origlen = path->len;
strbuf_complete(path, '/');
strbuf_addf(path, "%02x", subdir_nr);
-   baselen = path->len;
+   dirlen = path->len;
 
dir = opendir(path->buf);
if (!dir) {
@@ -3770,12 +3770,15 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
return r;
}
 
+   strbuf_addch(path, '/');
+   baselen = path->len;
+
while ((de = readdir(dir))) {
if (is_dot_or_dotdot(de->d_name))
continue;
 
strbuf_setlen(path, baselen);
-   strbuf_addf(path, "/%s", de->d_name);
+   strbuf_addstr(path, de->d_name);
 
if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2)  {
char hex[GIT_MAX_HEXSZ+1];
@@ -3801,7 +3804,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
}
closedir(dir);
 
-   strbuf_setlen(path, baselen);
+   strbuf_setlen(path, dirlen);
if (!r && subdir_cb)
r = subdir_cb(subdir_nr, path->buf, data);
 
-- 
2.13.2


[PATCH] apply: use strcmp(3) for comparing strings in gitdiff_verify_name()

2017-07-08 Thread René Scharfe
Avoid running over the end of another -- a C string whose length we
don't know -- by using strcmp(3) instead of memcmp(3) for comparing it
with another C string.

Signed-off-by: Rene Scharfe 
---
 apply.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index 946be4d2f5..9b3df8a3aa 100644
--- a/apply.c
+++ b/apply.c
@@ -962,13 +962,12 @@ static int gitdiff_verify_name(struct apply_state *state,
}
 
if (*name) {
-   int len = strlen(*name);
char *another;
if (isnull)
return error(_("git apply: bad git-diff - expected 
/dev/null, got %s on line %d"),
 *name, state->linenr);
another = find_name(state, line, NULL, state->p_value, 
TERM_TAB);
-   if (!another || memcmp(another, *name, len + 1)) {
+   if (!another || strcmp(another, *name)) {
free(another);
return error((side == DIFF_NEW_NAME) ?
_("git apply: bad git-diff - inconsistent new 
filename on line %d") :
-- 
2.13.2


Re: [PATCH v6 2/2] strbuf: change an always NULL/"" strbuf_addftime() param to bool

2017-07-01 Thread René Scharfe

Am 01.07.2017 um 14:55 schrieb Ævar Arnfjörð Bjarmason:

strbuf_addstr() allows callers to pass a time zone name for expanding

  ^^^
That should be "strbuf_addftime()" instead (my typo), as Junio noted.


%Z. The only current caller either passes the empty string or NULL, in
which case %Z is handed over verbatim to strftime(3).  Replace that
string parameter with a flag controlling whether to remove %Z from the
format specification. This simplifies the code.

Commit-message-by: René Scharfe <l@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---

On Sat, Jun 24 2017, Junio C. Hamano jotted:

René Scharfe <l@web.de> writes:

Here's an attempt at a commit message that would have be easier to
understand for me:

   strbuf_addstr() allows callers to pass a time zone name for expanding
   %Z.  The only current caller either passes the empty string or NULL,
   in which case %Z is handed over verbatim to strftime(3).  Replace that
   string parameter with a flag controlling whether to remove %Z from the
   format specification.  This simplifies the code.


I think the first one is strbuf_addftime(); other than that, I think
this version explains what is going on in this patch than the
original.


René


[PATCH] apply: use starts_with() in gitdiff_verify_name()

2017-07-01 Thread René Scharfe
Avoid running over the end of line -- a C string whose length is not
known to this function -- by using starts_with() instead of memcmp(3)
for checking if it starts with "/dev/null".  Also simply include the
newline in the string constant to compare against.  Drop a comment that
just states the obvious.

Signed-off-by: Rene Scharfe 
---
 apply.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index c442b89328..946be4d2f5 100644
--- a/apply.c
+++ b/apply.c
@@ -976,8 +976,7 @@ static int gitdiff_verify_name(struct apply_state *state,
}
free(another);
} else {
-   /* expect "/dev/null" */
-   if (memcmp("/dev/null", line, 9) || line[9] != '\n')
+   if (!starts_with(line, "/dev/null\n"))
return error(_("git apply: bad git-diff - expected 
/dev/null on line %d"), state->linenr);
}
 
-- 
2.13.2


Re: git log use of date format differs between Command Line and script usage.

2017-06-30 Thread René Scharfe

Am 30.06.2017 um 18:06 schrieb Shaun Uldrikis:

If you supply a non-standard format to the date configuration for git
log, something like:
[log]
 date = format:%Y-%m-%d %H:%M

then, when you run 'git log' inside a script, or when using gitk
(anywhere), it fails on decoding the format.

fatal: unknown date format format: %Y-%m-%d %H:%M


However, that format works correctly on the command line.
I do not have a patch to address this issue.


I guess you have two versions of git on your system, and the one used in
scripts is older than 2.6.0, which introduced this feature.

René


Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()

2017-06-28 Thread René Scharfe

Am 29.06.2017 um 00:21 schrieb René Scharfe:

$ git am
# pasted my email
Applying: coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
# results in commit message with scissor line, interesting..


"git am --scissors" commits with the correct message.  I'll do a
"git config --local --add mailinfo.scissors true" real quick..

René


Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()

2017-06-28 Thread René Scharfe

Am 29.06.2017 um 00:21 schrieb René Scharfe:

Am 28.06.2017 um 23:39 schrieb Ævar Arnfjörð Bjarmason:


On Sun, Jun 25 2017, René Scharfe jotted:


Am 16.06.2017 um 21:43 schrieb Junio C Hamano:

Ævar Arnfjörð Bjarmason  <ava...@gmail.com> writes:


A follow-up to the existing "type" rule added in an earlier
change. This catches some occurrences that are missed by the previous
rule.

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---


Hmph, I wonder if the "type" thing is really needed.  Over there,
"ptr" is an expression and we can find "free(ptr); ptr = NULL" with
the rule in this patch already, no?


Indeed.  How about this on top of master?

-- >8 --
Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules

There are two rules for using FREE_AND_NULL in free.cocci, one for
pointer types and one for expressions.  Both cause coccinelle to remove
empty lines and even newline characters between replacements for some
reason; consecutive "free(x);/x=NULL;" sequences end up as multiple
FREE_AND_NULL calls on the same time.

Remove the type rule, as the expression rule already covers it, and
rearrange the lines of the latter to place the addition of FREE_AND_NULL
between the removals, which causes coccinelle to leave surrounding
whitespace untouched.

Signed-off-by: Rene Scharfe <l@web.de>
---
   contrib/coccinelle/free.cocci | 10 +-
   1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci
index f2d97e755b..4490069df9 100644
--- a/contrib/coccinelle/free.cocci
+++ b/contrib/coccinelle/free.cocci
@@ -11,16 +11,8 @@ expression E;
 free(E);

   @@
-type T;
-T *ptr;
-@@
-- free(ptr);
-- ptr = NULL;
-+ FREE_AND_NULL(ptr);
-
-@@
   expression E;
   @@
   - free(E);
-- E = NULL;
   + FREE_AND_NULL(E);
+- E = NULL;


Late reply, sorry. What version of coccinelle are you running? I have
1.0.4 (from Debian) and can't get this to produce the same results as
what I have.

On top of next, I did:

  Revert "*.[ch] refactoring: make use of the FREE_AND_NULL() macro"
  Revert "coccinelle: make use of the "expression" FREE_AND_NULL() rule"
  Revert "coccinelle: make use of the "type" FREE_AND_NULL() rule"

And then generated the patch as usual with `make coccicheck`, and
applied it. This has your change.

I then re-applied the manual "*.[ch] refactoring" changes

This results in this diff with next:

  $ git diff --stat origin/next.. -- '*.[ch]'
   builtin/am.c |  3 ++-
   builtin/clean.c  |  6 --
   builtin/config.c |  6 --
   builtin/index-pack.c |  6 --
   builtin/pack-objects.c   | 12 
   builtin/unpack-objects.c |  3 ++-
   fast-import.c|  6 --
   http-push.c  | 24 
   http.c   | 15 ++-
   imap-send.c  |  3 ++-
   ref-filter.c |  3 ++-
   refs/files-backend.c |  3 ++-
   remote-testsvn.c |  3 ++-
   sequencer.c  |  3 ++-
   sha1-array.c |  3 ++-
   sha1_file.c  |  3 ++-
   transport-helper.c   | 27 ++-
   transport.c  |  3 ++-
   tree-diff.c  |  6 --
   tree.c   |  3 ++-
   20 files changed, 94 insertions(+), 47 deletions(-)

These are all cases where we now miss things that should use
FREE_AND_NULL(), e.g.:

  diff --git a/builtin/am.c b/builtin/am.c
  index c973bd96dc..2f89338ed7 100644
  --- a/builtin/am.c
  +++ b/builtin/am.c
  @@ -484,7 +484,8 @@ static int run_applypatch_msg_hook(struct am_state 
*state)
  ret = run_hook_le(NULL, "applypatch-msg", am_path(state, 
"final-commit"), NULL);

  if (!ret) {
  -   FREE_AND_NULL(state->msg);
  +   free(state->msg);
  +   state->msg = NULL;
  if (read_commit_msg(state) < 0)
  die(_("'%s' was deleted by the applypatch-msg 
hook"),
  am_path(state, "final-commit"));

So it looks to me like removing the "type T" rule breaks a lot of
things, but that the change you made to the expression rule is good, and
we should do that for the "type" rule as well. Your commit says the
"expression rule already covers it", but this doesn't seem to be the
case at all.

As an aside: Junio, did you mean to apply f8bb4631fb to next this way?
Looks like a mis-applied scissor commit.


I can't reproduce that strange result with master, but get a scissors
line into the commit message as well.  The diff at the end looks
reasonable (contains just the manual stuff).  Here's what I did:


On top of next it looks basically the same for me, only difference
being several new converted cases in repository.c.

Did you commit before diff?

René


Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()

2017-06-28 Thread René Scharfe
Am 28.06.2017 um 23:39 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Sun, Jun 25 2017, René Scharfe jotted:
> 
>> Am 16.06.2017 um 21:43 schrieb Junio C Hamano:
>>> Ævar Arnfjörð Bjarmason  <ava...@gmail.com> writes:
>>>
>>>> A follow-up to the existing "type" rule added in an earlier
>>>> change. This catches some occurrences that are missed by the previous
>>>> rule.
>>>>
>>>> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
>>>> ---
>>>
>>> Hmph, I wonder if the "type" thing is really needed.  Over there,
>>> "ptr" is an expression and we can find "free(ptr); ptr = NULL" with
>>> the rule in this patch already, no?
>>
>> Indeed.  How about this on top of master?
>>
>> -- >8 --
>> Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules
>>
>> There are two rules for using FREE_AND_NULL in free.cocci, one for
>> pointer types and one for expressions.  Both cause coccinelle to remove
>> empty lines and even newline characters between replacements for some
>> reason; consecutive "free(x);/x=NULL;" sequences end up as multiple
>> FREE_AND_NULL calls on the same time.
>>
>> Remove the type rule, as the expression rule already covers it, and
>> rearrange the lines of the latter to place the addition of FREE_AND_NULL
>> between the removals, which causes coccinelle to leave surrounding
>> whitespace untouched.
>>
>> Signed-off-by: Rene Scharfe <l@web.de>
>> ---
>>   contrib/coccinelle/free.cocci | 10 +-
>>   1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci
>> index f2d97e755b..4490069df9 100644
>> --- a/contrib/coccinelle/free.cocci
>> +++ b/contrib/coccinelle/free.cocci
>> @@ -11,16 +11,8 @@ expression E;
>> free(E);
>>
>>   @@
>> -type T;
>> -T *ptr;
>> -@@
>> -- free(ptr);
>> -- ptr = NULL;
>> -+ FREE_AND_NULL(ptr);
>> -
>> -@@
>>   expression E;
>>   @@
>>   - free(E);
>> -- E = NULL;
>>   + FREE_AND_NULL(E);
>> +- E = NULL;
> 
> Late reply, sorry. What version of coccinelle are you running? I have
> 1.0.4 (from Debian) and can't get this to produce the same results as
> what I have.
> 
> On top of next, I did:
> 
>  Revert "*.[ch] refactoring: make use of the FREE_AND_NULL() macro"
>  Revert "coccinelle: make use of the "expression" FREE_AND_NULL() 
> rule"
>  Revert "coccinelle: make use of the "type" FREE_AND_NULL() rule"
> 
> And then generated the patch as usual with `make coccicheck`, and
> applied it. This has your change.
> 
> I then re-applied the manual "*.[ch] refactoring" changes
> 
> This results in this diff with next:
> 
>  $ git diff --stat origin/next.. -- '*.[ch]'
>   builtin/am.c |  3 ++-
>   builtin/clean.c  |  6 --
>   builtin/config.c |  6 --
>   builtin/index-pack.c |  6 --
>   builtin/pack-objects.c   | 12 
>   builtin/unpack-objects.c |  3 ++-
>   fast-import.c|  6 --
>   http-push.c  | 24 
>   http.c   | 15 ++-
>   imap-send.c  |  3 ++-
>   ref-filter.c |  3 ++-
>   refs/files-backend.c |  3 ++-
>   remote-testsvn.c |  3 ++-
>   sequencer.c  |  3 ++-
>   sha1-array.c |  3 ++-
>   sha1_file.c  |  3 ++-
>   transport-helper.c   | 27 ++-
>   transport.c  |  3 ++-
>   tree-diff.c  |  6 --
>   tree.c   |  3 ++-
>   20 files changed, 94 insertions(+), 47 deletions(-)
> 
> These are all cases where we now miss things that should use
> FREE_AND_NULL(), e.g.:
> 
>  diff --git a/builtin/am.c b/builtin/am.c
>  index c973bd96dc..2f89338ed7 100644
>  --- a/builtin/am.c
>  +++ b/builtin/am.c
>  @@ -484,7 +484,8 @@ static int run_applypatch_msg_hook(struct am_state 
> *state)
>  ret = run_hook_le(NULL, "applypatch-msg", am_path(state, 
> "final-commit"), NULL);
> 
>  if (!ret) {
>  -   FREE_AND_NULL(state->msg);
>  +   free(state->msg);
>  +   state->msg = NULL;
>  if (re

Re: [PATCH 2/2] apply: handle assertion failure gracefully

2017-06-27 Thread René Scharfe
Am 27.06.2017 um 20:08 schrieb Junio C Hamano:
> René Scharfe <l@web.de> writes:
> 
>> Thought a bit more about it, and as a result here's a simpler approach:
>>
>> -- >8 --
>> Subject: [PATCH] apply: check git diffs for mutually exclusive header lines
>>
>> A file can either be added, removed, copied, or renamed, but no two of
>> these actions can be done by the same patch.  Some of these combinations
>> provoke error messages due to missing file names, and some are only
>> caught by an assertion.  Check git patches already as they are parsed
>> and report conflicting lines on sight.
>>
>> Found by Vegard Nossum using AFL.
>>
>> Reported-by: Vegard Nossum <vegard.nos...@oracle.com>
>> Signed-off-by: Rene Scharfe <l@web.de>
>> ---
>>   apply.c| 14 ++
>>   apply.h|  1 +
>>   t/t4136-apply-check.sh | 18 ++
>>   3 files changed, 33 insertions(+)
>>
>> diff --git a/apply.c b/apply.c
>> index 8cd6435c74..8a5e44c474 100644
>> --- a/apply.c
>> +++ b/apply.c
>> @@ -1312,6 +1312,18 @@ static char *git_header_name(struct apply_state 
>> *state,
>>  }
>>   }
>>   
>> +static int check_header_line(struct apply_state *state, struct patch *patch)
>> +{
>> +int extensions = (patch->is_delete == 1) + (patch->is_new == 1) +
>> + (patch->is_rename == 1) + (patch->is_copy == 1);
>> +if (extensions > 1)
>> +return error(_("inconsistent header lines %d and %d"),
>> + state->extension_linenr, state->linenr);
>> +if (extensions && !state->extension_linenr)
>> +state->extension_linenr = state->linenr;
> 
> OK.  I wondered briefly what happens if the first git_header that
> sets one of the extensions can be at line 0 (calusng
> state->extension_linenr to be set to 0), but even in that case, the
> second problematic one will correctly report the 0th and its own
> line as culprit, so this is OK.  It makes me question if there is
> any point checking !state->extension_linenr in the if() statement,
> though.

It makes sure that ->extension_linenr is set to the first line number of
an extension (like, say, "copy from") and not advanced again (e.g. when
we hit "copy to", or more importantly some line like "similarity index"
which does not set one of the extension bits and thus can't actually
cause a conflict).

Hmm, pondering that, it seems I forgot to reset its value after each
patch.  Or better just move it into struct patch, next to the extension
bits:

-- >8 --
Subject: fixup! apply: check git diffs for mutually exclusive header lines
---
 apply.c | 7 ---
 apply.h | 1 -
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/apply.c b/apply.c
index db38bc3cdd..c442b89328 100644
--- a/apply.c
+++ b/apply.c
@@ -211,6 +211,7 @@ struct patch {
unsigned ws_rule;
int lines_added, lines_deleted;
int score;
+   int extension_linenr; /* first line specifying delete/new/rename/copy */
unsigned int is_toplevel_relative:1;
unsigned int inaccurate_eof:1;
unsigned int is_binary:1;
@@ -1325,9 +1326,9 @@ static int check_header_line(struct apply_state *state, 
struct patch *patch)
 (patch->is_rename == 1) + (patch->is_copy == 1);
if (extensions > 1)
return error(_("inconsistent header lines %d and %d"),
-state->extension_linenr, state->linenr);
-   if (extensions && !state->extension_linenr)
-   state->extension_linenr = state->linenr;
+patch->extension_linenr, state->linenr);
+   if (extensions && !patch->extension_linenr)
+   patch->extension_linenr = state->linenr;
return 0;
 }
 
diff --git a/apply.h b/apply.h
index b52078b486..b3d6783d55 100644
--- a/apply.h
+++ b/apply.h
@@ -79,7 +79,6 @@ struct apply_state {
 
/* Various "current state" */
int linenr; /* current line number */
-   int extension_linenr; /* first line specifying delete/new/rename/copy */
struct string_list symlink_changes; /* we have to track symlinks */
 
/*
-- 
2.13.2


Re: [PATCH 2/2] apply: handle assertion failure gracefully

2017-06-27 Thread René Scharfe
Am 25.02.2017 um 11:13 schrieb Vegard Nossum:
> For the patches in the added testcases, we were crashing with:
> 
>  git-apply: apply.c:3665: check_preimage: Assertion `patch->is_new <= 0' 
> failed.


> diff --git a/t/t4154-apply-git-header.sh b/t/t4154-apply-git-header.sh
> index d651af4a2..c440c48ad 100755
> --- a/t/t4154-apply-git-header.sh
> +++ b/t/t4154-apply-git-header.sh
> @@ -12,4 +12,40 @@ rename new 0
>   EOF
>   '
>   
> +test_expect_success 'apply deleted file mode / new file mode / wrong mode' '
> + test_must_fail git apply << EOF
> +diff --git a/. b/.
> +deleted file mode
> +new file mode
> +EOF
> +'

-- >8 --
Subject: [PATCH] apply: check git diffs for invalid file modes

An empty string as mode specification is accepted silently by git apply,
as Vegard Nossum found out using AFL.  It's interpreted as zero.  Reject
such bogus file modes, and only accept ones consisting exclusively of
octal digits.

Reported-by: Vegard Nossum 
Signed-off-by: Rene Scharfe 
---
 apply.c   | 17 -
 t/t4129-apply-samemode.sh | 16 +++-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/apply.c b/apply.c
index 8a5e44c474..db38bc3cdd 100644
--- a/apply.c
+++ b/apply.c
@@ -1001,20 +1001,27 @@ static int gitdiff_newname(struct apply_state *state,
   DIFF_NEW_NAME);
 }
 
+static int parse_mode_line(const char *line, int linenr, unsigned int *mode)
+{
+   char *end;
+   *mode = strtoul(line, , 8);
+   if (end == line || !isspace(*end))
+   return error(_("invalid mode on line %d: %s"), linenr, line);
+   return 0;
+}
+
 static int gitdiff_oldmode(struct apply_state *state,
   const char *line,
   struct patch *patch)
 {
-   patch->old_mode = strtoul(line, NULL, 8);
-   return 0;
+   return parse_mode_line(line, state->linenr, >old_mode);
 }
 
 static int gitdiff_newmode(struct apply_state *state,
   const char *line,
   struct patch *patch)
 {
-   patch->new_mode = strtoul(line, NULL, 8);
-   return 0;
+   return parse_mode_line(line, state->linenr, >new_mode);
 }
 
 static int gitdiff_delete(struct apply_state *state,
@@ -1128,7 +1135,7 @@ static int gitdiff_index(struct apply_state *state,
memcpy(patch->new_sha1_prefix, line, len);
patch->new_sha1_prefix[len] = 0;
if (*ptr == ' ')
-   patch->old_mode = strtoul(ptr+1, NULL, 8);
+   return gitdiff_oldmode(state, ptr + 1, patch);
return 0;
 }
 
diff --git a/t/t4129-apply-samemode.sh b/t/t4129-apply-samemode.sh
index c268298eaf..5cdd76dfa7 100755
--- a/t/t4129-apply-samemode.sh
+++ b/t/t4129-apply-samemode.sh
@@ -13,7 +13,9 @@ test_expect_success setup '
echo modified >file &&
git diff --stat -p >patch-0.txt &&
chmod +x file &&
-   git diff --stat -p >patch-1.txt
+   git diff --stat -p >patch-1.txt &&
+   sed "s/^\(new mode \).*/\1/" patch-empty-mode.txt &&
+   sed "s/^\(new mode \).*/\1garbage/" patch-bogus-mode.txt
 '
 
 test_expect_success FILEMODE 'same mode (no index)' '
@@ -59,4 +61,16 @@ test_expect_success FILEMODE 'mode update (index only)' '
git ls-files -s file | grep "^100755"
 '
 
+test_expect_success FILEMODE 'empty mode is rejected' '
+   git reset --hard &&
+   test_must_fail git apply patch-empty-mode.txt 2>err &&
+   test_i18ngrep "invalid mode" err
+'
+
+test_expect_success FILEMODE 'bogus mode is rejected' '
+   git reset --hard &&
+   test_must_fail git apply patch-bogus-mode.txt 2>err &&
+   test_i18ngrep "invalid mode" err
+'
+
 test_done
-- 
2.13.2


Re: [PATCH 1/2] apply: guard against renames of non-existant empty files

2017-06-27 Thread René Scharfe
Am 27.02.2017 um 23:18 schrieb René Scharfe:
> Am 27.02.2017 um 21:10 schrieb Junio C Hamano:
>> René Scharfe <l@web.de> writes:
>>
>>> Would it make sense to mirror the previously existing condition and
>>> check for is_new instead?  I.e.:
>>>
>>> if ((!patch->is_delete && !patch->new_name) ||
>>> (!patch->is_new&& !patch->old_name)) {
>>>
>>
>> Yes, probably.

So let's actually do it!

-- >8 --
Subject: [PATCH] apply: check git diffs for missing old filenames

2c93286a (fix "git apply --index ..." not to deref NULL) added a check
for git patches missing a +++ line, preventing a segfault.  Check for
missing --- lines as well, and add a test for each case.

Found by Vegard Nossum using AFL.

Original-patch-by: Vegard Nossum <vegard.nos...@oracle.com>
Signed-off-by: Rene Scharfe <l@web.de>
---
 apply.c|  3 ++-
 t/t4133-apply-filenames.sh | 24 
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index b963d7d8fb..8cd6435c74 100644
--- a/apply.c
+++ b/apply.c
@@ -1575,7 +1575,8 @@ static int find_header(struct apply_state *state,
patch->old_name = xstrdup(patch->def_name);
patch->new_name = xstrdup(patch->def_name);
}
-   if (!patch->is_delete && !patch->new_name) {
+   if ((!patch->new_name && !patch->is_delete) ||
+   (!patch->old_name && !patch->is_new)) {
error(_("git diff header lacks filename 
information "
 "(line %d)"), state->linenr);
return -128;
diff --git a/t/t4133-apply-filenames.sh b/t/t4133-apply-filenames.sh
index 2ecb4216b7..c5ed3b17c4 100755
--- a/t/t4133-apply-filenames.sh
+++ b/t/t4133-apply-filenames.sh
@@ -35,4 +35,28 @@ test_expect_success 'apply diff with inconsistent filenames 
in headers' '
test_i18ngrep "inconsistent old filename" err
 '
 
+test_expect_success 'apply diff with new filename missing from headers' '
+   cat >missing_new_filename.diff <<-\EOF &&
+   diff --git a/f b/f
+   index 000..d00491f
+   --- a/f
+   @@ -0,0 +1 @@
+   +1
+   EOF
+   test_must_fail git apply missing_new_filename.diff 2>err &&
+   test_i18ngrep "lacks filename information" err
+'
+
+test_expect_success 'apply diff with old filename missing from headers' '
+   cat >missing_old_filename.diff <<-\EOF &&
+   diff --git a/f b/f
+   index d00491f..000
+   +++ b/f
+   @@ -1 +0,0 @@
+   -1
+   EOF
+   test_must_fail git apply missing_old_filename.diff 2>err &&
+   test_i18ngrep "lacks filename information" err
+'
+
 test_done
-- 
2.13.2


Re: [PATCH 2/2] apply: handle assertion failure gracefully

2017-06-27 Thread René Scharfe
Am 28.02.2017 um 11:50 schrieb René Scharfe:
> Am 27.02.2017 um 23:33 schrieb Junio C Hamano:
>> René Scharfe <l@web.de> writes:
>>
>>> Am 27.02.2017 um 21:04 schrieb Junio C Hamano:
>>>> René Scharfe <l@web.de> writes:
>>>>
>>>>>> diff --git a/apply.c b/apply.c
>>>>>> index cbf7cc7f2..9219d2737 100644
>>>>>> --- a/apply.c
>>>>>> +++ b/apply.c
>>>>>> @@ -3652,7 +3652,6 @@ static int check_preimage(struct apply_state 
>>>>>> *state,
>>>>>>  if (!old_name)
>>>>>>  return 0;
>>>>>>
>>>>>> -assert(patch->is_new <= 0);
>>>>>
>>>>> 5c47f4c6 (builtin-apply: accept patch to an empty file) added that
>>>>> line. Its intent was to handle diffs that contain an old name even for
>>>>> a file that's created.  Citing from its commit message: "When we
>>>>> cannot be sure by parsing the patch that it is not a creation patch,
>>>>> we shouldn't complain when if there is no such a file."  Why not stop
>>>>> complaining also in case we happen to know for sure that it's a
>>>>> creation patch? I.e., why not replace the assert() with:
>>>>>
>>>>>   if (patch->is_new == 1)
>>>>>   goto is_new;
>>>>>
>>>>>>  previous = previous_patch(state, patch, );
>>>>
>>>> When the caller does know is_new is true, old_name must be made/left
>>>> NULL.  That is the invariant this assert is checking to catch an
>>>> error in the calling code.
>>>
>>> There are some places in apply.c that set ->is_new to 1, but none of
>>> them set ->old_name to NULL at the same time.
>>
>> I thought all of these are flipping ->is_new that used to be -1
>> (unknown) to (now we know it is new), and sets only new_name without
>> doing anything to old_name, because they know originally both names
>> are set to NULL.
>>
>>> Having to keep these two members in sync sounds iffy anyway.  Perhaps
>>> accessors can help, e.g. a setter which frees old_name when is_new is
>>> set to 1, or a getter which returns NULL for old_name if is_new is 1.
>>
>> Definitely, the setter would make it harder to make the mistake.
> 
> When I added setters, apply started to passed NULL to unlink(2) and
> rmdir(2) in some of the new tests, which still failed.
> 
> That's because three of the diffs trigger both gitdiff_delete(), which
> sets is_delete and old_name, and gitdiff_newfile(), which sets is_new
> and new_name.  Create and delete equals move, right?  Or should we
> error out at this point already?
> 
> The last new diff adds a new file that is copied.  Sounds impossible.
> How about something like this, which forbids combinations that make no
> sense.  Hope it's not too strict; at least all tests succeed.
> 
> ---
>   apply.c | 79 
> ++---
>   1 file changed, 61 insertions(+), 18 deletions(-)

Thought a bit more about it, and as a result here's a simpler approach:

-- >8 --
Subject: [PATCH] apply: check git diffs for mutually exclusive header lines

A file can either be added, removed, copied, or renamed, but no two of
these actions can be done by the same patch.  Some of these combinations
provoke error messages due to missing file names, and some are only
caught by an assertion.  Check git patches already as they are parsed
and report conflicting lines on sight.

Found by Vegard Nossum using AFL.

Reported-by: Vegard Nossum <vegard.nos...@oracle.com>
Signed-off-by: Rene Scharfe <l@web.de>
---
 apply.c| 14 ++
 apply.h|  1 +
 t/t4136-apply-check.sh | 18 ++
 3 files changed, 33 insertions(+)

diff --git a/apply.c b/apply.c
index 8cd6435c74..8a5e44c474 100644
--- a/apply.c
+++ b/apply.c
@@ -1312,6 +1312,18 @@ static char *git_header_name(struct apply_state *state,
}
 }
 
+static int check_header_line(struct apply_state *state, struct patch *patch)
+{
+   int extensions = (patch->is_delete == 1) + (patch->is_new == 1) +
+(patch->is_rename == 1) + (patch->is_copy == 1);
+   if (extensions > 1)
+   return error(_("inconsistent header lines %d and %d"),
+state->extension_linenr, state->linenr);
+   if (extensions && !state->extension_linenr)
+   state->extension_linenr = state->linenr;
+   re

Re: [PATCH v4 4/6] coccinelle: add a rule to make "expression" code use FREE_AND_NULL()

2017-06-25 Thread René Scharfe
Am 16.06.2017 um 21:43 schrieb Junio C Hamano:
> Ævar Arnfjörð Bjarmason   writes:
> 
>> A follow-up to the existing "type" rule added in an earlier
>> change. This catches some occurrences that are missed by the previous
>> rule.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
> 
> Hmph, I wonder if the "type" thing is really needed.  Over there,
> "ptr" is an expression and we can find "free(ptr); ptr = NULL" with
> the rule in this patch already, no?

Indeed.  How about this on top of master?

-- >8 --
Subject: [PATCH] coccinelle: polish FREE_AND_NULL rules

There are two rules for using FREE_AND_NULL in free.cocci, one for
pointer types and one for expressions.  Both cause coccinelle to remove
empty lines and even newline characters between replacements for some
reason; consecutive "free(x);/x=NULL;" sequences end up as multiple
FREE_AND_NULL calls on the same time.

Remove the type rule, as the expression rule already covers it, and
rearrange the lines of the latter to place the addition of FREE_AND_NULL
between the removals, which causes coccinelle to leave surrounding
whitespace untouched.

Signed-off-by: Rene Scharfe 
---
 contrib/coccinelle/free.cocci | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci
index f2d97e755b..4490069df9 100644
--- a/contrib/coccinelle/free.cocci
+++ b/contrib/coccinelle/free.cocci
@@ -11,16 +11,8 @@ expression E;
   free(E);
 
 @@
-type T;
-T *ptr;
-@@
-- free(ptr);
-- ptr = NULL;
-+ FREE_AND_NULL(ptr);
-
-@@
 expression E;
 @@
 - free(E);
-- E = NULL;
 + FREE_AND_NULL(E);
+- E = NULL;
-- 
2.13.2



Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()

2017-06-24 Thread René Scharfe
Am 24.06.2017 um 14:20 schrieb Jeff King:
> On Sat, Jun 24, 2017 at 02:12:30PM +0200, René Scharfe wrote:
> 
>>> That's redundant with "subdir_nr". Should we push that logic down into
>>> the function, and basically do:
>>>
>>> if (subdir_nr < 0 || subdir_nr > 256)
>>> BUG("object subdir number out of range");
>>
>> Hmm.  I don't expect many more callers, so do we really need this safety
>> check?  It's cheap compared to the readdir(3) call, of course.
> 
> To me it's as much about documenting the assumptions as it is about
> catching buggy callers. I'd be OK with a comment, too.

I didn't catch the off-by-one error above in the first reading.  Did you
add it intentionally? ;-)  In any case, I'm convinced now that we need
such a check, but with hexadecimal notation to avoid such mistakes.

>> But wouldn't it make more sense to use an unsigned data type to avoid
>> the first half?  And an unsigned char specifically to only accept valid
>> values, leaving the overflow concern to the caller?  It warrants a
>> separate patch in any case IMHO.
> 
> Using unsigned makes sense. Using "char" because you know it only goes
> to 256 is a bit too subtle for my taste. And yes, I'd do it in a
> preparatory patch (or follow-on if you prefer).

It's subtle on the caller's side, as big numbers would just wrap if the
programmer ignored the limits of the type.  On the callee's side it's
fundamental, though.

Anyway, here's a patch on top of the others.

-- >8 --
Subject: [PATCH] sha1_file: guard against invalid loose subdirectory numbers

Loose object subdirectories have hexadecimal names based on the first
byte of the hash of contained objects, thus their numerical
representation can range from 0 (0x00) to 255 (0xff).  Change the type
of the corresponding variable in for_each_file_in_obj_subdir() and
associated callback functions to unsigned int and add a range check.

Suggested-by: Jeff King <p...@peff.net>
Signed-off-by: Rene Scharfe <l@web.de>
---
 builtin/fsck.c | 2 +-
 builtin/prune-packed.c | 2 +-
 builtin/prune.c| 2 +-
 cache.h| 4 ++--
 sha1_file.c| 5 -
 5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3a2c27f241..5d113f8bc0 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -536,7 +536,7 @@ static int fsck_cruft(const char *basename, const char 
*path, void *data)
return 0;
 }
 
-static int fsck_subdir(int nr, const char *path, void *progress)
+static int fsck_subdir(unsigned int nr, const char *path, void *progress)
 {
display_progress(progress, nr + 1);
return 0;
diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index c026299e78..ac978ad401 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -10,7 +10,7 @@ static const char * const prune_packed_usage[] = {
 
 static struct progress *progress;
 
-static int prune_subdir(int nr, const char *path, void *data)
+static int prune_subdir(unsigned int nr, const char *path, void *data)
 {
int *opts = data;
display_progress(progress, nr + 1);
diff --git a/builtin/prune.c b/builtin/prune.c
index f0e2bff04c..c378690545 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -68,7 +68,7 @@ static int prune_cruft(const char *basename, const char 
*path, void *data)
return 0;
 }
 
-static int prune_subdir(int nr, const char *path, void *data)
+static int prune_subdir(unsigned int nr, const char *path, void *data)
 {
if (!show_only)
rmdir(path);
diff --git a/cache.h b/cache.h
index 00a017dfcb..04dd2961ad 100644
--- a/cache.h
+++ b/cache.h
@@ -1819,10 +1819,10 @@ typedef int each_loose_object_fn(const struct object_id 
*oid,
 typedef int each_loose_cruft_fn(const char *basename,
const char *path,
void *data);
-typedef int each_loose_subdir_fn(int nr,
+typedef int each_loose_subdir_fn(unsigned int nr,
 const char *path,
 void *data);
-int for_each_file_in_obj_subdir(int subdir_nr,
+int for_each_file_in_obj_subdir(unsigned int subdir_nr,
struct strbuf *path,
each_loose_object_fn obj_cb,
each_loose_cruft_fn cruft_cb,
diff --git a/sha1_file.c b/sha1_file.c
index 98ce85acf9..90b9414170 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3735,7 +3735,7 @@ void assert_sha1_type(const unsigned char *sha1, enum 
object_type expect)
typename(expect));
 }
 
-int for_each_file_in_obj_subdir(int subdir_nr,
+int for_each_file_in_obj_subdir(unsigned int subdir_nr,
struct strbuf *path,
each_loose_object_fn obj_cb,
  

Re: [PATCH v5 2/2] strbuf: change an always NULL/"" strbuf_addftime() param to bool

2017-06-24 Thread René Scharfe

Am 24.06.2017 um 14:14 schrieb Ævar Arnfjörð Bjarmason:

Change the code for deciding what's to be done about %Z to stop
passing always either a NULL or "" char * to
strbuf_addftime(). Instead pass a boolean int to indicate whether the
strftime() %Z format should be suppressed by converting it to an empty
string, which is what this code is actually doing.

This code grew organically between the changes in 9eafe86d58 ("Merge
branch 'rs/strbuf-addftime-zZ'", 2017-06-22). The intent was to use
this API in the future to pass a custom leave the door open to pass a
custom timezone name to the function (see my [1] and related
messages).


"leave the door open to pass a" seems redundant.


But that's not what this code does now, and this strbuf_addstr() call
always being redundant makes it hard to understand the current
functionality. So simplify this internal API to match its use, we can
always change it in the future if it gets a different use-case.


I don't understand the confusion, but of course I'm biased. And I don't
like binary parameters in general and would use named flags or two
function names in most cases.  But that aside I find the description
hard to follow (perhaps I should do something about my attention span).

Here's an attempt at a commit message that would have be easier to
understand for me:

  strbuf_addstr() allows callers to pass a time zone name for expanding
  %Z.  The only current caller either passes the empty string or NULL,
  in which case %Z is handed over verbatim to strftime(3).  Replace that
  string parameter with a flag controlling whether to remove %Z from the
  format specification.  This simplifies the code.

René


Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()

2017-06-24 Thread René Scharfe
Am 23.06.2017 um 01:10 schrieb Jeff King:
> On Thu, Jun 22, 2017 at 08:19:48PM +0200, René Scharfe wrote:
>> @@ -1811,6 +1822,12 @@ typedef int each_loose_cruft_fn(const char *basename,
>>   typedef int each_loose_subdir_fn(int nr,
>>   const char *path,
>>   void *data);
>> +int for_each_file_in_obj_subdir(int subdir_nr,
>> +struct strbuf *path,
>> +each_loose_object_fn obj_cb,
>> +each_loose_cruft_fn cruft_cb,
>> +each_loose_subdir_fn subdir_cb,
>> +void *data);
> 
> Now that this is becoming public, I think we need to document what
> should be in "path" here. It is the complete path, including the 2-hex
> subdir.
> 
> That's redundant with "subdir_nr". Should we push that logic down into
> the function, and basically do:
> 
>if (subdir_nr < 0 || subdir_nr > 256)
>   BUG("object subdir number out of range");

Hmm.  I don't expect many more callers, so do we really need this safety
check?  It's cheap compared to the readdir(3) call, of course.

But wouldn't it make more sense to use an unsigned data type to avoid
the first half?  And an unsigned char specifically to only accept valid
values, leaving the overflow concern to the caller?  It warrants a
separate patch in any case IMHO.

>orig_len = path->len;
>strbuf_addf(path, "/%02x", subdir_nr);
>baselen = path->len;
> 
>...
> 
>strbuf_setlen(path, orig_len);
> 
> That's one less thing for the caller to worry about, and it's easy to
> explain the argument as "the path to the object directory root".
> 
>> +if (!alt->loose_objects_subdir_seen[subdir_nr]) {
>> +struct strbuf *buf = alt_scratch_buf(alt);
>> +strbuf_addf(buf, "%02x/", subdir_nr);
>> +for_each_file_in_obj_subdir(subdir_nr, buf,
>> +append_loose_object,
>> +NULL, NULL,
>> +>loose_objects_cache);
> 
> I think the trailing slash here is superfluous. If you take my
> suggestion above, that line goes away. But then the front slash it adds
> becomes superfluous. It should probably just do:
> 
>strbuf_complete(path, '/');
>strbuf_addf(path, "%02x", subdir_nr);

So how about this then as a follow-up patch?

-- >8 --
Subject: [PATCH] sha1_file: let for_each_file_in_obj_subdir() handle subdir 
names

The function for_each_file_in_obj_subdir() takes a object subdirectory
number and expects the name of the same subdirectory to be included in
the path strbuf.  Avoid this redundancy by letting the function append
the hexadecimal subdirectory name itself.  This makes it a bit easier
and safer to use the function -- it becomes impossible to specify
different subdirectories in subdir_nr and path.

Suggested-by: Jeff King <p...@peff.net>
Signed-off-by: Rene Scharfe <l@web.de>
---
 sha1_file.c | 22 ++
 sha1_name.c |  1 -
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5e0ee2b68b..98ce85acf9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3742,15 +3742,22 @@ int for_each_file_in_obj_subdir(int subdir_nr,
each_loose_subdir_fn subdir_cb,
void *data)
 {
-   size_t baselen = path->len;
-   DIR *dir = opendir(path->buf);
+   size_t origlen, baselen;
+   DIR *dir;
struct dirent *de;
int r = 0;
 
+   origlen = path->len;
+   strbuf_complete(path, '/');
+   strbuf_addf(path, "%02x", subdir_nr);
+   baselen = path->len;
+
+   dir = opendir(path->buf);
if (!dir) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno("unable to open %s", path->buf);
+   if (errno != ENOENT)
+   r = error_errno("unable to open %s", path->buf);
+   strbuf_setlen(path, origlen);
+   return r;
}
 
while ((de = readdir(dir))) {
@@ -3788,6 +3795,8 @@ int for_each_file_in_obj_subdir(int subdir_nr,
if (!r && subdir_cb)
r = subdir_cb(subdir_nr, path->buf, data);
 
+   strbuf_setlen(path, origlen);
+
return r;
 }
 
@@ -3797,15 +3806,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf 
*path,
each_loose_subdir_fn subdir_cb,
void *data)
 {
-   s

Re: [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()

2017-06-24 Thread René Scharfe
Am 23.06.2017 um 01:10 schrieb Jeff King:
> On Thu, Jun 22, 2017 at 08:19:48PM +0200, René Scharfe wrote:
> 
>> Read each loose object subdirectory at most once when looking for unique
>> abbreviated hashes.  This speeds up commands like "git log --pretty=%h"
>> considerably, which previously caused one readdir(3) call for each
>> candidate, even for subdirectories that were visited before.
> 
> Is it worth adding a perf test that's heavy on abbreviations?

Sure.  Here's a simple one.  It currently reports for me:

Testorigin/master origin/next  origin/pu
-
4205.1: log with %H 0.44(0.41+0.02)   0.43(0.42+0.01) -2.3%
0.43(0.43+0.00) -2.3%
4205.2: log with %h 1.03(0.60+0.42)   1.04(0.64+0.39) +1.0%
0.57(0.55+0.01) -44.7%
4205.3: log with %T 0.43(0.42+0.00)   0.43(0.42+0.01) +0.0%
0.43(0.40+0.02) +0.0%
4205.4: log with %t 1.05(0.64+0.38)   1.05(0.61+0.42) +0.0%
0.59(0.58+0.00) -43.8%
4205.5: log with %P 0.45(0.42+0.02)   0.43(0.42+0.00) -4.4%
0.43(0.42+0.01) -4.4%
4205.6: log with %p 1.21(0.63+0.57)   1.17(0.68+0.47) -3.3%
0.59(0.58+0.00) -51.2%
4205.7: log with %h-%h-%h   1.05(0.64+0.39)   2.00(0.83+1.15) +90.5%   
0.69(0.66+0.02) -34.3%

origin/next has fe9e2aefd4 (pretty: recalculate duplicate short hashes),
while origin/pu has cc817ca3ef (sha1_name: cache readdir(3) results in
find_short_object_filename()).

-- >8 --
Subject: [PATCH] p4205: add perf test script for pretty log formats

Add simple performance tests for expanded log format placeholders.

Suggested-by: Jeff King <p...@peff.net>
Signed-off-by: Rene Scharfe <l@web.de>
---
 t/perf/p4205-log-pretty-formats.sh | 16 
 1 file changed, 16 insertions(+)
 create mode 100755 t/perf/p4205-log-pretty-formats.sh

diff --git a/t/perf/p4205-log-pretty-formats.sh 
b/t/perf/p4205-log-pretty-formats.sh
new file mode 100755
index 00..7c26f4f337
--- /dev/null
+++ b/t/perf/p4205-log-pretty-formats.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+test_description='Tests the performance of various pretty format placeholders'
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+for format in %H %h %T %t %P %p %h-%h-%h
+do
+   test_perf "log with $format" "
+   git log --format=\"$format\" >/dev/null
+   "
+done
+
+test_done
-- 
2.13.1



Re: [PATCH] xdiff: trim common tail with -U0 after diff

2017-06-23 Thread René Scharfe

Am 23.06.2017 um 12:36 schrieb Daniel Hahler:

When -U0 is used, trim_common_tail should be called after `xdl_diff`, so
that `--indent-heuristic` (and other diff processing) works as expected.

It also removes the check for `!(xecfg->flags & XDL_EMIT_FUNCCONTEXT)`
added in e0876bca4, which does not appear to be necessary anymore after
moving the trimming down.

This only adds a test to t4061-diff-indent.sh, but should also have one for
normal (i.e. non-experimental) diff mode probably?!

Ref: https://github.com/tomtom/quickfixsigns_vim/issues/74#issue-237900460
---
  t/t4061-diff-indent.sh | 15 +++
  xdiff-interface.c  |  7 ---
  2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh
index 2affd7a10..df3151393 100755
--- a/t/t4061-diff-indent.sh
+++ b/t/t4061-diff-indent.sh
@@ -116,6 +116,16 @@ test_expect_success 'prepare' '
 4
EOF
  
+	cat <<-\EOF >spaces-compacted-U0-expect &&

+   diff --git a/spaces.txt b/spaces.txt
+   --- a/spaces.txt
+   +++ b/spaces.txt
+   @@ -4,0 +5,3 @@ a
+   +b
+   +a
+   +
+   EOF
+
tr "_" " " <<-\EOF >functions-expect &&
diff --git a/functions.c b/functions.c
--- a/functions.c
@@ -184,6 +194,11 @@ test_expect_success 'diff: --indent-heuristic with 
--histogram' '
compare_diff spaces-compacted-expect out-compacted4
  '
  
+test_expect_success 'diff: --indent-heuristic with -U0' '

+   git diff -U0 --indent-heuristic old new -- spaces.txt >out-compacted5 &&
+   compare_diff spaces-compacted-U0-expect out-compacted5
+'
+
  test_expect_success 'diff: ugly functions' '
git diff --no-indent-heuristic old new -- functions.c >out &&
compare_diff functions-expect out


The changed test script passes just fine for me even without your change
to xdiff-interface.c, which is odd.  Do you have another way to
demonstrate the unexpected behavior?  And can someone replicate the
failure?

René


Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool

2017-06-23 Thread René Scharfe

Am 23.06.2017 um 17:23 schrieb Jeff King:

On Fri, Jun 23, 2017 at 05:13:38PM +0200, Ævar Arnfjörð Bjarmason wrote:


The idea was that eventually the caller might be able to come up with a
TZ that is not blank, but is also not what strftime("%Z") would produce.
Conceivably that could be done if Git commits carried the "%Z"
information (not likely), or if we used a reverse-lookup table (also not
likely).

This closes the door on that.  Since we don't have immediate plans to go
that route, I'm OK with this patch. It would be easy enough to re-open
the door if we change our minds later.


Closes the door on doing that via passing the char * of the prepared
custom tz_name to strbuf_addftime().

I have a WIP patch (which may not make it on-list, depending) playing
with the idea I proposed in
CACBZZX5OQc45fUyDVayE89rkT=+8m5s4efsxcabcy7upme5...@mail.gmail.com which
just inserts the custom TZ name based on the offset inside that `if
(omit_strftime_tz_name)` branch.


OK. I'd assumed that would all happen outside of strbuf_addftime(). But
if it happens inside, then I agree a flag is better.


Oh, so the interface that was meant to allow better time zone names
without having to make strbuf_addftime() even bigger than it already is
turns out to be too ugly for its purpose?  I'm sorry. :(

René


Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool

2017-06-23 Thread René Scharfe

Am 23.06.2017 um 17:25 schrieb Jeff King:

On Fri, Jun 23, 2017 at 05:20:19PM +0200, René Scharfe wrote:


diff --git a/strbuf.c b/strbuf.c
index be3b9e37b1..81ff3570e2 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -786,7 +786,7 @@ char *xstrfmt(const char *fmt, ...)
   }
   void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm,
-int tz_offset, const char *tz_name)
+int tz_offset, const int omit_strftime_tz_name)


Why const?  And as written above, naming the parameter local would make
it easier to understand instead of exposing an implementation detail in
the interface.


I think calling it "local" isn't right. That's a decision the _caller_
is making about whether to pass through %Z. But the actual
implementation is more like "should the function fill tzname based on
tz?" So some name along those lines would make sense.

In which case the caller would then pass "!mode->local" for the flag.


We only have a single caller currently, so responsibilities can still be
shifted, and it's a bit hard to draw the line.  "Here's a format and all
time information I have, expand!" is just as viable as "here's a format
and most time information, expand, and handle %Z in this particular way
when you see it!", I think.

René


Re: [PATCH] strbuf: change an always NULL/"" strbuf_addftime() param to bool

2017-06-23 Thread René Scharfe

Am 23.06.2017 um 16:46 schrieb Ævar Arnfjörð Bjarmason:

Change the code for deciding what's to be done about %Z to stop
passing always either a NULL or "" char * to
strbuf_addftime(). Instead pass a boolean int to indicate whether the
strftime() %Z format should be omitted, which is what this code is
actually doing.


"Omitting" sounds not quite right somehow.  We expand %Z to the empty
string because that's the best we can do -- which amounts to a removal,
but that's not the intent, just an implementation detail.  Calling it
"handling %Z internally" would be better, I think.


This code grew organically between the changes in 9eafe86d58 ("Merge
branch 'rs/strbuf-addftime-zZ'", 2017-06-22) yielding an end result
that wasn't very readable. Out of context it looked as though the call
to strbuf_addstr() might be adding a custom tz_name to the string, but
actually tz_name would always be "", so the call to strbuf_addstr()
just to add an empty string to the format was pointless.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
  date.c   | 2 +-
  strbuf.c | 5 ++---
  strbuf.h | 5 +++--
  3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/date.c b/date.c
index 1fd6d66375..5f09743bad 100644
--- a/date.c
+++ b/date.c
@@ -256,7 +256,7 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
tm->tm_hour, tm->tm_min, tm->tm_sec, tz);
else if (mode->type == DATE_STRFTIME)
strbuf_addftime(, mode->strftime_fmt, tm, tz,
-   mode->local ? NULL : "");
+   mode->local ? 0 : 1);


I don't see how this is more readable -- both look about equally ugly to
me.  Passing mode->local unchanged would be better.


else
strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d",
weekday_names[tm->tm_wday],
diff --git a/strbuf.c b/strbuf.c
index be3b9e37b1..81ff3570e2 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -786,7 +786,7 @@ char *xstrfmt(const char *fmt, ...)
  }
  
  void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm,

-int tz_offset, const char *tz_name)
+int tz_offset, const int omit_strftime_tz_name)


Why const?  And as written above, naming the parameter local would make
it easier to understand instead of exposing an implementation detail in
the interface.


  {
struct strbuf munged_fmt = STRBUF_INIT;
size_t hint = 128;
@@ -815,8 +815,7 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, 
const struct tm *tm,
fmt++;
break;
case 'Z':
-   if (tz_name) {
-   strbuf_addstr(_fmt, tz_name);
+   if (omit_strftime_tz_name) {


Getting rid of this strbuf_addstr call is nice, but as Peff mentioned in
his reply it also reduces the flexibility of the function.  While it's
unlikely to be needed I'm not convinced that we should already block
this path (even though it could be easily reopened).


fmt++;
break;
}
diff --git a/strbuf.h b/strbuf.h
index 4559035c47..bad698058a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -340,14 +340,15 @@ extern void strbuf_vaddf(struct strbuf *sb, const char 
*fmt, va_list ap);
  
  /**

   * Add the time specified by `tm`, as formatted by `strftime`.
- * `tz_name` is used to expand %Z internally unless it's NULL.
   * `tz_offset` is in decimal hhmm format, e.g. -600 means six hours west
   * of Greenwich, and it's used to expand %z internally.  However, tokens
   * with modifiers (e.g. %Ez) are passed to `strftime`.
+ * `omit_strftime_tz_name` when set, means don't let `strftime` format
+ * %Z, instead do our own formatting.
   */
  extern void strbuf_addftime(struct strbuf *sb, const char *fmt,
const struct tm *tm, int tz_offset,
-   const char *tz_name);
+   const int omit_strftime_tz_name);
  
  /**

   * Read a given size of data from a FILE* pointer to the buffer.



Re: [BUG] add_again() off-by-one error in custom format

2017-06-22 Thread René Scharfe

Am 18.06.2017 um 15:56 schrieb Jeff King:

On Sun, Jun 18, 2017 at 02:59:04PM +0200, René Scharfe wrote:


@@ -1586,6 +1587,9 @@ extern struct alternate_object_database {
struct strbuf scratch;
size_t base_len;
+   uint32_t loose_objects_subdir_bitmap[8];


Is it worth the complexity of having an actual bitmap and not just an
array of char? I guess it's not _that_ complex to access the bits, but
there are a lot of magic numbers involved.


That would be 224 bytes more per alternate_object_database, and we'd
gain simpler code.  Hmm.  We could add some bitmap helper macros, but
you're probably right that the first version should use the simplest
form for representing small bitmaps that we currently have.


I'd be OK with keeping it if we could reduce the number of magic
numbers. E.g,. rather than 32 elsewhere use:

   (sizeof(*loose_objects_subdir_bitmap) * CHAR_BIT)


We have a bitsizeof macro for that.


and similarly rather than 8 here use

   256 / sizeof(*loose_objects_subdir_bitmap) / CHAR_BIT


If we're pretending not to know the number of bits in a byte then we
need to round up, and we have DIV_ROUND_UP for that. :)


There's also already a bitmap data structure in ewah/bitmap.c. It's a
little bit overkill, though, because it mallocs and will grow the bitmap
as needed.


Yes, I feel that's too big a hammer for this purpose.

There is another example of a bitmap in shallow_c (just search for
"32").  It would benefit from the macros mentioned above.  That
might make it easier to switch to the native word size (unsigned int)
instead of using uint32_t everywhere.

But perhaps this one is actually a candidate for using EWAH, depending
on the number of refs the code is supposed to handle.


+static void read_loose_object_subdir(struct alternate_object_database *alt,
+int subdir_nr)


I think it's nice to pull this out into a helper function. I do wonder
if it should just be reusing for_each_loose_file_in_objdir(). You'd just
need to expose the in_obj_subdir() variant publicly.

It does do slightly more than we need (for the callbacks it actually
builds the filename), but I doubt memcpy()ing a few bytes would be
measurable.


Good point.  The function also copies the common first two hex digits
for each entry.  But all that extra work is certainly dwarfed by the
readdir calls.


Yes. You're welcome to micro-optimize that implementation if you want. ;)


My knee-jerk reaction was that this would lead to ugliness, but on
second look it might actually be doable.  Will check, but I don't
expect much of a speedup.

René


[PATCH] sha1_name: cache readdir(3) results in find_short_object_filename()

2017-06-22 Thread René Scharfe
Read each loose object subdirectory at most once when looking for unique
abbreviated hashes.  This speeds up commands like "git log --pretty=%h"
considerably, which previously caused one readdir(3) call for each
candidate, even for subdirectories that were visited before.

The new cache is kept until the program ends and never invalidated.  The
same is already true for pack indexes.  The inherent racy nature of
finding unique short hashes makes it still fit for this purpose -- a
conflicting new object may be added at any time.  Tasks with higher
consistency requirements should not use it, though.

The cached object names are stored in an oid_array, which is quite
compact.  The bitmap for remembering which subdir was already read is
stored as a char array, with one char per directory -- that's not quite
as compact, but really simple and incurs only an overhead equivalent to
11 hashes after all.

Suggested-by: Jeff King 
Helped-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 cache.h | 17 +
 sha1_file.c | 12 ++--
 sha1_name.c | 50 ++
 3 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index d6ba8a2f11..00a017dfcb 100644
--- a/cache.h
+++ b/cache.h
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "pack-revindex.h"
 #include "hash.h"
+#include "sha1-array.h"
 
 #ifndef platform_SHA_CTX
 /*
@@ -1593,6 +1594,16 @@ extern struct alternate_object_database {
struct strbuf scratch;
size_t base_len;
 
+   /*
+* Used to store the results of readdir(3) calls when searching
+* for unique abbreviated hashes.  This cache is never
+* invalidated, thus it's racy and not necessarily accurate.
+* That's fine for its purpose; don't use it for tasks requiring
+* greater accuracy!
+*/
+   char loose_objects_subdir_seen[256];
+   struct oid_array loose_objects_cache;
+
char path[FLEX_ARRAY];
 } *alt_odb_list;
 extern void prepare_alt_odb(void);
@@ -1811,6 +1822,12 @@ typedef int each_loose_cruft_fn(const char *basename,
 typedef int each_loose_subdir_fn(int nr,
 const char *path,
 void *data);
+int for_each_file_in_obj_subdir(int subdir_nr,
+   struct strbuf *path,
+   each_loose_object_fn obj_cb,
+   each_loose_cruft_fn cruft_cb,
+   each_loose_subdir_fn subdir_cb,
+   void *data);
 int for_each_loose_file_in_objdir(const char *path,
  each_loose_object_fn obj_cb,
  each_loose_cruft_fn cruft_cb,
diff --git a/sha1_file.c b/sha1_file.c
index 59a4ed2ed3..5e0ee2b68b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3735,12 +3735,12 @@ void assert_sha1_type(const unsigned char *sha1, enum 
object_type expect)
typename(expect));
 }
 
-static int for_each_file_in_obj_subdir(int subdir_nr,
-  struct strbuf *path,
-  each_loose_object_fn obj_cb,
-  each_loose_cruft_fn cruft_cb,
-  each_loose_subdir_fn subdir_cb,
-  void *data)
+int for_each_file_in_obj_subdir(int subdir_nr,
+   struct strbuf *path,
+   each_loose_object_fn obj_cb,
+   each_loose_cruft_fn cruft_cb,
+   each_loose_subdir_fn subdir_cb,
+   void *data)
 {
size_t baselen = path->len;
DIR *dir = opendir(path->buf);
diff --git a/sha1_name.c b/sha1_name.c
index 5126853bb5..ccb5144d0d 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -77,10 +77,19 @@ static void update_candidates(struct disambiguate_state 
*ds, const struct object
/* otherwise, current can be discarded and candidate is still good */
 }
 
+static int append_loose_object(const struct object_id *oid, const char *path,
+  void *data)
+{
+   oid_array_append(data, oid);
+   return 0;
+}
+
+static int match_sha(unsigned, const unsigned char *, const unsigned char *);
+
 static void find_short_object_filename(struct disambiguate_state *ds)
 {
+   int subdir_nr = ds->bin_pfx.hash[0];
struct alternate_object_database *alt;
-   char hex[GIT_MAX_HEXSZ];
static struct alternate_object_database *fakeent;
 
if (!fakeent) {
@@ -95,29 +104,30 @@ static void find_short_object_filename(struct 
disambiguate_state *ds)
}
fakeent->next = alt_odb_list;
 
-   xsnprintf(hex, sizeof(hex), "%.2s", ds->hex_pfx);
for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
-   struct 

Re: [BUG] add_again() off-by-one error in custom format

2017-06-18 Thread René Scharfe

Am 18.06.2017 um 13:49 schrieb Jeff King:

On Sun, Jun 18, 2017 at 12:58:37PM +0200, René Scharfe wrote:


An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times
a factor of 1.5 from alloc_nr).  How many loose objects do we have to
expect?  30 MB for a million of them sounds not too bad, but can there
realistically be orders of magnitude more?


Good point. We gc by default at 6000 loose objects, and lots of thing
will suffer poor performance by the time you hit a million. So a little
extra space is probably not worth worrying about.


So here's a patch for caching loose object hashes in an oid_array
without a way to invalidate or release the cache.  It uses a single
oid_array for simplicity, but reads each subdirectory individually and
remembers that fact using a bitmap.


I like the direction of this very much.


@@ -1586,6 +1587,9 @@ extern struct alternate_object_database {
struct strbuf scratch;
size_t base_len;
  
+	uint32_t loose_objects_subdir_bitmap[8];


Is it worth the complexity of having an actual bitmap and not just an
array of char? I guess it's not _that_ complex to access the bits, but
there are a lot of magic numbers involved.


That would be 224 bytes more per alternate_object_database, and we'd
gain simpler code.  Hmm.  We could add some bitmap helper macros, but
you're probably right that the first version should use the simplest
form for representing small bitmaps that we currently have.


+   struct oid_array loose_objects_cache;


We should probably insert a comment here warning that the cache may go
out of date during the process run, and should only be used when that's
an acceptable tradeoff.


Then we need to offer a way for opting out.  Through a global variable?
(I'd rather reduce their number, but don't see how allow programs to
nicely toggle this cache otherwise.)


+static void read_loose_object_subdir(struct alternate_object_database *alt,
+int subdir_nr)


I think it's nice to pull this out into a helper function. I do wonder
if it should just be reusing for_each_loose_file_in_objdir(). You'd just
need to expose the in_obj_subdir() variant publicly.

It does do slightly more than we need (for the callbacks it actually
builds the filename), but I doubt memcpy()ing a few bytes would be
measurable.


Good point.  The function also copies the common first two hex digits
for each entry.  But all that extra work is certainly dwarfed by the
readdir calls.

René


Re: [BUG] add_again() off-by-one error in custom format

2017-06-18 Thread René Scharfe
Am 15.06.2017 um 15:25 schrieb Jeff King:
> On Thu, Jun 15, 2017 at 01:33:34PM +0200, René Scharfe wrote:
>> Can we trust object directory time stamps for cache invalidation?
> 
> I think it would work on POSIX-ish systems, since loose object changes
> always involve new files, not modifications of existing ones. I don't
> know if there are systems that don't update directory mtimes even for
> newly added files.
> 
> That would still be a stat() per request. I'd be curious how the timing
> compares to the opendir/readdir that happens now.

Modification times wouldn't be exact, as you mentioned above: An object
could be added just after checking the time stamp.  So perhaps we don't
need that, or a time-based invalidation suffices, e.g. we discard the
cache after five minutes or so?

Anyway, here's a patch for stat-based invalidation, on top of the other
one.  Array removal is really slow (hope I didn't sneak a bug in there,
but my confidence in this code isn't very high).  No locking is done;
parallel threads removing and adding entries could make a mess, but
that's not an issue for log.

Timings for "time git log --pretty=%h >/dev/null" in my git repository
with 5094 loose objects on Debian:

master   first patch  this patch
real0m1.065s 0m0.581s 0m0.633s
user0m0.648s 0m0.564s 0m0.580s
sys 0m0.412s 0m0.016s 0m0.052s


And on mingw with 227 loose objects:

master   first patch  this patch
real0m1.756s 0m0.546s 0m1.659s
user0m0.000s 0m0.000s 0m0.000s
sys 0m0.000s 0m0.000s 0m0.000s

So at least for Windows it would be really nice if we could avoid
calling stat..
---
 cache.h |  1 +
 sha1_name.c | 32 
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 8c6748907b..386b09710d 100644
--- a/cache.h
+++ b/cache.h
@@ -1589,6 +1589,7 @@ extern struct alternate_object_database {
 
uint32_t loose_objects_subdir_bitmap[8];
struct oid_array loose_objects_cache;
+   time_t loose_objects_subdir_mtime[256];
 
char path[FLEX_ARRAY];
 } *alt_odb_list;
diff --git a/sha1_name.c b/sha1_name.c
index ae6a5c3abe..baecb10454 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -77,6 +77,24 @@ static void update_candidates(struct disambiguate_state *ds, 
const struct object
/* otherwise, current can be discarded and candidate is still good */
 }
 
+static void remove_subdir_entries(struct oid_array *array, int subdir_nr)
+{
+   struct object_id oid;
+   int start, end;
+
+   memset(, 0, sizeof(oid));
+   oid.hash[0] = subdir_nr;
+   start = oid_array_lookup(array, );
+   if (start < 0)
+   start = -1 - start;
+   end = start;
+   while (end < array->nr && array->oid[end].hash[0] == subdir_nr)
+   end++;
+   memmove(array->oid + start, array->oid + end,
+   (array->nr - end) * sizeof(array->oid[0]));
+   array->nr -= end - start;
+}
+
 static void read_loose_object_subdir(struct alternate_object_database *alt,
 int subdir_nr)
 {
@@ -86,15 +104,21 @@ static void read_loose_object_subdir(struct 
alternate_object_database *alt,
struct dirent *de;
size_t bitmap_index = subdir_nr / 32;
uint32_t bitmap_mask = 1 << (subdir_nr % 32);
-
-   if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask)
-   return;
-   alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask;
+   struct stat st;
 
buf = alt_scratch_buf(alt);
strbuf_addf(buf, "%02x/", subdir_nr);
xsnprintf(hex, sizeof(hex), "%02x", subdir_nr);
 
+   stat(buf->buf, );
+   if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask) {
+   if (alt->loose_objects_subdir_mtime[subdir_nr] == st.st_mtime)
+   return;
+   remove_subdir_entries(>loose_objects_cache, subdir_nr);
+   }
+   alt->loose_objects_subdir_mtime[subdir_nr] = st.st_mtime;
+   alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask;
+
dir = opendir(buf->buf);
if (!dir)
return;
-- 
2.13.0


Re: [BUG] add_again() off-by-one error in custom format

2017-06-18 Thread René Scharfe
Am 15.06.2017 um 15:25 schrieb Jeff King:
> On Thu, Jun 15, 2017 at 01:33:34PM +0200, René Scharfe wrote:
>>> I'm not really sure how, though, short of caching the directory
>>> contents. That opens up questions of whether and when to invalidate the
>>> cache. If the cache were _just_ about short hashes, it might be OK to
>>> just assume that it remains valid through the length of the program (so
>>> worst case, a simultaneous write might mean that we generate a sha1
>>> which just became ambiguous, but that's generally going to be racy
>>> anyway).
>>>
>>> The other downside of course is that we'd spend RAM on it. We could
>>> bound the size of the cache, I suppose.
>>
>> You mean like an in-memory pack index for loose objects?  In order to
>> avoid the readdir call in sha1_name.c::find_short_object_filename()?
>> We'd only need to keep the hashes of found objects.  An oid_array
>> would be quite compact.
> 
> Yes, that's what I was thinking of.

An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times
a factor of 1.5 from alloc_nr).  How many loose objects do we have to
expect?  30 MB for a million of them sounds not too bad, but can there
realistically be orders of magnitude more?

>> Non-racy writes inside a process should not be ignored (write, read
>> later) -- e.g. checkout does something like that.
> 
> Ideally, yes. Though thinking on this more, it's racy even today,
> because we rely on the in-memory packed_git list. We usually re-check the
> directory only when we look for an object and it's missing. So any new
> packs which have been added while the process runs will be missed when
> doing short-sha1 lookups (and actually, find_short_packed_object does
> not even seem to do the usual retry-on-miss).
> 
> A normal process like "checkout" isn't writing new packs, but a
> simultaneous repack could be migrating its new objects to a pack behind
> its back. (It also _can_ write packs, but only for large blobs).
> 
> Given that we are talking about finding "unique" abbreviations here, and
> that those abbreviations can become invalidated immediately anyway as
> more objects are added (even by the same process), I'm not sure we need
> to strive for absolute accuracy.

Agreed.  And processes that add objects and use them later probably use
full hashes (at least checkout does).

So here's a patch for caching loose object hashes in an oid_array
without a way to invalidate or release the cache.  It uses a single
oid_array for simplicity, but reads each subdirectory individually and
remembers that fact using a bitmap.
---
 cache.h |  4 
 sha1_name.c | 69 +++--
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index 4d92aae0e8..8c6748907b 100644
--- a/cache.h
+++ b/cache.h
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "pack-revindex.h"
 #include "hash.h"
+#include "sha1-array.h"
 
 #ifndef platform_SHA_CTX
 /*
@@ -1586,6 +1587,9 @@ extern struct alternate_object_database {
struct strbuf scratch;
size_t base_len;
 
+   uint32_t loose_objects_subdir_bitmap[8];
+   struct oid_array loose_objects_cache;
+
char path[FLEX_ARRAY];
 } *alt_odb_list;
 extern void prepare_alt_odb(void);
diff --git a/sha1_name.c b/sha1_name.c
index 5126853bb5..ae6a5c3abe 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -77,10 +77,46 @@ static void update_candidates(struct disambiguate_state 
*ds, const struct object
/* otherwise, current can be discarded and candidate is still good */
 }
 
+static void read_loose_object_subdir(struct alternate_object_database *alt,
+int subdir_nr)
+{
+   struct strbuf *buf;
+   char hex[GIT_MAX_HEXSZ];
+   DIR *dir;
+   struct dirent *de;
+   size_t bitmap_index = subdir_nr / 32;
+   uint32_t bitmap_mask = 1 << (subdir_nr % 32);
+
+   if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask)
+   return;
+   alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask;
+
+   buf = alt_scratch_buf(alt);
+   strbuf_addf(buf, "%02x/", subdir_nr);
+   xsnprintf(hex, sizeof(hex), "%02x", subdir_nr);
+
+   dir = opendir(buf->buf);
+   if (!dir)
+   return;
+
+   while ((de = readdir(dir)) != NULL) {
+   struct object_id oid;
+
+   if (strlen(de->d_name) != GIT_SHA1_HEXSZ - 2)
+   continue;
+   memcpy(hex + 2, de->d_name, GIT_SHA1_HEXSZ - 2);
+   if (!get_oid_hex(hex, ))
+   oid_array_append(>loose_objects_cache, );
+   }
+   closedir(dir);
+}

Re: [PATCH 2/2] date: use localtime() for "-local" time formats

2017-06-15 Thread René Scharfe

Am 15.06.2017 um 15:52 schrieb Jeff King:

But for the special case of the "-local" formats, we can
just skip the adjustment and use localtime() instead of
gmtime(). This makes --date=format-local:%Z work correctly,
showing the local timezone instead of an empty string.


Documentation/rev-list-options.txt should be updated to mention that %Z
is passed to strftime in the local case, no?


The new test checks the result for "UTC", our default
test-lib value for $TZ. Using something like EST5 might be
more interesting, but the actual zone string is
system-dependent (for instance, on my system it expands to
just EST). Hopefully "UTC" is vanilla enough that every
system treats it the same.

Signed-off-by: Jeff King 
---
I don't have a Windows system to test this on, but from the output Dscho
provided earlier, I believe this should pass.


The first patch applies with some fuzz on master of Git for Windows, the
second one applies cleanly.  A "typedef unsigned long timestamp_t;" is
required to compile it; such a fixup won't be needed for long, I guess.
t0006 succeeds.

René


Re: [PATCH] checkout: don't write merge results into the object database

2017-06-15 Thread René Scharfe
Am 15.06.2017 um 15:57 schrieb Jeff King:
> On Thu, Jun 15, 2017 at 01:33:42PM +0200, René Scharfe wrote:
> 
>> Merge results need to be written to the worktree, of course, but we
>> don't necessarily need object entries for them, especially if they
>> contain conflict markers.  Use pretend_sha1_file() to create fake
>> blobs to pass to make_cache_entry() and checkout_entry() instead.
> 
> Conceptually this makes sense, although I have a comment below.
> 
> Out of curiosity, did this come up when looking into the cherry-pick
> segfault/error from a few hours ago?

No, it came up in the discussion of Dscho's memory leak plug series [1].

>> @@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct 
>> checkout *state)
>>   * (it also writes the merge result to the object database even
>>   * when it may contain conflicts).
>>   */
>> -if (write_sha1_file(result_buf.ptr, result_buf.size,
>> -blob_type, oid.hash))
>> +if (pretend_sha1_file(result_buf.ptr, result_buf.size,
>> +  OBJ_BLOB, oid.hash))
>>  die(_("Unable to add merge result for '%s'"), path);
>>  free(result_buf.ptr);
> 
> I wondered if pretend_sha1_file() makes a copy of the buffer, and indeed
> it does. So this is correct.
> 
> But that raises an interesting question: how big are these objects, and
> is it a good idea to store them in RAM? Obviously they're in RAM
> already, but I'm not sure if it's one-at-a-time. We could be bumping the
> peak RAM used if there's a large number of these entries. Touching the
> on-disk odb does feel hacky, but it also means they behave like other
> objects.
> 
> If it is a concern, I think it could be solved by "unpretending" after
> our call to checkout_entry completes. That would need a new call in
> sha1_file.c, but it should be easy to write.

Good point; we'd accumulate fake entries that we'll never need again.
The patch should clean them up.

Alternatively we could finally address the NEEDSWORK comment and
provide a way to checkout a file without faking an index entry..

René


[1] 
https://public-inbox.org/git/2704e145927c851c4163a68cfdfd5ada48fff21d.1493906085.git.johannes.schinde...@gmx.de/


[PATCH v3] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-15 Thread René Scharfe
There is no portable way to pass timezone information to strftime.  Add
parameters for timezone offset and name to strbuf_addftime and let it
handle the timezone-related format specifiers %z and %Z internally.

Callers can opt out for %Z by passing NULL as timezone name.  %z is
always handled internally -- this helps on Windows, where strftime would
expand it to a timezone name (same as %Z), in violation of POSIX.
Modifiers are not handled, e.g. %Ez is still passed to strftime.

Use an empty string as timezone name in show_date (the only current
caller) for now because we only have the timezone offset in non-local
mode.  POSIX allows %Z to resolve to an empty string in case of missing
information.

Helped-by: Ulrich Mueller 
Helped-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
Changes from v3:
- Updated developer documentation in strbuf.h.
- Added short note to user documentation.

 Documentation/rev-list-options.txt |  3 ++-
 date.c |  2 +-
 strbuf.c   | 41 ++
 strbuf.h   | 10 --
 t/t0006-date.sh|  6 ++
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a46f70c2b1..34ae2553f1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -768,7 +768,8 @@ timezone value.
 1970).  As with `--raw`, this is always in UTC and therefore `-local`
 has no effect.
 +
-`--date=format:...` feeds the format `...` to your system `strftime`.
+`--date=format:...` feeds the format `...` to your system `strftime`,
+except for %z and %Z, which are handled internally.
 Use `--date=format:%c` to show the date in your system locale's
 preferred format.  See the `strftime` manual for a complete list of
 format placeholders. When using `-local`, the correct syntax is
diff --git a/date.c b/date.c
index 63fa99685e..5580577334 100644
--- a/date.c
+++ b/date.c
@@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
month_names[tm->tm_mon], tm->tm_year + 1900,
tm->tm_hour, tm->tm_min, tm->tm_sec, tz);
else if (mode->type == DATE_STRFTIME)
-   strbuf_addftime(, mode->strftime_fmt, tm);
+   strbuf_addftime(, mode->strftime_fmt, tm, tz, "");
else
strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d",
weekday_names[tm->tm_wday],
diff --git a/strbuf.c b/strbuf.c
index 00457940cf..be3b9e37b1 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -785,14 +785,48 @@ char *xstrfmt(const char *fmt, ...)
return ret;
 }
 
-void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
+void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm,
+int tz_offset, const char *tz_name)
 {
+   struct strbuf munged_fmt = STRBUF_INIT;
size_t hint = 128;
size_t len;
 
if (!*fmt)
return;
 
+   /*
+* There is no portable way to pass timezone information to
+* strftime, so we handle %z and %Z here.
+*/
+   for (;;) {
+   const char *percent = strchrnul(fmt, '%');
+   strbuf_add(_fmt, fmt, percent - fmt);
+   if (!*percent)
+   break;
+   fmt = percent + 1;
+   switch (*fmt) {
+   case '%':
+   strbuf_addstr(_fmt, "%%");
+   fmt++;
+   break;
+   case 'z':
+   strbuf_addf(_fmt, "%+05d", tz_offset);
+   fmt++;
+   break;
+   case 'Z':
+   if (tz_name) {
+   strbuf_addstr(_fmt, tz_name);
+   fmt++;
+   break;
+   }
+   /* FALLTHROUGH */
+   default:
+   strbuf_addch(_fmt, '%');
+   }
+   }
+   fmt = munged_fmt.buf;
+
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm);
 
@@ -804,17 +838,16 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, 
const struct tm *tm)
 * output contains at least one character, and then drop the 
extra
 * character before returning.
 */
-   struct strbuf munged_fmt = STRBUF_INIT;
-   strbuf_addf(_fmt, "%s ", fmt);
+   strbuf_addch(_fmt, ' ');
while (!len) {
hint *= 2;
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len,
   

Re: [PATCH v2] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-15 Thread René Scharfe

Am 15.06.2017 um 13:27 schrieb Ulrich Mueller:

On Thu, 15 Jun 2017, René Scharfe wrote:



Callers can opt out for %Z by passing NULL as timezone name.  %z is
always handled internally -- this helps on Windows, where strftime would
expand it to a timezone name (same as %Z), in violation of POSIX.
Modifiers are not handled, e.g. %Ez is still passed to strftime.


POSIX would also allow other things, like a field width:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html

$ date '+%8z'
+200

(But I believe that's not very useful, and supporting it might require
duplicating much of strftime's code.)


Windows doesn't support that (unsurprisingly), but it accepts %#z,
which does the same as %z.  Let's wait for someone to request support
for modifiers and just document the behavior for now.


Changes from v1:
- Always handle %z internally.


Minor nitpick: Shouldn't the comment in strbuf.h be updated to reflect
that change?


+ * Add the time specified by `tm`, as formatted by `strftime`.  `tz_offset`
+ * and `tz_name` are used to expand %z and %Z internally, unless `tz_name`
+ * is NULL.  `tz_offset` is in decimal hhmm format, e.g. -600 means six
+ * hours west of Greenwich.


Yes, it should.  Thanks for paying attention! :)

René


[PATCH] checkout: don't write merge results into the object database

2017-06-15 Thread René Scharfe
Merge results need to be written to the worktree, of course, but we
don't necessarily need object entries for them, especially if they
contain conflict markers.  Use pretend_sha1_file() to create fake
blobs to pass to make_cache_entry() and checkout_entry() instead.

Signed-off-by: Rene Scharfe 
---
 builtin/checkout.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 1624eed7e7..f51c15af9f 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -215,7 +215,7 @@ static int checkout_merged(int pos, const struct checkout 
*state)
 
/*
 * NEEDSWORK:
-* There is absolutely no reason to write this as a blob object
+* There is absolutely no reason to build a fake blob object
 * and create a phony cache entry.  This hack is primarily to get
 * to the write_entry() machinery that massages the contents to
 * work-tree format and writes out which only allows it for a
@@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct checkout 
*state)
 * (it also writes the merge result to the object database even
 * when it may contain conflicts).
 */
-   if (write_sha1_file(result_buf.ptr, result_buf.size,
-   blob_type, oid.hash))
+   if (pretend_sha1_file(result_buf.ptr, result_buf.size,
+ OBJ_BLOB, oid.hash))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
ce = make_cache_entry(mode, oid.hash, path, 2, 0);
-- 
2.13.0



Re: [BUG] add_again() off-by-one error in custom format

2017-06-15 Thread René Scharfe

Am 15.06.2017 um 07:56 schrieb Jeff King:

One interesting thing is that the cost of finding short hashes very much
depends on your loose object setup. I timed:

   git log --format=%H >/dev/null

versus

   git log --format=%h >/dev/null

on git.git. It went from about 400ms to about 800ms. But then I noticed
I had a lot of loose object directories, and ran "git gc --prune=now".
Afterwards, my timings were more like 380ms and 460ms.

The difference is that in the "before" case, we actually opened each
directory and ran getdents(). But after gc, the directories are gone
totally and open() fails. We also have to do a linear walk through the
objects in each directory, since the contents are sorted.


Do you mean "unsorted"?


So I wonder if it is worth trying to optimize the short-sha1 computation
in the first place. Double-%h aside, that would make _everything_
faster, including --oneline.


Right.


I'm not really sure how, though, short of caching the directory
contents. That opens up questions of whether and when to invalidate the
cache. If the cache were _just_ about short hashes, it might be OK to
just assume that it remains valid through the length of the program (so
worst case, a simultaneous write might mean that we generate a sha1
which just became ambiguous, but that's generally going to be racy
anyway).

The other downside of course is that we'd spend RAM on it. We could
bound the size of the cache, I suppose.


You mean like an in-memory pack index for loose objects?  In order to
avoid the readdir call in sha1_name.c::find_short_object_filename()?
We'd only need to keep the hashes of found objects.  An oid_array
would be quite compact.

Non-racy writes inside a process should not be ignored (write, read
later) -- e.g. checkout does something like that.

Can we trust object directory time stamps for cache invalidation?

René


Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)

2017-06-15 Thread René Scharfe

Am 15.06.2017 um 07:42 schrieb Jeff King:

On Thu, Jun 15, 2017 at 01:03:29AM +0200, René Scharfe wrote:

But there's more.  strftime on Windows doesn't support common POSIX-
defined tokens like %F (%Y-%m-%d) and %T (%H:%M:%S). We could handle
them as well.  Do we want that?  At least we'd have to update the
added test that uses them..

Here's the full list of tokens in POSIX [1], but not supported by
Windows [2]: %C, %D, %F, %G, %R, %T, %V, %e, %g, %h, %n, %r, %t, %u
plus the modifiers %E and %O.


I don't have a real opinion on that. The point of adding strftime was
always to give the user access to whatever their system supports. In
particular "%c" which we cannot emulate ourselves.

If people want support for those other things on platforms that don't
have it, I have no real objection. But I also don't know that it's worth
spending time on if nobody is asking for it.


Agreed; let's make the tests more focused (i.e. not exercise %F and %T
needlessly).

René


[PATCH v2] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-15 Thread René Scharfe
There is no portable way to pass timezone information to strftime.  Add
parameters for timezone offset and name to strbuf_addftime and let it
handle the timezone-related format specifiers %z and %Z internally.

Callers can opt out for %Z by passing NULL as timezone name.  %z is
always handled internally -- this helps on Windows, where strftime would
expand it to a timezone name (same as %Z), in violation of POSIX.
Modifiers are not handled, e.g. %Ez is still passed to strftime.

Use an empty string as timezone name in show_date (the only current
caller) for now because we only have the timezone offset in non-local
mode.  POSIX allows %Z to resolve to an empty string in case of missing
information.

Helped-by: Ulrich Mueller 
Helped-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
Changes from v1:
- Always handle %z internally.
- Move tests from t6006 to t0006, as that's a more appropriate place.
- Changed tests to only use %%, %z and %Z to avoid incompatibilities.
- Tested on mingw (applies there with patch and some fuzz).

 date.c  |  2 +-
 strbuf.c| 41 +
 strbuf.h| 11 ---
 t/t0006-date.sh |  6 ++
 4 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/date.c b/date.c
index 63fa99685e..5580577334 100644
--- a/date.c
+++ b/date.c
@@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
month_names[tm->tm_mon], tm->tm_year + 1900,
tm->tm_hour, tm->tm_min, tm->tm_sec, tz);
else if (mode->type == DATE_STRFTIME)
-   strbuf_addftime(, mode->strftime_fmt, tm);
+   strbuf_addftime(, mode->strftime_fmt, tm, tz, "");
else
strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d",
weekday_names[tm->tm_wday],
diff --git a/strbuf.c b/strbuf.c
index 00457940cf..be3b9e37b1 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -785,14 +785,48 @@ char *xstrfmt(const char *fmt, ...)
return ret;
 }
 
-void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
+void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm,
+int tz_offset, const char *tz_name)
 {
+   struct strbuf munged_fmt = STRBUF_INIT;
size_t hint = 128;
size_t len;
 
if (!*fmt)
return;
 
+   /*
+* There is no portable way to pass timezone information to
+* strftime, so we handle %z and %Z here.
+*/
+   for (;;) {
+   const char *percent = strchrnul(fmt, '%');
+   strbuf_add(_fmt, fmt, percent - fmt);
+   if (!*percent)
+   break;
+   fmt = percent + 1;
+   switch (*fmt) {
+   case '%':
+   strbuf_addstr(_fmt, "%%");
+   fmt++;
+   break;
+   case 'z':
+   strbuf_addf(_fmt, "%+05d", tz_offset);
+   fmt++;
+   break;
+   case 'Z':
+   if (tz_name) {
+   strbuf_addstr(_fmt, tz_name);
+   fmt++;
+   break;
+   }
+   /* FALLTHROUGH */
+   default:
+   strbuf_addch(_fmt, '%');
+   }
+   }
+   fmt = munged_fmt.buf;
+
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm);
 
@@ -804,17 +838,16 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, 
const struct tm *tm)
 * output contains at least one character, and then drop the 
extra
 * character before returning.
 */
-   struct strbuf munged_fmt = STRBUF_INIT;
-   strbuf_addf(_fmt, "%s ", fmt);
+   strbuf_addch(_fmt, ' ');
while (!len) {
hint *= 2;
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len,
   munged_fmt.buf, tm);
}
-   strbuf_release(_fmt);
len--; /* drop munged space */
}
+   strbuf_release(_fmt);
strbuf_setlen(sb, sb->len + len);
 }
 
diff --git a/strbuf.h b/strbuf.h
index 80047b1bb7..39d5836abd 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -339,9 +339,14 @@ __attribute__((format (printf,2,0)))
 extern void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
 
 /**
- * Add the time specified by `tm`, as formatted by `strftime`.
- */
-extern void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct 
tm *tm);
+ * Add the time specified by `tm`, as formatted by `strftime`.  `tz_offset`
+ * and 

Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)

2017-06-14 Thread René Scharfe
Am 14.06.2017 um 23:04 schrieb Johannes Schindelin:
> On Wed, 14 Jun 2017, René Scharfe wrote:
> 
>> Does someone actually expect %z to show time zone names instead of
>> offsets on Windows?
> 
> Not me ;-)
> 
> I cannot speak for anyone else, as I lack that information, though.

Before the patch %z would always expand to + on Linux and to the
name of the local time zone on Windows, no matter which offset was
actually given.  So it was broken in either case (even though it got
at least some aspects right by accident for some commits).  Based on
that I'd think handling %z internally should be OK.

But there's more.  strftime on Windows doesn't support common POSIX-
defined tokens like %F (%Y-%m-%d) and %T (%H:%M:%S). We could handle
them as well.  Do we want that?  At least we'd have to update the
added test that uses them..

Here's the full list of tokens in POSIX [1], but not supported by
Windows [2]: %C, %D, %F, %G, %R, %T, %V, %e, %g, %h, %n, %r, %t, %u
plus the modifiers %E and %O.

René


[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
[2] https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx


Re: rs/strbuf-addftime-zZ, was Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)

2017-06-14 Thread René Scharfe

Am 14.06.2017 um 13:10 schrieb Jeff King:

On Wed, Jun 14, 2017 at 12:57:06PM +0200, Johannes Schindelin wrote:

But even then, it fails in t0006 on Windows with this error:

-- snip --
++ eval 'diff -u "$@" '
+++ diff -u expect actual
--- expect  2017-06-14 10:53:40.126136900 +
+++ actual  2017-06-14 10:53:40.171146800 +
@@ -1 +1 @@
-146600 +0200 -> 2016-06-15 14:13:20 + (UTC)
+146600 +0200 -> 2016-06-15 14:13:20 UTC (UTC)


Ugh, I was worried about that some systems might display timezones
differently (that's why I _didn't_ check %Z in the EST5 case). But I
must admit this was not an incompatibility I was expecting. It looks
like your system strftime() turns %z into "UTC". POSIX says:

   %z
 Replaced by the offset from UTC in the ISO 8601:2000 standard format
 (+hhmm or -hhmm), or by no characters if no timezone is
 determinable.

So it seems like the mingw strftime is violating POSIX. I don't see an
easy solution beyond marking this as !MINGW. Though if we wanted a
partial test, we could test %z and %Z separately.


Hmm.  The patches currently either let strftime handle both %z and %Z
(in the local case) or handle both internally.  Perhaps we need a third
option, namely to handle %z internally in all cases for systems whose
implementation violates POSIX?  Nah, it would be easier to always handle
%z internally.  Any downsides?  Does someone actually expect %z to show
time zone names instead of offsets on Windows?

René


Re: [BUG] add_again() off-by-one error in custom format

2017-06-14 Thread René Scharfe
Am 13.06.2017 um 23:20 schrieb Junio C Hamano:
> René Scharfe <l@web.de> writes:
> 
>> The difference is about the same as the one between:
>>
>>  $ time git log --format="" >/dev/null
>>
>>  real0m0.463s
>>  user0m0.448s
>>  sys 0m0.012s
>>
>> and:
>>
>>  $ time git log --format="%h" >/dev/null
>>
>>  real0m1.062s
>>  user0m0.636s
>>  sys 0m0.416s
>>
>> With caching duplicates are basically free and without it short
>> hashes have to be looked up again.  Other placeholders may reduce
>> the relative slowdown, depending on how expensive they are.
> 
> I think the real question is how likely people use more than one
> occurrence of the same thing in their custom format, and how deeply
> they care that --format='%h %h' costs more than --format='%h'.  The
> cost won't of course be double (because the main traversal costs
> without any output), but it would be rather unreasonable to expect
> that --format='%h %h %h %h %h' to cost the same as --format='%h';
> after all, Git is doing more for them ;-)

The answer to the first half is obviously "very likely" -- otherwise
this bug wouldn't have been found, right? :)

Regarding the question of how bad a 50% slowdown for a second %h
would be: No idea.  If ran interactively it may not even be noticeable
because the user can read the first few lines in less while the rest
is prepared in the background.  We don't have a perf test for formats
with duplicate short hashes, so we don't promise anything, right? :)

René

-- >8 --
Subject: [PATCH] pretty: recalculate duplicate short hashes

b9c6232138 (--format=pretty: avoid calculating expensive expansions
twice) optimized adding short hashes multiple times by using the
fact that the output strbuf was only ever simply appended to and
copying the added string from the previous run.  That prerequisite
is no longer given; we now have modfiers like %< and %+ that can
cause the cache to lose track of the correct offsets.  Remove it.

Reported-by: Michael Giuffrida <michae...@chromium.org>
Signed-off-by: Rene Scharfe <l@web.de>
---
I'm sending this out in the hope that there might be a simple way
to fix it after all, like Gábor's patch does for %+.  %< and %>
seem to be the only other problematic modifiers for now -- I'm
actually surprised that %w seems to be OK.

 pretty.c | 32 
 strbuf.c |  7 ---
 strbuf.h |  6 --
 3 files changed, 45 deletions(-)

diff --git a/pretty.c b/pretty.c
index 09701bd2ff..cc099dfdd1 100644
--- a/pretty.c
+++ b/pretty.c
@@ -783,29 +783,9 @@ struct format_commit_context {
size_t body_off;
 
/* The following ones are relative to the result struct strbuf. */
-   struct chunk abbrev_commit_hash;
-   struct chunk abbrev_tree_hash;
-   struct chunk abbrev_parent_hashes;
size_t wrap_start;
 };
 
-static int add_again(struct strbuf *sb, struct chunk *chunk)
-{
-   if (chunk->len) {
-   strbuf_adddup(sb, chunk->off, chunk->len);
-   return 1;
-   }
-
-   /*
-* We haven't seen this chunk before.  Our caller is surely
-* going to add it the hard way now.  Remember the most likely
-* start of the to-be-added chunk: the current end of the
-* struct strbuf.
-*/
-   chunk->off = sb->len;
-   return 0;
-}
-
 static void parse_commit_header(struct format_commit_context *context)
 {
const char *msg = context->message;
@@ -1147,24 +1127,16 @@ static size_t format_commit_one(struct strbuf *sb, /* 
in UTF-8 */
return 1;
case 'h':   /* abbreviated commit hash */
strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_COMMIT));
-   if (add_again(sb, >abbrev_commit_hash)) {
-   strbuf_addstr(sb, diff_get_color(c->auto_color, 
DIFF_RESET));
-   return 1;
-   }
strbuf_add_unique_abbrev(sb, commit->object.oid.hash,
 c->pretty_ctx->abbrev);
strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET));
-   c->abbrev_commit_hash.len = sb->len - c->abbrev_commit_hash.off;
return 1;
case 'T':   /* tree hash */
strbuf_addstr(sb, oid_to_hex(>tree->object.oid));
return 1;
case 't':   /* abbreviated tree hash */
-   if (add_again(sb, >abbrev_tree_hash))
-   return 1;
strbuf_add_unique_abbrev(sb, commit->tree->object.oid.hash,
 c->pretty_ctx->abbrev);
-

Re: [BUG] add_again() off-by-one error in custom format

2017-06-14 Thread René Scharfe

Am 14.06.2017 um 00:24 schrieb SZEDER Gábor:

[sorry for double post, forgot the mailing list...]

To throw in a fourth option, this one adjusts the expansions' cached
offsets when the magic makes it necessary.  It's not necessary for
'%-', because it only makes a difference when the expansion is empty,
and in that case

   - add_again() doesn't consider it cached,
   - and even if it did, the offset of a zero-length string wouldn't
 matter.

  pretty.c | 32 +---
  1 file changed, 21 insertions(+), 11 deletions(-)


There are other modifiers, e.g. try the format '%h%>(20)%h' -- its
output is obviously wrong (contains control characters), even with your
patch.  We'd have to update the offsets for each placeholder that
changes the output of others.  I don't think that approach scales, alas.

René


Re: [BUG] add_again() off-by-one error in custom format

2017-06-13 Thread René Scharfe

Am 13.06.2017 um 20:29 schrieb Junio C Hamano:

René Scharfe <l@web.de> writes:


Indeed, a very nice bug report!


I think the call to format_commit_one() needs to be taught to pass
'magic' through, and then add_again() mechanism needs to be told not
to cache when magic is in effect, or something like that.

Perhaps something along this line...?

   pretty.c | 64 
++--
   1 file changed, 38 insertions(+), 26 deletions(-)


That looks quite big.  You even invent a way to classify magic.


Hmph, by "a way to classify", do you mean the enum?  That thing was
there from before, just moved.


Oh, yes, sorry.  I didn't even get that far into the patch.  (I'll
better go to bed after hitting send..)


Also I think we do not have to change add_again() at all, because
chunk->off tolerates being given a garbage value as long as
chunk->len stays 0, and add_again() does not change chunk->len at
all.

Which cuts the diffstat down to

  pretty.c | 40 +---
  1 file changed, 25 insertions(+), 15 deletions(-)


Does the caching feature justify the added complexity?


That's a good question.  I thought about your second alternative
while adding the "don't cache" thing, but if we can measure and find
out that we are not gaining much from caching, certainly that sounds
much simpler.


The difference is about the same as the one between:

$ time git log --format="" >/dev/null

real0m0.463s
user0m0.448s
sys 0m0.012s

and:

$ time git log --format="%h" >/dev/null

real0m1.062s
user0m0.636s
sys 0m0.416s

With caching duplicates are basically free and without it short
hashes have to be looked up again.  Other placeholders may reduce
the relative slowdown, depending on how expensive they are.

Forgot a third option, probably because it's not a particularly good
idea: Replacing the caching in pretty.c with a short static cache in
find_unique_abbrev_r().

René


Re: [BUG] add_again() off-by-one error in custom format

2017-06-13 Thread René Scharfe

Am 13.06.2017 um 00:49 schrieb Junio C Hamano:

Michael Giuffrida  writes:


For the abbreviated commit hash placeholder ('h'), pretty.c uses
add_again() to cache the result of the expansion, and then uses that
result in future expansions. This causes problems when the expansion
includes whitespace:

 $ git log -n 1 --pretty='format:newline:%+h%nno_newline:%h'
 newline:
 a0b1c2d
 no_newline:
 a0b1c2

The second expansion of the hash added an unwanted newline and removed
the final character. It seemingly used the cached expansion *starting
from the inserted newline*.

The expected output is:

 $ git log -n 1 --pretty='format:newline:%+h%nno_newline:%h'
 newline:
 a0b1c2d
 no_newline:a0b1c2d


Nicely explained.  The add_again() mechanism caches an earlier
result by remembering the offset and the length in the strbuf the
formatted string is being accumulated to, but because %+x (and
probably %-x) magic placeholders futz with the result of
format_commit_one() in place, the cache goes out of sync, of course.


Indeed, a very nice bug report!


I think the call to format_commit_one() needs to be taught to pass
'magic' through, and then add_again() mechanism needs to be told not
to cache when magic is in effect, or something like that.

Perhaps something along this line...?

  pretty.c | 64 ++--
  1 file changed, 38 insertions(+), 26 deletions(-)


That looks quite big.  You even invent a way to classify magic. Does the
caching feature justify the added complexity?  Alternatives:

- Don't cache anymore, now that we have placeholders that change output
  on top of the original append-only ones.  Yields a net code reduction.
  Duplicated %h, %t and %p will have to be resolved at each occurrence.

- Provide some space for the cache instead of reusing the output, like
  a strbuf for each placeholder.  Adds a memory allocation to each
  first occurrence of %h, %t and %p.  Such a cache could be reused for
  multiple commits by truncating it instead of releasing; not sure how
  to pass it in nicely for it to survive a sequence of calls, though.

René


Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-12 Thread René Scharfe

Am 12.06.2017 um 21:02 schrieb Ævar Arnfjörð Bjarmason:

I only ever use the time offset info to quickly make a mental note of
"oh +0200, this guy's in Europe", or "oh -0400 America East". Having
any info at all for %Z would allow me to easily replace that already
buggy mapping that exists in my head right now with something that's
at least a bit more accurate, e.g. I remember that +0900 is Japan, but
I can't now offhand recall what timezones India is at.

Now, obviously we can't know whether to translate e.g. -0400 to New
York or Santiago, but for the purposes of human readable output I
thought it would be more useful to guess than show nothing at all. So
for that purpose the most useful output of %Z *for me* would be just
showing a string with the 3 most notable cities/areas, weighted for
showing locations on different continents, e.g.:

   + = Iceland, West Africa, Ittoqqortoormiit
   +0100 = London, Lisbon, Kinshasa
   ...
   +0900 = Tokyo, Seul, Manokwari
   
   -0900 = San Francisco, Vancouver, Tijuana
   
   -0600 = Denver, Managua, Calgary

Then I could run:

 git log --date=format-local:"%Y[...](%Z)"

And get output like:

 commit 826c06412e (HEAD -> master, origin/master, origin/HEAD)
 Author: Junio C Hamano 
 Date:   Fri Jun 2 15:07:36 2017 +0900 (San Francisco, Vancouver,
Tijuana etc.)

 Fifth batch for 2.14
 [...]
 commit 30d005c020
 Author: Jeff King 
 Date:   Fri May 19 08:59:34 2017 -0400 (New York, Havana, Santiago etc.)

 diff: use blob path for blob/file diffs

Which gives me a pretty good idea of where the people who are making
my colleges / collaborators who are making commits all over the world
are located, for the purposes of reinforcing the abstract numeric
mapping with a best-guess at what the location might be, or at least
something that's close to the actual location.


Half the time this would be off by a zone in areas that use daylight
saving time, or you'd need to determine when DST starts and ends, but
since that depends on the exact time zone it will be tricky.

You could use military time zones, which are nice and easy to convert:
Alpha (UTC+1) to Mike (UTC+12) (Juliet is skipped), November (UTC-1) to 
Yankee (UTC-12), and Zulu time zone (UTC+0).  Downside: Most civilians

don't use them. :)  Also there are no names for zones that have an
offset of a fraction of an hour.


I'd definitely use a feature like that, and could hack it up on top of
the current patches if there wasn't vehement opposition to it. Maybe
the above examples change someone's mind, I can't think of a case
where someone's relying on %Z now for date-local, or at least cases
where something like the above wouldn't be more useful in practice
than just an empty string, but if nobody agrees so be it :)


Personally I don't have a use for time information; dates alone would
suffice for me -- so I certainly won't hold you back.  But I don't see
how it could be done.

René


Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-12 Thread René Scharfe

Am 12.06.2017 um 18:16 schrieb Ævar Arnfjörð Bjarmason:

On Mon, Jun 12, 2017 at 5:12 PM, Junio C Hamano <gits...@pobox.com> wrote:

René Scharfe <l@web.de> writes:


Am 07.06.2017 um 10:17 schrieb Jeff King:

On Sat, Jun 03, 2017 at 12:40:34PM +0200, René Scharfe wrote:

Duplicates strbuf_expand to a certain extent, but not too badly, I
think.  Leaves the door open for letting strftime handle the local
case.


I guess you'd plan to do that like this in the caller:

if (date->local)
  tz_name = NULL;
else
  tz_name = "";

and then your strftime() doesn't do any %z expansion when tz_name is
NULL.


Yes, or you could look up a time zone name somewhere else -- except we
don't have a way to do that, at least for now.


Is that only "for now"?  I have a feeling that it is fundamentally
impossible with the data we record.  When GMTOFF 9:00 is the only
thing we have for a timestamp, can we tell if we should label it as
JST (aka Asia/Tokyo) or KST (aka Asia/Seoul)?


We could track the time zone on commit, e.g. in a new header.  I doubt
that the need for showing time zone names will be strong enough to go to
that length, though.


It's obviously not perfect for all the reasons mentioned in this
thread, but we already have a timezone->offset mapping in the
timezone_names variable in date.c, a good enough solution might be to
simply reverse that lookup when formatting %Z

Of course we can never know if you were in Tokyo or Seul from the info
in the commit object, but we don't need to, it's enough that we just
emit JST for +0900 and anyone reading the output has at least some
idea what +0900 maps to.


I suspect that there will be different opinions, for cultural and
political reasons.


We could also simply replace "%Z" with the empty string, as the the
POSIX strftime() documentation allows for:
http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
("Replaced by the timezone name or abbreviation, or by no bytes if no
timezone information exists.").


Yes, that's what the patch does.

René


Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-11 Thread René Scharfe

Am 07.06.2017 um 10:17 schrieb Jeff King:

On Sat, Jun 03, 2017 at 12:40:34PM +0200, René Scharfe wrote:

Duplicates strbuf_expand to a certain extent, but not too badly, I
think.  Leaves the door open for letting strftime handle the local
case.


I guess you'd plan to do that like this in the caller:

   if (date->local)
tz_name = NULL;
   else
tz_name = "";

and then your strftime() doesn't do any %z expansion when tz_name is
NULL.


Yes, or you could look up a time zone name somewhere else -- except we
don't have a way to do that, at least for now.


I was thinking that we would need to have it take the actual time_t, and
then it would be able to do the tzset/localtime dance itself. But since
I don't think we're planning to do that (if anything we'd just handle
the normal localtime() case), the complication it would add to the
interface isn't worth it.


A caller that really needs to do that can, and pass the result as a
string.  Not pretty, but at least it's a possibility.

René


Re: [PATCH] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-03 Thread René Scharfe

Am 03.06.2017 um 15:13 schrieb Ulrich Mueller:

On Sat, 3 Jun 2017, René Scharfe wrote:



+   case 'Z':
+   strbuf_addstr(_fmt, tz_name);


Is it guaranteed that tz_name cannot contain a percent sign itself?


Currently yes, because the only caller passes an empty string.

The fact that tz_name is subject to expansion by strftime could be
mentioned explicitly in strbuf.h.  I'm not sure if that's a desirable
property, but it allows callers to expand %z internally and %Z using
strftime.

René


[PATCH] strbuf: let strbuf_addftime handle %z and %Z itself

2017-06-03 Thread René Scharfe
There is no portable way to pass timezone information to strftime.  Add
parameters for timezone offset and name to strbuf_addftime and let it
handle the timezone-related format specifiers %z and %Z internally.
Callers can opt out by passing NULL as timezone name.

Use an empty string as timezone name in show_date (the only current
caller) for now because we only have the timezone offset in non-local
mode.  POSIX allows %Z to resolve to nothing in case of missing info.

Helped-by: Ulrich Mueller 
Helped-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
Duplicates strbuf_expand to a certain extent, but not too badly, I
think.  Leaves the door open for letting strftime handle the local
case.

 date.c |  2 +-
 strbuf.c   | 42 ++
 strbuf.h   | 11 ---
 t/t6006-rev-list-format.sh | 12 
 4 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/date.c b/date.c
index 63fa99685e..5580577334 100644
--- a/date.c
+++ b/date.c
@@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
month_names[tm->tm_mon], tm->tm_year + 1900,
tm->tm_hour, tm->tm_min, tm->tm_sec, tz);
else if (mode->type == DATE_STRFTIME)
-   strbuf_addftime(, mode->strftime_fmt, tm);
+   strbuf_addftime(, mode->strftime_fmt, tm, tz, "");
else
strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d",
weekday_names[tm->tm_wday],
diff --git a/strbuf.c b/strbuf.c
index 00457940cf..b880107370 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -785,14 +785,47 @@ char *xstrfmt(const char *fmt, ...)
return ret;
 }
 
-void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
+void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm,
+int tz_offset, const char *tz_name)
 {
+   struct strbuf munged_fmt = STRBUF_INIT;
size_t hint = 128;
size_t len;
 
if (!*fmt)
return;
 
+   /*
+* There is no portable way to pass timezone information to
+* strftime, so we handle %z and %Z here.
+*/
+   if (tz_name) {
+   for (;;) {
+   const char *percent = strchrnul(fmt, '%');
+   strbuf_add(_fmt, fmt, percent - fmt);
+   if (!*percent)
+   break;
+   fmt = percent + 1;
+   switch (*fmt) {
+   case '%':
+   strbuf_addstr(_fmt, "%%");
+   fmt++;
+   break;
+   case 'z':
+   strbuf_addf(_fmt, "%+05d", tz_offset);
+   fmt++;
+   break;
+   case 'Z':
+   strbuf_addstr(_fmt, tz_name);
+   fmt++;
+   break;
+   default:
+   strbuf_addch(_fmt, '%');
+   }
+   }
+   fmt = munged_fmt.buf;
+   }
+
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len, fmt, tm);
 
@@ -804,17 +837,18 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt, 
const struct tm *tm)
 * output contains at least one character, and then drop the 
extra
 * character before returning.
 */
-   struct strbuf munged_fmt = STRBUF_INIT;
-   strbuf_addf(_fmt, "%s ", fmt);
+   if (fmt != munged_fmt.buf)
+   strbuf_addstr(_fmt, fmt);
+   strbuf_addch(_fmt, ' ');
while (!len) {
hint *= 2;
strbuf_grow(sb, hint);
len = strftime(sb->buf + sb->len, sb->alloc - sb->len,
   munged_fmt.buf, tm);
}
-   strbuf_release(_fmt);
len--; /* drop munged space */
}
+   strbuf_release(_fmt);
strbuf_setlen(sb, sb->len + len);
 }
 
diff --git a/strbuf.h b/strbuf.h
index 80047b1bb7..39d5836abd 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -339,9 +339,14 @@ __attribute__((format (printf,2,0)))
 extern void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
 
 /**
- * Add the time specified by `tm`, as formatted by `strftime`.
- */
-extern void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct 
tm *tm);
+ * Add the time specified by `tm`, as formatted by `strftime`.  `tz_offset`
+ * and `tz_name` are used to expand %z and %Z internally, unless `tz_name`
+ * is NULL.  `tz_offset` is in 

Re: git-2.13.0: log --date=format:%z not working

2017-06-02 Thread René Scharfe

Am 02.06.2017 um 05:08 schrieb Jeff King:

In theory the solution is:

   1. Start using localtime() instead of gmtime() with an adjustment when
  we are converting to the local timezone (i.e., format-local). We
  should be able to do this portably.

  This is easy to do, and it's better than handling %z ourselves,
  because it makes %Z work, too.

   2. When showing the author's timezone, do some trickery to set the
  program's timezone, then use localtime(), then restore the program
  timezone.

  I couldn't get this to work reliably. And anyway, we'd still have
  nothing to put in %Z since we don't have a timezone name at all in
  the git objects. We just have "+0400" or whatever.

So I don't see a portable way to make (2) work.


We could create a strftime wrapper that also takes a time zone offset,
with platform-specific implementations.  Is it worth the effort?

What reliability issues did you run into?


But it seems a shame
that %Z does not work for case (1) with René's patch.

I guess we could do (1) for the local cases and then handle "%z"
ourselves otherwise. That sounds even _more_ confusing, but it at least
gets the most cases right.

If we do handle "%z" ourselves (either always or for just the one case),
what should the matching %Z say? Right now (and I think with René's
patch) it says GMT, which is actively misleading. We should probably
replace it with the same text as "%z". That's not quite what the user
wanted, but at least it's accurate.


On Linux "%z %Z" is expanded to "+0200 CEST" for me, while on Windows I
get "Mitteleurop▒ische Sommerzeit Mitteleurop▒ische Sommerzeit".  (That
"▒" is probably supposed to be an "ä".)  POSIX requires  +hhmm or -hhmm
format for %z, and for %Z is to be "Replaced by the timezone name or
abbreviation".

I'd say "GMT+0200" etc. is a nice enough timezone name, i.e. having %Z
resolve to the same as %z plus a literal prefix of "GMT" should at least
not be wrong.

Alternatively we could have a lookup table mapping a few typical offsets
to timezone names, but e.g. handling daylight saving times would
probably be too hard (when did that part of the world switch in the
given year?  north or south of the equator?)..


As far as the patch itself goes, I'm disappointed to lose the automatic
"%" handling for all of the other callers. But I suspect the boilerplate
involved in any solution that lets callers choose whether or not to use
it would end up being longer than just handling it in each caller.


Actually I felt uneasy when you added that forced %% handling because it
put a policy into an otherwise neutral interpreter function.  I just had
no practical argument against it -- until now.

I'd rather see strbuf_expand also lose the hard-coded percent sign, but
again I don't have an actual user for such a flexibility (yet).

Perhaps we should add a fully neutral strbuf_expand_core (or whatever),
make strbuf_expand a wrapper with hard-coded % and %% handling and use
the core function in the strftime wrapper.  Except that the function is
not easily stackable.  Hmm..

René


Re: git-2.13.0: log --date=format:%z not working

2017-05-28 Thread René Scharfe
Am 27.05.2017 um 23:46 schrieb Jeff King:
> On Sat, May 27, 2017 at 06:57:08PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>> There's another test which breaks if we just s/gmtime/localtime/g. As
>> far as I can tell to make the non-local case work we'd need to do a
>> whole dance where we set the TZ variable to e.g. UTC$offset, then call
>> strftime(), then call it again. Maybe there's some way to just specify
>> the tz offset, but I didn't find any in a quick skimming of time.h.
> 
> There isn't.
Right.  We could handle %z internally, though.  %Z would be harder (left
as an exercise for readers..).

First we'd have to undo 0a0416a3 (strbuf_expand: convert "%%" to "%"),
though, in order to give full control back to strbuf_expand callbacks.

2-pack patch:

--- >8 ---
 builtin/cat-file.c |  5 +
 convert.c  |  1 +
 daemon.c   |  3 +++
 date.c |  2 +-
 ll-merge.c |  5 +++--
 pretty.c   |  3 +++
 strbuf.c   | 39 ++-
 strbuf.h   | 11 +--
 t/t6006-rev-list-format.sh | 12 
 9 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1890d7a639..9cf2e4291d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -252,6 +252,11 @@ static size_t expand_format(struct strbuf *sb, const char 
*start, void *data)
 {
const char *end;
 
+   if (*start == '%') {
+   strbuf_addch(sb, '%');
+   return 1;
+   }
+
if (*start != '(')
return 0;
end = strchr(start + 1, ')');
diff --git a/convert.c b/convert.c
index 8d652bf27c..8336fff694 100644
--- a/convert.c
+++ b/convert.c
@@ -399,6 +399,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
struct strbuf path = STRBUF_INIT;
struct strbuf_expand_dict_entry dict[] = {
{ "f", NULL, },
+   { "%", "%" },
{ NULL, NULL, },
};
 
diff --git a/daemon.c b/daemon.c
index ac7181a483..191fac2716 100644
--- a/daemon.c
+++ b/daemon.c
@@ -148,6 +148,9 @@ static size_t expand_path(struct strbuf *sb, const char 
*placeholder, void *ctx)
case 'D':
strbuf_addstr(sb, context->directory);
return 1;
+   case '%':
+   strbuf_addch(sb, '%');
+   return 1;
}
return 0;
 }
diff --git a/date.c b/date.c
index 63fa99685e..d16a88eea5 100644
--- a/date.c
+++ b/date.c
@@ -246,7 +246,7 @@ const char *show_date(timestamp_t time, int tz, const 
struct date_mode *mode)
month_names[tm->tm_mon], tm->tm_year + 1900,
tm->tm_hour, tm->tm_min, tm->tm_sec, tz);
else if (mode->type == DATE_STRFTIME)
-   strbuf_addftime(, mode->strftime_fmt, tm);
+   strbuf_addftime(, mode->strftime_fmt, tm, tz);
else
strbuf_addf(, "%.3s %.3s %d %02d:%02d:%02d %d%c%+05d",
weekday_names[tm->tm_wday],
diff --git a/ll-merge.c b/ll-merge.c
index ac0d4a5d78..e6202c7397 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -172,7 +172,7 @@ static int ll_ext_merge(const struct ll_merge_driver *fn,
 {
char temp[4][50];
struct strbuf cmd = STRBUF_INIT;
-   struct strbuf_expand_dict_entry dict[6];
+   struct strbuf_expand_dict_entry dict[7];
struct strbuf path_sq = STRBUF_INIT;
const char *args[] = { NULL, NULL };
int status, fd, i;
@@ -185,7 +185,8 @@ static int ll_ext_merge(const struct ll_merge_driver *fn,
dict[2].placeholder = "B"; dict[2].value = temp[2];
dict[3].placeholder = "L"; dict[3].value = temp[3];
dict[4].placeholder = "P"; dict[4].value = path_sq.buf;
-   dict[5].placeholder = NULL; dict[5].value = NULL;
+   dict[5].placeholder = "%"; dict[5].value = "%";
+   dict[6].placeholder = NULL; dict[6].value = NULL;
 
if (fn->cmdline == NULL)
die("custom merge driver %s lacks command line.", fn->name);
diff --git a/pretty.c b/pretty.c
index 587d48371b..56872bfa25 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1428,6 +1428,9 @@ static size_t format_commit_item(struct strbuf *sb, /* in 
UTF-8 */
} magic = NO_MAGIC;
 
switch (placeholder[0]) {
+   case '%':
+   strbuf_addch(sb, '%');
+   return 1;
case '-':
magic = DEL_LF_BEFORE_EMPTY;
break;
diff --git a/strbuf.c b/strbuf.c
index 00457940cf..206dff6037 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -309,12 +309,6 @@ void strbuf_expand(struct strbuf *sb, const char *format, 
expand_fn_t fn,
break;
format = percent + 1;
 
-   if (*format == '%') {
-   strbuf_addch(sb, '%');
-   format++;
-   continue;
-   }
-
  

Re: [PATCH v4 00/10] The final building block for a faster rebase -i

2017-05-27 Thread René Scharfe

Am 26.05.2017 um 05:15 schrieb Liam Beguin:

I tried to time the execution on an interactive rebase (on Linux) but
I did not notice a significant change in speed.
Do we have a way to measure performance / speed changes between version?


Well, there's performance test script p3404-rebase-interactive.sh.  You
could run it e.g. like this:

$ (cd t/perf && ./run origin/master HEAD ./p3404*.sh)

This would compare the performance of master with the current branch
you're on.  The results of p3404 are quite noisy for me on master,
though (saw 15% difference between runs without any code changes), so
take them with a bag of salt.

There's more info on performance tests in general in t/perf/README.

René


Re: [PATCH] mingw: simplify PATH handling

2017-05-20 Thread René Scharfe

Am 20.05.2017 um 19:00 schrieb Johannes Sixt:

Am 20.05.2017 um 17:29 schrieb René Scharfe:

-static char *path_lookup(const char *cmd, char **path, int exe_only)
+static char *path_lookup(const char *cmd, int exe_only)
  {
+const char *path;
  char *prog = NULL;
  int len = strlen(cmd);
  int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe");
  if (strchr(cmd, '/') || strchr(cmd, '\\'))
-prog = xstrdup(cmd);
+return xstrdup(cmd);
-while (!prog && *path)
-prog = lookup_prog(*path++, cmd, isexe, exe_only);
+path = mingw_getenv("PATH");
+if (!path)
+return NULL;
+
+for (; !prog && *path; path++) {
+const char *sep = strchrnul(path, ';');
+if (sep == path)
+continue;
+prog = lookup_prog(path, sep - path, cmd, isexe, exe_only);
+path = sep;
+}


The loop termination does not work here. When the final PATH component 
is investigated, sep points to the NUL. This pointer is assigned to 
path, which is incremented and now points one past NUL. Then the loop 
condition (*path) accesses the char behind NUL.


Ugh, yes.  Thanks for catching!

Cause: Last minute/hour edit, had used strspn() before.


  return prog;
  }
@@ -1569,13 +1527,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, 
const char **argv, char **deltaen

  }
  if (getenv("GIT_STRACE_COMMANDS")) {
-char **path = get_path_split();
-char *p = path_lookup("strace.exe", path, 1);
+char *p = path_lookup("strace.exe", 1);
  if (!p) {
-free_path_split(path);
  return error("strace not found!");
  }
-free_path_split(path);
  if (xutftowcs_path(wcmd, p) < 0) {
  free(p);
  return -1;


Upstream does not have this hunk.



Right, it's in next, but not yet in master.  And there are other
differences, so it's a bad time to do that cleanup. :(

René


[PATCH v2] mingw: simplify PATH handling

2017-05-20 Thread René Scharfe
On Windows the environment variable PATH contains a semicolon-separated
list of directories to search for, in order, when looking for the
location of a binary to run.  get_path_split() parses it and returns an
array of string copies, which is iterated by path_lookup(), which in
turn passes each entry to lookup_prog().

Change lookup_prog() to take the directory name as a length-limited
string instead of as a NUL-terminated one and parse PATH directly in
path_lookup().  This avoids memory allocations, simplifying the code.

Helped-by: Johannes Sixt 
Signed-off-by: Rene Scharfe 
---
Rebased against Junio's master, fixed string overrun.  Can hold and
resubmit in a few months if it gets in the way right now.

 compat/mingw.c | 91 +++---
 1 file changed, 23 insertions(+), 68 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 3fbfda5978..c6134f7c81 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -941,64 +941,14 @@ static const char *parse_interpreter(const char *cmd)
 }
 
 /*
- * Splits the PATH into parts.
- */
-static char **get_path_split(void)
-{
-   char *p, **path, *envpath = mingw_getenv("PATH");
-   int i, n = 0;
-
-   if (!envpath || !*envpath)
-   return NULL;
-
-   envpath = xstrdup(envpath);
-   p = envpath;
-   while (p) {
-   char *dir = p;
-   p = strchr(p, ';');
-   if (p) *p++ = '\0';
-   if (*dir) { /* not earlier, catches series of ; */
-   ++n;
-   }
-   }
-   if (!n)
-   return NULL;
-
-   ALLOC_ARRAY(path, n + 1);
-   p = envpath;
-   i = 0;
-   do {
-   if (*p)
-   path[i++] = xstrdup(p);
-   p = p+strlen(p)+1;
-   } while (i < n);
-   path[i] = NULL;
-
-   free(envpath);
-
-   return path;
-}
-
-static void free_path_split(char **path)
-{
-   char **p = path;
-
-   if (!path)
-   return;
-
-   while (*p)
-   free(*p++);
-   free(path);
-}
-
-/*
  * exe_only means that we only want to detect .exe files, but not scripts
  * (which do not have an extension)
  */
-static char *lookup_prog(const char *dir, const char *cmd, int isexe, int 
exe_only)
+static char *lookup_prog(const char *dir, int dirlen, const char *cmd,
+int isexe, int exe_only)
 {
char path[MAX_PATH];
-   snprintf(path, sizeof(path), "%s/%s.exe", dir, cmd);
+   snprintf(path, sizeof(path), "%.*s\\%s.exe", dirlen, dir, cmd);
 
if (!isexe && access(path, F_OK) == 0)
return xstrdup(path);
@@ -1013,17 +963,29 @@ static char *lookup_prog(const char *dir, const char 
*cmd, int isexe, int exe_on
  * Determines the absolute path of cmd using the split path in path.
  * If cmd contains a slash or backslash, no lookup is performed.
  */
-static char *path_lookup(const char *cmd, char **path, int exe_only)
+static char *path_lookup(const char *cmd, int exe_only)
 {
+   const char *path;
char *prog = NULL;
int len = strlen(cmd);
int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe");
 
if (strchr(cmd, '/') || strchr(cmd, '\\'))
-   prog = xstrdup(cmd);
+   return xstrdup(cmd);
+
+   path = mingw_getenv("PATH");
+   if (!path)
+   return NULL;
 
-   while (!prog && *path)
-   prog = lookup_prog(*path++, cmd, isexe, exe_only);
+   while (!prog) {
+   const char *sep = strchrnul(path, ';');
+   int dirlen = sep - path;
+   if (dirlen)
+   prog = lookup_prog(path, dirlen, cmd, isexe, exe_only);
+   if (!*sep)
+   break;
+   path = sep + 1;
+   }
 
return prog;
 }
@@ -1190,8 +1152,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, 
char **deltaenv,
 int fhin, int fhout, int fherr)
 {
pid_t pid;
-   char **path = get_path_split();
-   char *prog = path_lookup(cmd, path, 0);
+   char *prog = path_lookup(cmd, 0);
 
if (!prog) {
errno = ENOENT;
@@ -1202,7 +1163,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, 
char **deltaenv,
 
if (interpr) {
const char *argv0 = argv[0];
-   char *iprog = path_lookup(interpr, path, 1);
+   char *iprog = path_lookup(interpr, 1);
argv[0] = prog;
if (!iprog) {
errno = ENOENT;
@@ -1220,21 +1181,18 @@ pid_t mingw_spawnvpe(const char *cmd, const char 
**argv, char **deltaenv,
   fhin, fhout, fherr);
free(prog);
}
-   free_path_split(path);
return pid;
 }
 
 static int 

[PATCH] mingw: simplify PATH handling

2017-05-20 Thread René Scharfe
On Windows the environment variable PATH contains a semicolon-separated
list of directories to search for, in order, when looking for the
location of a binary to run.  get_path_split() parses it and returns an
array of string copies, which is iterated by path_lookup(), which in
turn passes each entry to lookup_prog().

Change lookup_prog() to take the directory name as a length-limited
string instead of as a NUL-terminated one and parse PATH directly in
path_lookup().  This avoids memory allocations, simplifying the code.

Signed-off-by: Rene Scharfe 
---
 compat/mingw.c | 96 ++
 1 file changed, 22 insertions(+), 74 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 5113071bc7..7bc61d4066 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1154,67 +1154,15 @@ static const char *parse_interpreter(const char *cmd)
 }
 
 /*
- * Splits the PATH into parts.
- */
-static char **get_path_split(void)
-{
-   char *p, **path, *envpath = mingw_getenv("PATH");
-   int i, n = 0;
-
-   if (!envpath || !*envpath)
-   return NULL;
-
-   envpath = xstrdup(envpath);
-   p = envpath;
-   while (p) {
-   char *dir = p;
-   p = strchr(p, ';');
-   if (p) *p++ = '\0';
-   if (*dir) { /* not earlier, catches series of ; */
-   ++n;
-   }
-   }
-   if (!n) {
-   free(envpath);
-   return NULL;
-   }
-
-   ALLOC_ARRAY(path, n + 1);
-   p = envpath;
-   i = 0;
-   do {
-   if (*p)
-   path[i++] = xstrdup(p);
-   p = p+strlen(p)+1;
-   } while (i < n);
-   path[i] = NULL;
-
-   free(envpath);
-
-   return path;
-}
-
-static void free_path_split(char **path)
-{
-   char **p = path;
-
-   if (!path)
-   return;
-
-   while (*p)
-   free(*p++);
-   free(path);
-}
-
-/*
  * exe_only means that we only want to detect .exe files, but not scripts
  * (which do not have an extension)
  */
-static char *lookup_prog(const char *dir, const char *cmd, int isexe, int 
exe_only)
+static char *lookup_prog(const char *dir, int dirlen, const char *cmd,
+int isexe, int exe_only)
 {
char path[MAX_PATH];
wchar_t wpath[MAX_PATH];
-   snprintf(path, sizeof(path), "%s\\%s.exe", dir, cmd);
+   snprintf(path, sizeof(path), "%.*s\\%s.exe", dirlen, dir, cmd);
 
if (xutftowcs_path(wpath, path) < 0)
return NULL;
@@ -1235,17 +1183,27 @@ static char *lookup_prog(const char *dir, const char 
*cmd, int isexe, int exe_on
  * Determines the absolute path of cmd using the split path in path.
  * If cmd contains a slash or backslash, no lookup is performed.
  */
-static char *path_lookup(const char *cmd, char **path, int exe_only)
+static char *path_lookup(const char *cmd, int exe_only)
 {
+   const char *path;
char *prog = NULL;
int len = strlen(cmd);
int isexe = len >= 4 && !strcasecmp(cmd+len-4, ".exe");
 
if (strchr(cmd, '/') || strchr(cmd, '\\'))
-   prog = xstrdup(cmd);
+   return xstrdup(cmd);
 
-   while (!prog && *path)
-   prog = lookup_prog(*path++, cmd, isexe, exe_only);
+   path = mingw_getenv("PATH");
+   if (!path)
+   return NULL;
+
+   for (; !prog && *path; path++) {
+   const char *sep = strchrnul(path, ';');
+   if (sep == path)
+   continue;
+   prog = lookup_prog(path, sep - path, cmd, isexe, exe_only);
+   path = sep;
+   }
 
return prog;
 }
@@ -1569,13 +1527,10 @@ static pid_t mingw_spawnve_fd(const char *cmd, const 
char **argv, char **deltaen
}
 
if (getenv("GIT_STRACE_COMMANDS")) {
-   char **path = get_path_split();
-   char *p = path_lookup("strace.exe", path, 1);
+   char *p = path_lookup("strace.exe", 1);
if (!p) {
-   free_path_split(path);
return error("strace not found!");
}
-   free_path_split(path);
if (xutftowcs_path(wcmd, p) < 0) {
free(p);
return -1;
@@ -1634,8 +1589,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, 
char **deltaenv,
 int fhin, int fhout, int fherr)
 {
pid_t pid;
-   char **path = get_path_split();
-   char *prog = path_lookup(cmd, path, 0);
+   char *prog = path_lookup(cmd, 0);
 
if (!prog) {
errno = ENOENT;
@@ -1646,7 +1600,7 @@ pid_t mingw_spawnvpe(const char *cmd, const char **argv, 
char **deltaenv,
 
if (interpr) {
const char *argv0 = argv[0];
-   char *iprog = 

[PATCH 5/5] p0004: don't error out if test repo is too small

2017-05-13 Thread René Scharfe
Repositories with less than 4000 entries are always handled using a
single thread, causing test-lazy-init-name-hash --multi to error out.
Don't abort the whole test script in that case, but simply skip the
multi-threaded performance check.  We can still use it to compare the
single-threaded speed of different versions in that case.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 3c2135a185..8de5a98cfc 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -8,10 +8,13 @@ test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
test-lazy-init-name-hash --dump --single >out.single &&
-   test-lazy-init-name-hash --dump --multi >out.multi &&
-   sort sorted.single &&
-   sort sorted.multi &&
-   test_cmp sorted.single sorted.multi
+   if test-lazy-init-name-hash --dump --multi >out.multi
+   then
+   test_set_prereq REPO_BIG_ENOUGH_FOR_MULTI &&
+   sort sorted.single &&
+   sort sorted.multi &&
+   test_cmp sorted.single sorted.multi
+   fi
 '
 
 test_expect_success 'calibrate' '
@@ -46,7 +49,7 @@ test_perf "single-threaded, $desc" "
test-lazy-init-name-hash --single --count=$count
 "
 
-test_perf "multi-threaded, $desc" "
+test_perf REPO_BIG_ENOUGH_FOR_MULTI "multi-threaded, $desc" "
test-lazy-init-name-hash --multi --count=$count
 "
 
-- 
2.12.2



[PATCH 4/5] p0004: don't abort if multi-threaded is too slow

2017-05-13 Thread René Scharfe
If the single-threaded variant beats the multi-threaded one then we may
have a performance bug, but that doesn't justify aborting the test.
Drop that check; we can compare the results for --single and --multi
using the actual performance tests.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 4 
 1 file changed, 4 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index d30c32f97b..3c2135a185 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -14,10 +14,6 @@ test_expect_success 'verify both methods build the same 
hashmaps' '
test_cmp sorted.single sorted.multi
 '
 
-test_expect_success 'multithreaded should be faster' '
-   test-lazy-init-name-hash --perf >out.perf
-'
-
 test_expect_success 'calibrate' '
entries=$(wc -l 

[PATCH 3/5] p0004: use test_perf

2017-05-13 Thread René Scharfe
The perf test suite (more specifically: t/perf/aggregate.perl) requires
each test script to write test results into a file, otherwise it aborts
when aggregating.  Add actual performance tests with test_perf to allow
p0004 to be run together with other perf scripts.

Calibrate the value for the parameter --count based on the size of the
test repository, in order to get meaningful results with smaller repos
yet still be able to finish the script against huge ones without having
to wait for hours.

Signed-off-by: Rene Scharfe 
---
The numbers are just guesses; I didn't actually test all ranges.

 t/perf/p0004-lazy-init-name-hash.sh | 36 
 1 file changed, 36 insertions(+)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 576bdc3c4e..d30c32f97b 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -18,4 +18,40 @@ test_expect_success 'multithreaded should be faster' '
test-lazy-init-name-hash --perf >out.perf
 '
 
+test_expect_success 'calibrate' '
+   entries=$(wc -l 

[PATCH 2/5] p0004: avoid using pipes

2017-05-13 Thread René Scharfe
The return code of commands on the producing end of a pipe is ignored.
Evaluate the outcome of test-lazy-init-name-hash by calling sort
separately.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 716c951553..576bdc3c4e 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -7,9 +7,11 @@ test_perf_large_repo
 test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
-   test-lazy-init-name-hash --dump --single | sort >out.single &&
-   test-lazy-init-name-hash --dump --multi  | sort >out.multi  &&
-   test_cmp out.single out.multi
+   test-lazy-init-name-hash --dump --single >out.single &&
+   test-lazy-init-name-hash --dump --multi >out.multi &&
+   sort sorted.single &&
+   sort sorted.multi &&
+   test_cmp sorted.single sorted.multi
 '
 
 test_expect_success 'multithreaded should be faster' '
-- 
2.12.2



[PATCH 1/5] p0004: simplify calls of test-lazy-init-name-hash

2017-05-13 Thread René Scharfe
The test library puts helpers into $PATH, so we can simply call them
without specifying their location.

The suffix $X is also not necessary because .exe files on Windows can be
started without specifying their extension, and on other platforms it's
empty anyway.

Signed-off-by: Rene Scharfe 
---
 t/perf/p0004-lazy-init-name-hash.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
index 5afa8c8df3..716c951553 100755
--- a/t/perf/p0004-lazy-init-name-hash.sh
+++ b/t/perf/p0004-lazy-init-name-hash.sh
@@ -7,13 +7,13 @@ test_perf_large_repo
 test_checkout_worktree
 
 test_expect_success 'verify both methods build the same hashmaps' '
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --single | 
sort >out.single &&
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --dump --multi  | 
sort >out.multi  &&
+   test-lazy-init-name-hash --dump --single | sort >out.single &&
+   test-lazy-init-name-hash --dump --multi  | sort >out.multi  &&
test_cmp out.single out.multi
 '
 
 test_expect_success 'multithreaded should be faster' '
-   $GIT_BUILD_DIR/t/helper/test-lazy-init-name-hash$X --perf >out.perf
+   test-lazy-init-name-hash --perf >out.perf
 '
 
 test_done
-- 
2.12.2



[PATCH 0/5] p0004: support being called by t/perf/run

2017-05-13 Thread René Scharfe
p0004-lazy-init-name-hash.sh errors out if the test repo is too small,
and doesn't generate any perf test results even if it finishes
successfully.  That prevents t/perf/run from running the whole test
suite.  This series tries to address these issues.

  p0004: simplify calls of test-lazy-init-name-hash
  p0004: avoid using pipes
  p0004: use test_perf
  p0004: don't abort if multi-threaded is too slow
  p0004: don't error out if test repo is too small

 t/perf/p0004-lazy-init-name-hash.sh | 47 +
 1 file changed, 42 insertions(+), 5 deletions(-)

-- 
2.12.2


Re: [PATCH v3 12/25] split_commit_in_progress(): fix memory leak

2017-05-06 Thread René Scharfe

Am 04.05.2017 um 12:59 schrieb Johannes Schindelin:

Hi René,

On Wed, 3 May 2017, René Scharfe wrote:


Am 02.05.2017 um 18:02 schrieb Johannes Schindelin:

Reported via Coverity.

Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
---
   wt-status.c | 7 ++-
   1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/wt-status.c b/wt-status.c
index 0a6e16dbe0f..1f3f6bcb980 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1088,8 +1088,13 @@ static int split_commit_in_progress(struct wt_status
*s)
char *rebase_orig_head =
read_line_from_git_path("rebase-merge/orig-head");
   
   	if (!head || !orig_head || !rebase_amend || !rebase_orig_head ||

-   !s->branch || strcmp(s->branch, "HEAD"))
+   !s->branch || strcmp(s->branch, "HEAD")) {
+   free(head);
+   free(orig_head);
+   free(rebase_amend);
+   free(rebase_orig_head);
return split_in_progress;
+   }
   
if (!strcmp(rebase_amend, rebase_orig_head)) {

 if (strcmp(head, rebase_amend))



The return line could be replaced by "; else" to achieve the same
result, without duplicating the free calls.


True. It is much harder to explain it that way, though: the context looks
like this:

static int split_commit_in_progress(struct wt_status *s)
  {
 int split_in_progress = 0;
 char *head = read_line_from_git_path("HEAD");
 char *orig_head = read_line_from_git_path("ORIG_HEAD");
 char *rebase_amend = read_line_from_git_path("rebase-merge/amend");
 char *rebase_orig_head = 
read_line_from_git_path("rebase-merge/orig-head");

 if (!head || !orig_head || !rebase_amend || !rebase_orig_head ||
-   !s->branch || strcmp(s->branch, "HEAD"))
+   !s->branch || strcmp(s->branch, "HEAD")) {
+   free(head);
+   free(orig_head);
+   free(rebase_amend);
+   free(rebase_orig_head);
 return split_in_progress;
+   }

 if (!strcmp(rebase_amend, rebase_orig_head)) {
 if (strcmp(head, rebase_amend))
 split_in_progress = 1;
 } else if (strcmp(orig_head, rebase_orig_head)) {
 split_in_progress = 1;
 }

 if (!s->amend && !s->nowarn && !s->workdir_dirty)
 split_in_progress = 0;

 free(head);
 free(orig_head);
 free(rebase_amend);
 free(rebase_orig_head);
 return split_in_progress;
  }

So as you see, if we make the change you suggested, the next if() is hit
which possibly sets split_in_progress = 0.

The thing is: this variable has been initialized to 0 in the beginning! So
this conditional assignment would be a noop, unless any of the code paths
before this conditional modified split_in_progress.


Right. It could have been used as a quick two-line fix.


After you fully wrapped your head around this code, I am sure you agree
that the code is unnecessarily confusing to begin with: why do we go out
of our way to allocate and read all those strings, compare them to figure
out whether there is a split in progress, just to look at bits in the
wt_status struct (that we had available from the beginning) to potentially
undo all of that.


The only function with a higher cyclomatic complexity in this file is
wt_longstatus_print(), which spans more than 100 lines.  Amazing.


What's worse, I cannot even find a logical explanation why the code is so
confusing, as it has been added as it is today in commit 2d1ccebae41
(status: better advices when splitting a commit (during rebase -i),
2012-06-10).


Repeatedly I get the impression that defects or shortcomings tend to
cluster.  It pays to take a good look at the code surrounding a find of
an automatic checker for other possible improvements.


So I think this function really wants to be fixed more fully (although I
feel bad for inserting such a non-trivial fix into a patch series
otherwise populated by trivial memory/resource leak plugs):


It could be done in two steps.  Doing it in one is probably OK as well.


-- snipsnap --
Subject: [PATCH] squash! split_commit_in_progress(): fix memory leak

split_commit_in_progress(): simplify & fix memory leak

This function did a whole lot of unnecessary work, such as reading in
four files just to figure out that, oh, hey, we do not need to look at
them after all because the HEAD is not detached.

Simplify the entire function to return early when possible, to read in
the files only when necessary, and to release the allocated memory
always (there was a leak, reported via Coverity, where we failed to
release the allocated strings if the HEAD is not detached).

Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
---
  wt-status.c | 39 

<    1   2   3   4   5   6   7   8   9   10   >