Re: [PATCH 2/2] hashcmp: use memcmp instead of open-coded loop

2017-08-09 Thread Jeff King
On Wed, Aug 09, 2017 at 04:55:43PM +0200, René Scharfe wrote:

> > I also wondered if using memcmp() could be a hint to the compiler to use
> > an intrinsic or some other trick, especially because the "len" here is a
> > constant. But in a toy function compiled with "gcc -S", it looks like we
> > do keep the call to memcmp (so the speedup really is glibc, and not some
> > compiler magic).
> 
> GCC 7 inlines memcmp() if we only need a binary result:
> 
>   https://godbolt.org/g/iZ11Ne

Cute.  It turns it into a series of 8-byte xors. The original open-coded
loop doesn't end up nearly as optimized with gcc-7.

I suspect many calls in practice are of the binary-result type. So some
of the speedup I saw may have been from compiler improvements and not
libc improvements. Still, I think the general argument is the same,
replacing "if your libc memcmp is fast" with "if your libc/compiler
makes memcmp fast".

-Peff


Re: [PATCH 2/2] hashcmp: use memcmp instead of open-coded loop

2017-08-09 Thread René Scharfe
Am 09.08.2017 um 12:16 schrieb Jeff King:
> In 1a812f3a70 (hashcmp(): inline memcmp() by hand to
> optimize, 2011-04-28), it was reported that an open-coded
> loop outperformed memcmp() for comparing sha1s.
> 
> Discussion[1] a few years later in 2013 showed that this
> depends on your libc's version of memcmp(). In particular,
> glibc 2.13 optimized their memcmp around 2011. Here are
> current timings with glibc 2.24 (best-of-five, on
> linux.git):
> 
>[before this patch, open-coded]
>$ time git rev-list --objects --all
>real   0m35.357s
>user   0m35.016s
>sys0m0.340s
> 
>[after this patch, memcmp]
>real   0m32.930s
>user   0m32.630s
>sys0m0.300s

Nice.  And here's the size of the git executable in my build:

 unstripped stripped
  before8048176  2082416
  after 8006064  2037360

> I also wondered if using memcmp() could be a hint to the compiler to use
> an intrinsic or some other trick, especially because the "len" here is a
> constant. But in a toy function compiled with "gcc -S", it looks like we
> do keep the call to memcmp (so the speedup really is glibc, and not some
> compiler magic).

GCC 7 inlines memcmp() if we only need a binary result:

https://godbolt.org/g/iZ11Ne

René


[PATCH 2/2] hashcmp: use memcmp instead of open-coded loop

2017-08-09 Thread Jeff King
In 1a812f3a70 (hashcmp(): inline memcmp() by hand to
optimize, 2011-04-28), it was reported that an open-coded
loop outperformed memcmp() for comparing sha1s.

Discussion[1] a few years later in 2013 showed that this
depends on your libc's version of memcmp(). In particular,
glibc 2.13 optimized their memcmp around 2011. Here are
current timings with glibc 2.24 (best-of-five, on
linux.git):

  [before this patch, open-coded]
  $ time git rev-list --objects --all
  real  0m35.357s
  user  0m35.016s
  sys   0m0.340s

  [after this patch, memcmp]
  real  0m32.930s
  user  0m32.630s
  sys   0m0.300s

Now that we've had 6 years for that version of glibc to
make its way onto people's machines, it's worth revisiting
our benchmarks and switching to memcmp().

It may be that there are other non-glibc systems where
memcmp() isn't as well optimized. But since our single data
point in favor of open-coding was on a now-ancient glibc, we
should probably assume the system memcmp is good unless
proven otherwise. We may end up with a SLOW_MEMCMP Makefile
knob, but we can hold off on that until we actually find
such a system in practice.

[1] https://public-inbox.org/git/20130318073229.ga5...@sigill.intra.peff.net/

Signed-off-by: Jeff King 
---
I also wondered if using memcmp() could be a hint to the compiler to use
an intrinsic or some other trick, especially because the "len" here is a
constant. But in a toy function compiled with "gcc -S", it looks like we
do keep the call to memcmp (so the speedup really is glibc, and not some
compiler magic).

 cache.h | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 71fe092644..46af4ecddb 100644
--- a/cache.h
+++ b/cache.h
@@ -939,14 +939,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   int i;
-
-   for (i = 0; i < GIT_SHA1_RAWSZ; i++, sha1++, sha2++) {
-   if (*sha1 != *sha2)
-   return *sha1 - *sha2;
-   }
-
-   return 0;
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
-- 
2.14.0.609.gd2d1f7ddf