[PATCH v8] refs.c: SSE2 optimizations for check_refname_component

2014-06-16 Thread David Turner
Optimize check_refname_component using SSE2 on x86_64.

git rev-parse HEAD is a good test-case for this, since it does almost
nothing except parse refs.  For one particular repo with about 60k
refs, almost all packed, the timings are:

Look up table: 29 ms
SSE2:  23 ms

This cuts about 20% off of the runtime.

The configure.ac changes include code from the GNU C Library written
by Joseph S. Myers joseph at codesourcery dot com.

Ondřej Bílka nel...@seznam.cz suggested an SSE2 approach to the
substring searches, which netted a speed boost over the SSE4.2 code I
had initially written.

Signed-off-by: David Turner dtur...@twitter.com
---
 git-compat-util.h  |  10 +++
 refs.c | 244 +
 t/t5511-refspec.sh |  19 +
 3 files changed, 239 insertions(+), 34 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f6d3a46..291d46b 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #endif
 #endif
 
+#if defined(__GNUC__)  defined(__x86_64__)
+#include emmintrin.h
+/* This is the system memory page size; it's used so that we can read
+ * outside the bounds of an allocation without segfaulting.
+ */
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+#endif
+
 #ifdef UNRELIABLE_FSTAT
 #define fstat_is_reliable() 0
 #else
diff --git a/refs.c b/refs.c
index 46139d2..4c39af7 100644
--- a/refs.c
+++ b/refs.c
@@ -8,22 +8,43 @@
 /*
  * How to handle various characters in refnames:
  * 0: An acceptable character for refs
- * 1: End-of-component
- * 2: ., look for a preceding . to reject .. in refs
- * 3: {, look for a preceding @ to reject @{ in refs
- * 4: A bad character: ASCII control characters, ~, ^, : or SP
+ * 1: \0: End-of-component and string
+ * 2: /: End-of-component
+ * 3: ., look for a preceding . to reject .. in refs
+ * 4: {, look for a preceding @ to reject @{ in refs
+ * 5: *, usually a bad character except, once as a wildcard
+ * 6: A bad character except * (see check_refname_component below)
  */
 static unsigned char refname_disposition[256] = {
-   1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
-   4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
-   4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 2, 1,
-   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4,
+   1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
+   6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
+   6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 3, 2,
+   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 4, 0,
+   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 0, 6, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
+   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 6
 };
 
+static int check_refname_component_trailer(const char *cp, const char 
*refname, int flags)
+{
+   if (cp == refname)
+   return 0; /* Component has zero length. */
+   if (refname[0] == '.') {
+   if (!(flags  REFNAME_DOT_COMPONENT))
+   return -1; /* Component starts with '.'. */
+   /*
+* Even if leading dots are allowed, don't allow .
+* as a component (.. is prevented by a rule above).
+*/
+   if (refname[1] == '\0')
+   return -1; /* Component equals .. */
+   }
+   if (cp - refname = 5  !memcmp(cp - 5, .lock, 5))
+   return -1; /* Refname ends with .lock. */
+   return cp - refname;
+}
+
 /*
  * Try to read one refname component from the front of refname.
  * Return the length of the component found, or -1 if the component is
@@ -33,8 +54,9 @@ static unsigned char refname_disposition[256] = {
  * - any path component of it begins with ., or
  * - it has double dots .., or
  * - it has ASCII control character, ~, ^, : or SP, anywhere, or
- * - it ends with a /.
- * - it ends with .lock
+ * - it has pattern-matching notation *, ?, [, anywhere, or
+ * - it ends with a /, or
+ * - it ends with .lock, or
  * - it contains a \ (backslash)
  */
 static int check_refname_component(const char *refname, int flags)
@@ -46,47 +68,32 @@ static int check_refname_component(const char *refname, int 
flags)
int ch = *cp  255;
unsigned char disp = refname_disposition[ch];
switch(disp) {
-   case 1:
-   goto out;
+   case 1: /* fall-through */
case 2:
+   return check_refname_component_trailer(cp, refname, 
flags);
+   case 3:
if (last == '.')
return -1; /* Refname contains ... */
break;
-   case 3:
+   case 4:
   

Re: [PATCH v8] refs.c: SSE2 optimizations for check_refname_component

2014-06-16 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 Optimize check_refname_component using SSE2 on x86_64.

 git rev-parse HEAD is a good test-case for this, since it does almost
 nothing except parse refs.  For one particular repo with about 60k
 refs, almost all packed, the timings are:

 Look up table: 29 ms
 SSE2:  23 ms

 This cuts about 20% off of the runtime.

 The configure.ac changes include code from the GNU C Library written
 by Joseph S. Myers joseph at codesourcery dot com.

 Ondřej Bílka nel...@seznam.cz suggested an SSE2 approach to the

One e-mail address is obfuscated while the other not; intended?

 substring searches, which netted a speed boost over the SSE4.2 code I
 had initially written.

 Signed-off-by: David Turner dtur...@twitter.com
 ---
 diff --git a/git-compat-util.h b/git-compat-util.h
 index f6d3a46..291d46b 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
  #endif
  #endif
  
 +#if defined(__GNUC__)  defined(__x86_64__)
 +#include emmintrin.h
 +/* This is the system memory page size; it's used so that we can read

Style (there are other instances of the same kind).

/*
 * This is the ...

 + * outside the bounds of an allocation without segfaulting.
 + */

 +static int check_refname_component_trailer(const char *cp, const char 
 *refname, int flags)
 +{
 + if (cp == refname)
 + return 0; /* Component has zero length. */
 + if (refname[0] == '.') {
 + if (!(flags  REFNAME_DOT_COMPONENT))
 + return -1; /* Component starts with '.'. */
 + /*
 +  * Even if leading dots are allowed, don't allow .
 +  * as a component (.. is prevented by a rule above).
 +  */
 + if (refname[1] == '\0')
 + return -1; /* Component equals .. */
 + }
 + if (cp - refname = 5  !memcmp(cp - 5, .lock, 5))
 + return -1; /* Refname ends with .lock. */

This is merely a moved code that retained the same comment, but it
is more like the current refname component ends with .lock, I
suspect.  In other words, we do not allow refs/heads/foo.lock/bar.
Am I reading the patch correctly?

 +#if defined(__GNUC__)  defined(__x86_64__)
 +#define SSE_VECTOR_BYTES 16
 +
 +/* Vectorized version of check_refname_format. */
 +int check_refname_format(const char *refname, int flags)
 +{
 + const char *cp = refname;
 +
 + const __m128i dot = _mm_set1_epi8 ('.');

Style (there are other instances of the same kind).  No SP between
function/macro name and opening parenthesis.

 + if (refname[0] == '.') {
 + if (refname[1] == '/' || refname[1] == '\0')
 + return -1;
 + if (!(flags  REFNAME_DOT_COMPONENT))
 + return -1;
 + }
 + while(1) {
 + __m128i tmp, tmp1, result;
 + uint64_t mask;
 +
 + if ((uintptr_t) cp % PAGE_SIZE  PAGE_SIZE - SSE_VECTOR_BYTES  
 - 1)

OK, so we make sure we do not overrun by reading too much near the
end of the page, as the next page might be unmapped.

I am showing my ignorance but does cp (i.e. refname) upon entry to
this function need to be aligned in some way?

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8] refs.c: SSE2 optimizations for check_refname_component

2014-06-16 Thread David Turner
On Mon, 2014-06-16 at 17:06 -0700, Junio C Hamano wrote:
 David Turner dtur...@twopensource.com writes:
 
  Optimize check_refname_component using SSE2 on x86_64.
 
  git rev-parse HEAD is a good test-case for this, since it does almost
  nothing except parse refs.  For one particular repo with about 60k
  refs, almost all packed, the timings are:
 
  Look up table: 29 ms
  SSE2:  23 ms
 
  This cuts about 20% off of the runtime.
 
  The configure.ac changes include code from the GNU C Library written
  by Joseph S. Myers joseph at codesourcery dot com.
 
  Ondřej Bílka nel...@seznam.cz suggested an SSE2 approach to the
 
 One e-mail address is obfuscated while the other not; intended?

Yes (although I'll happily obfuscate or remove Ondřej's if he prefers);
Joseph's was obfuscated because that's how I found it in the source
material.

  substring searches, which netted a speed boost over the SSE4.2 code I
  had initially written.
 
  Signed-off-by: David Turner dtur...@twitter.com
  ---
  diff --git a/git-compat-util.h b/git-compat-util.h
  index f6d3a46..291d46b 100644
  --- a/git-compat-util.h
  +++ b/git-compat-util.h
  @@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
   #endif
   #endif
   
  +#if defined(__GNUC__)  defined(__x86_64__)
  +#include emmintrin.h
  +/* This is the system memory page size; it's used so that we can read
 
 Style (there are other instances of the same kind).

Will fix.

  +static int check_refname_component_trailer(const char *cp, const char 
  *refname, int flags)
  +{
  +   if (cp == refname)
  +   return 0; /* Component has zero length. */
  +   if (refname[0] == '.') {
  +   if (!(flags  REFNAME_DOT_COMPONENT))
  +   return -1; /* Component starts with '.'. */
  +   /*
  +* Even if leading dots are allowed, don't allow .
  +* as a component (.. is prevented by a rule above).
  +*/
  +   if (refname[1] == '\0')
  +   return -1; /* Component equals .. */
  +   }
  +   if (cp - refname = 5  !memcmp(cp - 5, .lock, 5))
  +   return -1; /* Refname ends with .lock. */
 
 This is merely a moved code that retained the same comment, but it
 is more like the current refname component ends with .lock, I
 suspect.  In other words, we do not allow refs/heads/foo.lock/bar.
 Am I reading the patch correctly?

This code didn't actually need to be moved; I'll move it back.

But it now occurs to me that the SSE2 code does not actually check
for .lock at the end of each component, but only at the end of string.

I don't know why we don't allow refs/heads/foo.lock/bar.  I can
understand not allowing refs/heads/foo.lock, since it might be a lock
file.  But I have never heard of a lock directory like this. 

I'm happy to replicate what the existing code does, but do you think it
is correct?

 Style (there are other instances of the same kind).  No SP between
 function/macro name and opening parenthesis.

Will fix.

  +   if (refname[0] == '.') {
  +   if (refname[1] == '/' || refname[1] == '\0')
  +   return -1;
  +   if (!(flags  REFNAME_DOT_COMPONENT))
  +   return -1;
  +   }
  +   while(1) {
  +   __m128i tmp, tmp1, result;
  +   uint64_t mask;
  +
  +   if ((uintptr_t) cp % PAGE_SIZE  PAGE_SIZE - SSE_VECTOR_BYTES  
  - 1)
 
 OK, so we make sure we do not overrun by reading too much near the
 end of the page, as the next page might be unmapped.

Exactly.

 I am showing my ignorance but does cp (i.e. refname) upon entry to
 this function need to be aligned in some way?

Nope -- _mm_loadu_si128 does an unaligned load.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html