Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread David Turner
On Thu, 2014-06-05 at 23:42 +0200, Torsten Bögershausen wrote:
> On 2014-06-05 21.26, David Turner wrote:
> > On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
> >> On 2014-06-04 23.16, David Turner wrote:
> >>>
> >>> Sure!  I actually went with > 120k to make measurement easier:
> >>> https://github.com/dturner-tw/many-refs
> >> Hm, I didn't get so man
> >>
> >> git remote -v
> >> origin  https://github.com/dturner-tw/many-refs 
> >>
> >>  wc .git/packed-refs 
> >>  7501130   38868 .git/packed-refs
> >>
> > 
> > Oops.  It looks like I forgot to push all of the refs.  And when I try,
> > it fails with "fatal: cannot exec 'send-pack': Argument list too long"
> 
> I just noticed that I may be able to re-use code from t5551 to create a repo
> with 5 tags.

That's good, because github really didn't like me having a repo with
thousands of refs.  Try 100k, tho, because the difference is easier to
see.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread Torsten Bögershausen
On 2014-06-05 21.26, David Turner wrote:
> On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
>> On 2014-06-04 23.16, David Turner wrote:
>>>
>>> Sure!  I actually went with > 120k to make measurement easier:
>>> https://github.com/dturner-tw/many-refs
>> Hm, I didn't get so man
>>
>> git remote -v
>> origin  https://github.com/dturner-tw/many-refs 
>>
>>  wc .git/packed-refs 
>>  7501130   38868 .git/packed-refs
>>
> 
> Oops.  It looks like I forgot to push all of the refs.  And when I try,
> it fails with "fatal: cannot exec 'send-pack': Argument list too long"

I just noticed that I may be able to re-use code from t5551 to create a repo
with 5 tags.


And how about renaming check_refname_component_1() into
check_refname_component_slow() ;-)








--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread David Turner
On Wed, 2014-06-04 at 14:46 -0700, Junio C Hamano wrote:
> David Turner  writes:
> 
> > On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
> > [snip discussion of compiler flags; I'll look into a cpuid approach]
> 
> H, I am not sure if the complexity is really worth it.
> 
> In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined
> to queue it by itself early without waiting for the discussion on
> 2/2 to settle.
> 
> >> The name check_refname_component_1() doesn't tell too much,
> >> (check_refname_component_sse42()  or check_refname_component_nonsse42() 
> >> say more)
> >
> > I'll go with "_bytewise", since that's how it works.
> 
> That naming assumes that there will never be any alternative
> implementation of the bytewise checker other than the one that uses
> sse42, no?

check_refname_component_1 is the non-sse (LUT) one; I assume that there
will only be one implementation of that (and if there's later another
one we can rename it).  I guess this is strong evidence for _1 being a
bad name.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread David Turner
On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
> On 2014-06-04 23.16, David Turner wrote:
> > 
> > Sure!  I actually went with > 120k to make measurement easier:
> > https://github.com/dturner-tw/many-refs
> Hm, I didn't get so man
> 
> git remote -v
> origin  https://github.com/dturner-tw/many-refs 
> 
>  wc .git/packed-refs 
>  7501130   38868 .git/packed-refs
> 

Oops.  It looks like I forgot to push all of the refs.  And when I try,
it fails with "fatal: cannot exec 'send-pack': Argument list too long"

I hacked git to send batches of 1000; maybe I'll actually make a real
patch with ARG_MAX at some point. Anyway, this is uploading now, but I
estimate that it will take at least five hours, because github is being
really slow about this.

...
> where only patch 1/2 doesn't seem to speed up things on my system:
...

I would not expect a noticeable change on a tiny number of refs; it's
only when ref parsing takes up a large percentage of runtime -- tens of
thousands of refs.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread Ondřej Bílka
On Thu, Jun 05, 2014 at 02:30:17PM +0200, Torsten Bögershausen wrote:
> On 2014-06-04 23.16, David Turner wrote:
> > 
> > Sure!  I actually went with > 120k to make measurement easier:
> > https://github.com/dturner-tw/many-refs
> Hm, I didn't get so man
> 
> git remote -v
> origin  https://github.com/dturner-tw/many-refs 
> 
>  wc .git/packed-refs 
>  7501130   38868 .git/packed-refs
> 
> 
> 
> time git rev-parse HEAD
> 7ac416f789fd4f1e398449113f6e1ec1d699141a
> 
> real0m0.008s
> user0m0.002s
> sys 0m0.004s
> 
> where only patch 1/2 doesn't seem to speed up things on my system:
> 
> time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD
> 7ac416f789fd4f1e398449113f6e1ec1d699141a
> 
> real0m0.010s
> user0m0.002s
> sys 0m0.005s
> 
> 
Could you run it 100 times to get better resolution? This could be just
measurement error.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-05 Thread Torsten Bögershausen
On 2014-06-04 23.16, David Turner wrote:
> 
> Sure!  I actually went with > 120k to make measurement easier:
> https://github.com/dturner-tw/many-refs
Hm, I didn't get so man

git remote -v
origin  https://github.com/dturner-tw/many-refs 

 wc .git/packed-refs 
 7501130   38868 .git/packed-refs



time git rev-parse HEAD
7ac416f789fd4f1e398449113f6e1ec1d699141a

real0m0.008s
user0m0.002s
sys 0m0.004s

where only patch 1/2 doesn't seem to speed up things on my system:

time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD
7ac416f789fd4f1e398449113f6e1ec1d699141a

real0m0.010s
user0m0.002s
sys 0m0.005s



Intel Core Duo @ 2.4 Ghz, Mac OS

(and I get similar values under an AMD Dual core running 2Ghz under Linux)


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread Junio C Hamano
David Turner  writes:

> On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
> [snip discussion of compiler flags; I'll look into a cpuid approach]

H, I am not sure if the complexity is really worth it.

In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined
to queue it by itself early without waiting for the discussion on
2/2 to settle.

>> The name check_refname_component_1() doesn't tell too much,
>> (check_refname_component_sse42()  or check_refname_component_nonsse42() say 
>> more)
>
> I'll go with "_bytewise", since that's how it works.

That naming assumes that there will never be any alternative
implementation of the bytewise checker other than the one that uses
sse42, no?

>> can I suggest to move all SSE code out to a file under compat/,
>> like compat/refs_sse42.c, or something similar ?
>
> Since this is a relatively small section of code, I think that would be
> overkill.  Does anyone else have an opinion?

If we foresee people on other architectures to invent different
vectorized implementations on their favourite archs, we may end up
separating it out into compat/.  I have no opinion on how likely
that will happen, though, and because this is a small piece of code
right now, it shouldn't be too painful to reorganize when the time
comes.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread David Turner
On Wed, 2014-06-04 at 16:25 +0200, Torsten Bögershausen wrote:
> On the other hand, looking here: 
> http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html
> and looking into refs.c,
> it seems as if we can try to run 
> strcspn(refname, bad_characters)
> and 
> strstr(refname, "@{"
> and 
> strstr(refname, ".."
> on each refname, instead of checking each char in a loop.
> The library will pick the fastest version for strcspn() automatically.

Yes, you could try that, but I worry that it would be less efficient,
because it duplicates the looping machinery.

> David, the repo you run the tests on, is it public?

Unfortunately, it is an internal Twitter repo.

> Or is there a public repo with this many refs ?

I do not know of one.

> Or can you make a dummy repo with 60k refs ?

Sure!  I actually went with > 120k to make measurement easier:
https://github.com/dturner-tw/many-refs


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread David Turner
On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
[snip discussion of compiler flags; I'll look into a cpuid approach]

> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size,
> >  #endif
> >  #endif
> >  
> > +#ifndef NO_SSE42
> > +#include 
> > +/*
> > + * Clang ships with a version of nmmintrin.h that's incomplete; if
> > + * necessary, we define the constants that we're going to use.
> > + */
> > +#ifndef _SIDD_UBYTE_OPS
> > +#define _SIDD_UBYTE_OPS 0x00
> > +#define _SIDD_CMP_EQUAL_ANY 0x00
> > +#define _SIDD_CMP_RANGES0x04
> > +#define _SIDD_CMP_EQUAL_ORDERED 0x0c
> > +#define _SIDD_NEGATIVE_POLARITY 0x10
> > +#endif
> Why do this defines end up in git-compat-util.h when they are needed by one 
> file?
> (see even below)

Because Junio told me to:
"We would prefer not to add inclusion of any system header files in
random *.c files, as there often are system dependencies (order of
inclusion, definition of feature macros, etc.) we would rather want
to encapsulate in one place, that is git-compat-util.h."

> > --- a/refs.c
> > +++ b/refs.c
> > @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = {
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
> >  };
> >  
> > +static int check_refname_component_trailer(const char *cp, const char 
> > *refname, int flags)
> > +{
> > +   if (cp == refname)
> > +   return 0; /* Component has zero length. */
> > +   if (refname[0] == '.') {
> > +   if (!(flags & REFNAME_DOT_COMPONENT))
> > +   return -1; /* Component starts with '.'. */
> > +   /*
> > +* Even if leading dots are allowed, don't allow "."
> > +* as a component (".." is prevented by a rule above).
> > +*/
> > +   if (refname[1] == '\0')
> > +   return -1; /* Component equals ".". */
> > +   }
> > +   if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> > +   return -1; /* Refname ends with ".lock". */
> > +   return cp - refname;
> > +}
> > +
> >  /*
> >   * Try to read one refname component from the front of refname.
> >   * Return the length of the component found, or -1 if the component is
> > @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = {
> >   * - it ends with ".lock"
> >   * - it contains a "\" (backslash)
> >   */
> > -static int check_refname_component(const char *refname, int flags)
> > +static int check_refname_component_1(const char *refname, int flags)
> The name check_refname_component_1() doesn't tell too much,
> (check_refname_component_sse42()  or check_refname_component_nonsse42() say 
> more)

I'll go with "_bytewise", since that's how it works.

> can I suggest to move all SSE code out to a file under compat/,
> like compat/refs_sse42.c, or something similar ?

Since this is a relatively small section of code, I think that would be
overkill.  Does anyone else have an opinion?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread Torsten Bögershausen
On 2014-06-04 13.21, Duy Nguyen wrote:
> On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen  wrote:
>>
>> On 2014-06-04 05.38, David Turner wrote:
>> []
>>> []
>>> diff --git a/Makefile b/Makefile
>>> index a53f3a8..dd2127a 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -1326,6 +1326,11 @@ else
>>>   COMPAT_OBJS += compat/win32mmap.o
>>>   endif
>>>  endif
>>> +ifdef NO_SSE42
>>> + BASIC_CFLAGS += -DNO_SSE42
>>> +else
>>> + BASIC_CFLAGS += -msse4.2
>>> +endif
>> This does work for some people, but break for others, like the systems in my 
>> test-lab.
>> On 2 different systems the gcc has support for -msse4.2, but the processor 
>> has not,
>> and t5511 fails with "Illegal instruction".
>> How can that be?
>> The maintainer of a Linux distro wants to ship gcc with all possible 
>> features,
>> an the end-user can compile the code with all the features his very 
>> processor has.
> 
> I think glibc code uses cpuid instruction to decide whether to use
> optimized version. May be we can do the same? If we go that route and
> have a way to detect sse support from compiler, then we can drop
> NO_SSE42, enable all and pick one at runtime.
> 
Running make under a non-X86 processor like arm fails, as his gcc does not have 
-msse4.2

On the other hand, looking here: 
http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html
and looking into refs.c,
it seems as if we can try to run 
strcspn(refname, bad_characters)
and 
strstr(refname, "@{"
and 
strstr(refname, ".."
on each refname, instead of checking each char in a loop.
The library will pick the fastest version for strcspn() automatically.

David, the repo you run the tests on, is it public?
Or is there a public repo with this many refs ?
Or can you make a dummy repo with 60k refs ?



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread Duy Nguyen
On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen  wrote:
>
> On 2014-06-04 05.38, David Turner wrote:
> []
>> []
>> diff --git a/Makefile b/Makefile
>> index a53f3a8..dd2127a 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1326,6 +1326,11 @@ else
>>   COMPAT_OBJS += compat/win32mmap.o
>>   endif
>>  endif
>> +ifdef NO_SSE42
>> + BASIC_CFLAGS += -DNO_SSE42
>> +else
>> + BASIC_CFLAGS += -msse4.2
>> +endif
> This does work for some people, but break for others, like the systems in my 
> test-lab.
> On 2 different systems the gcc has support for -msse4.2, but the processor 
> has not,
> and t5511 fails with "Illegal instruction".
> How can that be?
> The maintainer of a Linux distro wants to ship gcc with all possible features,
> an the end-user can compile the code with all the features his very processor 
> has.

I think glibc code uses cpuid instruction to decide whether to use
optimized version. May be we can do the same? If we go that route and
have a way to detect sse support from compiler, then we can drop
NO_SSE42, enable all and pick one at runtime.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component

2014-06-04 Thread Torsten Bögershausen

On 2014-06-04 05.38, David Turner wrote:
[]
> []
> diff --git a/Makefile b/Makefile
> index a53f3a8..dd2127a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1326,6 +1326,11 @@ else
>   COMPAT_OBJS += compat/win32mmap.o
>   endif
>  endif
> +ifdef NO_SSE42
> + BASIC_CFLAGS += -DNO_SSE42
> +else
> + BASIC_CFLAGS += -msse4.2
> +endif
This does work for some people, but break for others, like the systems in my 
test-lab.
On 2 different systems the gcc has support for -msse4.2, but the processor has 
not,
and t5511 fails with "Illegal instruction".
How can that be?
The maintainer of a Linux distro wants to ship gcc with all possible features,
an the end-user can compile the code with all the features his very processor 
has.

On the other hand, a pre-compiled package like e.g. Git is compiled into a 
binary package
with all the latest features switched of, to be able to run the binary on as 
many different
processor variants as possible.

He already needs to make 3 binaries only for x86:

- the minimum version is a 32 bit processor like 486/586/686.
- a "medium" version for systems with 4GB RAM (or more) which have 32 bit 
processors with PAE (686-pae)
- a version for x86_64

E.g. a maintainer wants to have SSE42 enabled, when he builds Git for his 
system,
but disabled when he builds an RPM.
The people compiling Git need to know what the binary is used for, how about
using something like this in Makefile:

ifdef HAVE_SSE42
BASIC_CFLAGS += -msse4.2 -DHAS_SSE42

+endif

>  ifdef OBJECT_CREATION_USES_RENAMES
>   COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1
>  endif
> @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE
>   @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
>   @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
>   @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
> ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
> + @echo NO_SSE42=\''$(subst ','\'',$(subst ','\'',$(NO_SSE42)))'\' >>$@
Same here: Use HAVE_SSE42 rather than NO_SSE42

> diff --git a/aclocal.m4 b/aclocal.m4
> index f11bc7e..d9f3f19 100644
> --- a/aclocal.m4
> +++ b/aclocal.m4
> @@ -38,3 +38,9 @@ AC_DEFUN([TYPE_SOCKLEN_T],
>[#include 
>  #include ])
>  ])
As the whole detection logic does not work as expected (we need to compile and 
test-run the code,
not only compile),
can we drop this part completely ? (at least for the first round)

> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #endif
>  #endif
>  
> +#ifndef NO_SSE42
> +#include 
> +/*
> + * Clang ships with a version of nmmintrin.h that's incomplete; if
> + * necessary, we define the constants that we're going to use.
> + */
> +#ifndef _SIDD_UBYTE_OPS
> +#define _SIDD_UBYTE_OPS 0x00
> +#define _SIDD_CMP_EQUAL_ANY 0x00
> +#define _SIDD_CMP_RANGES0x04
> +#define _SIDD_CMP_EQUAL_ORDERED 0x0c
> +#define _SIDD_NEGATIVE_POLARITY 0x10
> +#endif
Why do this defines end up in git-compat-util.h when they are needed by one 
file?
(see even below)

> --- a/refs.c
> +++ b/refs.c
> @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = {
>   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
>  };
>  
> +static int check_refname_component_trailer(const char *cp, const char 
> *refname, int flags)
> +{
> + if (cp == refname)
> + return 0; /* Component has zero length. */
> + if (refname[0] == '.') {
> + if (!(flags & REFNAME_DOT_COMPONENT))
> + return -1; /* Component starts with '.'. */
> + /*
> +  * Even if leading dots are allowed, don't allow "."
> +  * as a component (".." is prevented by a rule above).
> +  */
> + if (refname[1] == '\0')
> + return -1; /* Component equals ".". */
> + }
> + if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> + return -1; /* Refname ends with ".lock". */
> + return cp - refname;
> +}
> +
>  /*
>   * Try to read one refname component from the front of refname.
>   * Return the length of the component found, or -1 if the component is
> @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = {
>   * - it ends with ".lock"
>   * - it contains a "\" (backslash)
>   */
> -static int check_refname_component(const char *refname, int flags)
> +static int check_refname_component_1(const char *refname, int flags)
The name check_refname_component_1() doesn't tell too much,
(check_refname_component_sse42()  or check_refname_component_nonsse42() say 
more)

can I suggest to move all SSE code out to a file under compat/,
like compat/refs_sse42.c, or something similar ?
(And here we need the missing sse4 defines from git-compat-util.h)
>  {
>   const char *cp;
>   char last = '\0';
> @@ -47,7 +66,7 @@ static int check_refname_component(const char *refname, int 
> flag