Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Thu, 2014-06-05 at 23:42 +0200, Torsten Bögershausen wrote: > On 2014-06-05 21.26, David Turner wrote: > > On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote: > >> On 2014-06-04 23.16, David Turner wrote: > >>> > >>> Sure! I actually went with > 120k to make measurement easier: > >>> https://github.com/dturner-tw/many-refs > >> Hm, I didn't get so man > >> > >> git remote -v > >> origin https://github.com/dturner-tw/many-refs > >> > >> wc .git/packed-refs > >> 7501130 38868 .git/packed-refs > >> > > > > Oops. It looks like I forgot to push all of the refs. And when I try, > > it fails with "fatal: cannot exec 'send-pack': Argument list too long" > > I just noticed that I may be able to re-use code from t5551 to create a repo > with 5 tags. That's good, because github really didn't like me having a repo with thousands of refs. Try 100k, tho, because the difference is easier to see. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On 2014-06-05 21.26, David Turner wrote: > On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote: >> On 2014-06-04 23.16, David Turner wrote: >>> >>> Sure! I actually went with > 120k to make measurement easier: >>> https://github.com/dturner-tw/many-refs >> Hm, I didn't get so man >> >> git remote -v >> origin https://github.com/dturner-tw/many-refs >> >> wc .git/packed-refs >> 7501130 38868 .git/packed-refs >> > > Oops. It looks like I forgot to push all of the refs. And when I try, > it fails with "fatal: cannot exec 'send-pack': Argument list too long" I just noticed that I may be able to re-use code from t5551 to create a repo with 5 tags. And how about renaming check_refname_component_1() into check_refname_component_slow() ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Wed, 2014-06-04 at 14:46 -0700, Junio C Hamano wrote: > David Turner writes: > > > On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote: > > [snip discussion of compiler flags; I'll look into a cpuid approach] > > H, I am not sure if the complexity is really worth it. > > In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined > to queue it by itself early without waiting for the discussion on > 2/2 to settle. > > >> The name check_refname_component_1() doesn't tell too much, > >> (check_refname_component_sse42() or check_refname_component_nonsse42() > >> say more) > > > > I'll go with "_bytewise", since that's how it works. > > That naming assumes that there will never be any alternative > implementation of the bytewise checker other than the one that uses > sse42, no? check_refname_component_1 is the non-sse (LUT) one; I assume that there will only be one implementation of that (and if there's later another one we can rename it). I guess this is strong evidence for _1 being a bad name. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote: > On 2014-06-04 23.16, David Turner wrote: > > > > Sure! I actually went with > 120k to make measurement easier: > > https://github.com/dturner-tw/many-refs > Hm, I didn't get so man > > git remote -v > origin https://github.com/dturner-tw/many-refs > > wc .git/packed-refs > 7501130 38868 .git/packed-refs > Oops. It looks like I forgot to push all of the refs. And when I try, it fails with "fatal: cannot exec 'send-pack': Argument list too long" I hacked git to send batches of 1000; maybe I'll actually make a real patch with ARG_MAX at some point. Anyway, this is uploading now, but I estimate that it will take at least five hours, because github is being really slow about this. ... > where only patch 1/2 doesn't seem to speed up things on my system: ... I would not expect a noticeable change on a tiny number of refs; it's only when ref parsing takes up a large percentage of runtime -- tens of thousands of refs. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Thu, Jun 05, 2014 at 02:30:17PM +0200, Torsten Bögershausen wrote: > On 2014-06-04 23.16, David Turner wrote: > > > > Sure! I actually went with > 120k to make measurement easier: > > https://github.com/dturner-tw/many-refs > Hm, I didn't get so man > > git remote -v > origin https://github.com/dturner-tw/many-refs > > wc .git/packed-refs > 7501130 38868 .git/packed-refs > > > > time git rev-parse HEAD > 7ac416f789fd4f1e398449113f6e1ec1d699141a > > real0m0.008s > user0m0.002s > sys 0m0.004s > > where only patch 1/2 doesn't seem to speed up things on my system: > > time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD > 7ac416f789fd4f1e398449113f6e1ec1d699141a > > real0m0.010s > user0m0.002s > sys 0m0.005s > > Could you run it 100 times to get better resolution? This could be just measurement error. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On 2014-06-04 23.16, David Turner wrote: > > Sure! I actually went with > 120k to make measurement easier: > https://github.com/dturner-tw/many-refs Hm, I didn't get so man git remote -v origin https://github.com/dturner-tw/many-refs wc .git/packed-refs 7501130 38868 .git/packed-refs time git rev-parse HEAD 7ac416f789fd4f1e398449113f6e1ec1d699141a real0m0.008s user0m0.002s sys 0m0.004s where only patch 1/2 doesn't seem to speed up things on my system: time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD 7ac416f789fd4f1e398449113f6e1ec1d699141a real0m0.010s user0m0.002s sys 0m0.005s Intel Core Duo @ 2.4 Ghz, Mac OS (and I get similar values under an AMD Dual core running 2Ghz under Linux) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
David Turner writes: > On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote: > [snip discussion of compiler flags; I'll look into a cpuid approach] H, I am not sure if the complexity is really worth it. In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined to queue it by itself early without waiting for the discussion on 2/2 to settle. >> The name check_refname_component_1() doesn't tell too much, >> (check_refname_component_sse42() or check_refname_component_nonsse42() say >> more) > > I'll go with "_bytewise", since that's how it works. That naming assumes that there will never be any alternative implementation of the bytewise checker other than the one that uses sse42, no? >> can I suggest to move all SSE code out to a file under compat/, >> like compat/refs_sse42.c, or something similar ? > > Since this is a relatively small section of code, I think that would be > overkill. Does anyone else have an opinion? If we foresee people on other architectures to invent different vectorized implementations on their favourite archs, we may end up separating it out into compat/. I have no opinion on how likely that will happen, though, and because this is a small piece of code right now, it shouldn't be too painful to reorganize when the time comes. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Wed, 2014-06-04 at 16:25 +0200, Torsten Bögershausen wrote: > On the other hand, looking here: > http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html > and looking into refs.c, > it seems as if we can try to run > strcspn(refname, bad_characters) > and > strstr(refname, "@{" > and > strstr(refname, ".." > on each refname, instead of checking each char in a loop. > The library will pick the fastest version for strcspn() automatically. Yes, you could try that, but I worry that it would be less efficient, because it duplicates the looping machinery. > David, the repo you run the tests on, is it public? Unfortunately, it is an internal Twitter repo. > Or is there a public repo with this many refs ? I do not know of one. > Or can you make a dummy repo with 60k refs ? Sure! I actually went with > 120k to make measurement easier: https://github.com/dturner-tw/many-refs -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote: [snip discussion of compiler flags; I'll look into a cpuid approach] > > --- a/git-compat-util.h > > +++ b/git-compat-util.h > > @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size, > > #endif > > #endif > > > > +#ifndef NO_SSE42 > > +#include > > +/* > > + * Clang ships with a version of nmmintrin.h that's incomplete; if > > + * necessary, we define the constants that we're going to use. > > + */ > > +#ifndef _SIDD_UBYTE_OPS > > +#define _SIDD_UBYTE_OPS 0x00 > > +#define _SIDD_CMP_EQUAL_ANY 0x00 > > +#define _SIDD_CMP_RANGES0x04 > > +#define _SIDD_CMP_EQUAL_ORDERED 0x0c > > +#define _SIDD_NEGATIVE_POLARITY 0x10 > > +#endif > Why do this defines end up in git-compat-util.h when they are needed by one > file? > (see even below) Because Junio told me to: "We would prefer not to add inclusion of any system header files in random *.c files, as there often are system dependencies (order of inclusion, definition of feature macros, etc.) we would rather want to encapsulate in one place, that is git-compat-util.h." > > --- a/refs.c > > +++ b/refs.c > > @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = { > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4 > > }; > > > > +static int check_refname_component_trailer(const char *cp, const char > > *refname, int flags) > > +{ > > + if (cp == refname) > > + return 0; /* Component has zero length. */ > > + if (refname[0] == '.') { > > + if (!(flags & REFNAME_DOT_COMPONENT)) > > + return -1; /* Component starts with '.'. */ > > + /* > > +* Even if leading dots are allowed, don't allow "." > > +* as a component (".." is prevented by a rule above). > > +*/ > > + if (refname[1] == '\0') > > + return -1; /* Component equals ".". */ > > + } > > + if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5)) > > + return -1; /* Refname ends with ".lock". */ > > + return cp - refname; > > +} > > + > > /* > > * Try to read one refname component from the front of refname. > > * Return the length of the component found, or -1 if the component is > > @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = { > > * - it ends with ".lock" > > * - it contains a "\" (backslash) > > */ > > -static int check_refname_component(const char *refname, int flags) > > +static int check_refname_component_1(const char *refname, int flags) > The name check_refname_component_1() doesn't tell too much, > (check_refname_component_sse42() or check_refname_component_nonsse42() say > more) I'll go with "_bytewise", since that's how it works. > can I suggest to move all SSE code out to a file under compat/, > like compat/refs_sse42.c, or something similar ? Since this is a relatively small section of code, I think that would be overkill. Does anyone else have an opinion? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On 2014-06-04 13.21, Duy Nguyen wrote: > On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen wrote: >> >> On 2014-06-04 05.38, David Turner wrote: >> [] >>> [] >>> diff --git a/Makefile b/Makefile >>> index a53f3a8..dd2127a 100644 >>> --- a/Makefile >>> +++ b/Makefile >>> @@ -1326,6 +1326,11 @@ else >>> COMPAT_OBJS += compat/win32mmap.o >>> endif >>> endif >>> +ifdef NO_SSE42 >>> + BASIC_CFLAGS += -DNO_SSE42 >>> +else >>> + BASIC_CFLAGS += -msse4.2 >>> +endif >> This does work for some people, but break for others, like the systems in my >> test-lab. >> On 2 different systems the gcc has support for -msse4.2, but the processor >> has not, >> and t5511 fails with "Illegal instruction". >> How can that be? >> The maintainer of a Linux distro wants to ship gcc with all possible >> features, >> an the end-user can compile the code with all the features his very >> processor has. > > I think glibc code uses cpuid instruction to decide whether to use > optimized version. May be we can do the same? If we go that route and > have a way to detect sse support from compiler, then we can drop > NO_SSE42, enable all and pick one at runtime. > Running make under a non-X86 processor like arm fails, as his gcc does not have -msse4.2 On the other hand, looking here: http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html and looking into refs.c, it seems as if we can try to run strcspn(refname, bad_characters) and strstr(refname, "@{" and strstr(refname, ".." on each refname, instead of checking each char in a loop. The library will pick the fastest version for strcspn() automatically. David, the repo you run the tests on, is it public? Or is there a public repo with this many refs ? Or can you make a dummy repo with 60k refs ? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen wrote: > > On 2014-06-04 05.38, David Turner wrote: > [] >> [] >> diff --git a/Makefile b/Makefile >> index a53f3a8..dd2127a 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -1326,6 +1326,11 @@ else >> COMPAT_OBJS += compat/win32mmap.o >> endif >> endif >> +ifdef NO_SSE42 >> + BASIC_CFLAGS += -DNO_SSE42 >> +else >> + BASIC_CFLAGS += -msse4.2 >> +endif > This does work for some people, but break for others, like the systems in my > test-lab. > On 2 different systems the gcc has support for -msse4.2, but the processor > has not, > and t5511 fails with "Illegal instruction". > How can that be? > The maintainer of a Linux distro wants to ship gcc with all possible features, > an the end-user can compile the code with all the features his very processor > has. I think glibc code uses cpuid instruction to decide whether to use optimized version. May be we can do the same? If we go that route and have a way to detect sse support from compiler, then we can drop NO_SSE42, enable all and pick one at runtime. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
On 2014-06-04 05.38, David Turner wrote: [] > [] > diff --git a/Makefile b/Makefile > index a53f3a8..dd2127a 100644 > --- a/Makefile > +++ b/Makefile > @@ -1326,6 +1326,11 @@ else > COMPAT_OBJS += compat/win32mmap.o > endif > endif > +ifdef NO_SSE42 > + BASIC_CFLAGS += -DNO_SSE42 > +else > + BASIC_CFLAGS += -msse4.2 > +endif This does work for some people, but break for others, like the systems in my test-lab. On 2 different systems the gcc has support for -msse4.2, but the processor has not, and t5511 fails with "Illegal instruction". How can that be? The maintainer of a Linux distro wants to ship gcc with all possible features, an the end-user can compile the code with all the features his very processor has. On the other hand, a pre-compiled package like e.g. Git is compiled into a binary package with all the latest features switched of, to be able to run the binary on as many different processor variants as possible. He already needs to make 3 binaries only for x86: - the minimum version is a 32 bit processor like 486/586/686. - a "medium" version for systems with 4GB RAM (or more) which have 32 bit processors with PAE (686-pae) - a version for x86_64 E.g. a maintainer wants to have SSE42 enabled, when he builds Git for his system, but disabled when he builds an RPM. The people compiling Git need to know what the binary is used for, how about using something like this in Makefile: ifdef HAVE_SSE42 BASIC_CFLAGS += -msse4.2 -DHAS_SSE42 +endif > ifdef OBJECT_CREATION_USES_RENAMES > COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1 > endif > @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE > @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@ > @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@ > @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst > ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@ > + @echo NO_SSE42=\''$(subst ','\'',$(subst ','\'',$(NO_SSE42)))'\' >>$@ Same here: Use HAVE_SSE42 rather than NO_SSE42 > diff --git a/aclocal.m4 b/aclocal.m4 > index f11bc7e..d9f3f19 100644 > --- a/aclocal.m4 > +++ b/aclocal.m4 > @@ -38,3 +38,9 @@ AC_DEFUN([TYPE_SOCKLEN_T], >[#include > #include ]) > ]) As the whole detection logic does not work as expected (we need to compile and test-run the code, not only compile), can we drop this part completely ? (at least for the first round) > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size, > #endif > #endif > > +#ifndef NO_SSE42 > +#include > +/* > + * Clang ships with a version of nmmintrin.h that's incomplete; if > + * necessary, we define the constants that we're going to use. > + */ > +#ifndef _SIDD_UBYTE_OPS > +#define _SIDD_UBYTE_OPS 0x00 > +#define _SIDD_CMP_EQUAL_ANY 0x00 > +#define _SIDD_CMP_RANGES0x04 > +#define _SIDD_CMP_EQUAL_ORDERED 0x0c > +#define _SIDD_NEGATIVE_POLARITY 0x10 > +#endif Why do this defines end up in git-compat-util.h when they are needed by one file? (see even below) > --- a/refs.c > +++ b/refs.c > @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = { > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4 > }; > > +static int check_refname_component_trailer(const char *cp, const char > *refname, int flags) > +{ > + if (cp == refname) > + return 0; /* Component has zero length. */ > + if (refname[0] == '.') { > + if (!(flags & REFNAME_DOT_COMPONENT)) > + return -1; /* Component starts with '.'. */ > + /* > + * Even if leading dots are allowed, don't allow "." > + * as a component (".." is prevented by a rule above). > + */ > + if (refname[1] == '\0') > + return -1; /* Component equals ".". */ > + } > + if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5)) > + return -1; /* Refname ends with ".lock". */ > + return cp - refname; > +} > + > /* > * Try to read one refname component from the front of refname. > * Return the length of the component found, or -1 if the component is > @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = { > * - it ends with ".lock" > * - it contains a "\" (backslash) > */ > -static int check_refname_component(const char *refname, int flags) > +static int check_refname_component_1(const char *refname, int flags) The name check_refname_component_1() doesn't tell too much, (check_refname_component_sse42() or check_refname_component_nonsse42() say more) can I suggest to move all SSE code out to a file under compat/, like compat/refs_sse42.c, or something similar ? (And here we need the missing sse4 defines from git-compat-util.h) > { > const char *cp; > char last = '\0'; > @@ -47,7 +66,7 @@ static int check_refname_component(const char *refname, int > flag