Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-08 Thread Tom Lane
Amit Khandekar writes:
> Thanks. I must admit it did not occur to me that I could have very
> well installed clang on my linux machine and tried compiling this
> file, or tested with some older gcc versions. I think I was using gcc
> 8. Do you know what was the gcc compiler version that gave these

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-07 Thread Amit Khandekar
On Tue, 8 Sep 2020 at 02:19, Tom Lane wrote:
>
> I wrote:
> > I experimented with a few different ideas such as adding restrict
> > decoration to the pointers, and eventually found that what works
> > is to write the loop termination condition as "i2 < limit"
> > rather than "i2 <= limit". It too

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-07 Thread Tom Lane
I wrote:
> I experimented with a few different ideas such as adding restrict
> decoration to the pointers, and eventually found that what works
> is to write the loop termination condition as "i2 < limit"
> rather than "i2 <= limit". It took me a long time to think of
> trying that, because it see
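For readers following along, here is a minimal sketch of the loop shape discussed above. It is not the committed numeric.c code: the helper name accumulate_row, the NumericDigit typedef and the int32_t accumulator are assumptions made for illustration. It only shows an inner multiply-accumulate loop whose termination test is written as "i2 < limit", the form the message above reports as the one that worked.

```c
#include <stdint.h>

/* Assumption for this sketch: digits are stored as 16-bit integers. */
typedef int16_t NumericDigit;

/*
 * Hypothetical helper (not a PostgreSQL function): add
 * var1digit * var2digits[i2] into dig[i2] for i2 = 0 .. limit - 1.
 *
 * The termination test is "i2 < limit", so "limit" is an exclusive
 * bound and the loop runs exactly "limit" times.  A caller holding an
 * inclusive bound would pass bound + 1 instead of writing "i2 <= bound".
 */
static void
accumulate_row(int32_t *dig, const NumericDigit *var2digits,
               int32_t var1digit, int limit)
{
    int i2;

    for (i2 = 0; i2 < limit; i2++)
        dig[i2] += var1digit * var2digits[i2];
}
```

Whether a given compiler actually vectorizes this depends on its version and flags, so the generated asm still has to be inspected, as was done elsewhere in this thread.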

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-07 Thread Tom Lane
Amit Khandekar writes:
> On Mon, 7 Sep 2020 at 11:23, Tom Lane wrote:
>> BTW, poking at this further, it seems that the patch only really
>> works for gcc. clang accepts the -ftree-vectorize switch, but
>> looking at the generated asm shows that it does nothing useful.
>> Which is odd, because c

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-07 Thread Amit Khandekar
On Mon, 7 Sep 2020 at 11:23, Tom Lane wrote:
>
> I wrote:
> > I made some cosmetic changes to this and committed it.

Thanks!

>
> BTW, poking at this further, it seems that the patch only really
> works for gcc. clang accepts the -ftree-vectorize switch, but
> looking at the generated asm shows

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-06 Thread Tom Lane
I wrote:
> I made some cosmetic changes to this and committed it.

BTW, poking at this further, it seems that the patch only really works for gcc. clang accepts the -ftree-vectorize switch, but looking at the generated asm shows that it does nothing useful. Which is odd, because clang does do loop

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-09-06 Thread Tom Lane
Amit Khandekar writes:
> I did as above. Attached is the v2 patch.

I made some cosmetic changes to this and committed it. AFAICT, there's no measurable performance change to the "numeric" regression test, but I got a solid 45% speedup on "numeric_big", so it's clearly a win for wider arithmetic

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-21 Thread Amit Khandekar
On Mon, 13 Jul 2020 at 14:27, Amit Khandekar wrote:
> I tried this in utils/adt/Makefile :
> +
> +numeric.o: CFLAGS += ${CFLAGS_VECTOR}
> +
> and it works.
>
> CFLAGS_VECTOR also includes the -funroll-loops option, which I
> believe, had showed improvements in the checksum.c runs ( [1] ). This
> o

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-13 Thread Amit Khandekar
On Fri, 10 Jul 2020 at 19:02, Tom Lane wrote:
>
> Peter Eisentraut writes:
> > We normally don't compile with -O3, so very few users would get the
> > benefit of this.
>
> Yeah. I don't think changing that baseline globally would be a wise move.
>
> > We have CFLAGS_VECTOR for the checksum code.

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Tom Lane
Peter Eisentraut writes:
> We normally don't compile with -O3, so very few users would get the
> benefit of this.

Yeah. I don't think changing that baseline globally would be a wise move.

> We have CFLAGS_VECTOR for the checksum code. I
> suppose if we are making the numeric code vectorizabl

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-10 Thread Peter Eisentraut
On 2020-06-10 14:15, Amit Khandekar wrote: Well, how do we make sure we keep it that way? How do we prevent some random rearranging of the code or some random compiler change to break this again? I believe the compiler rearranges the code segments w.r.t. one another when those are independent o

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-07-08 Thread Amit Khandekar
FYI : this one is present in the July commitfest.

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-06-10 Thread Amit Khandekar
On Wed, 10 Jun 2020 at 04:20, Peter Eisentraut wrote:
>
> On 2020-06-09 13:50, Amit Khandekar wrote:
> > Also, the regress/sql/numeric_big test itself speeds up by 80%
>
> That's nice. I can confirm the speedup:
>
> -O3 without the patch:
>
> numeric ... ok 737

Re: Auto-vectorization speeds up multiplication of large-precision numerics

2020-06-09 Thread Peter Eisentraut
On 2020-06-09 13:50, Amit Khandekar wrote:
> Also, the regress/sql/numeric_big test itself speeds up by 80%

That's nice. I can confirm the speedup:

-O3 without the patch:

numeric ... ok 737 ms
test numeric_big ... ok 1014 ms

-O3 with