Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Andres Freund
Hi, On 2019-02-14 16:45:38 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote: > >> Hah, I just realized you have to add -mlzcnt in order for these builtins > >> to use the lzcnt instructions. It goes from something like > >> > >> bsrq %r

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Tom Lane
Alvaro Herrera writes: > Here's a final version that I intend to push shortly, to have time > before EOB today to handle any fallout. I think this is likely to result in a lot of complaints about rightmost_one_pos[] being unreferenced, in non-HAVE__BUILTIN_CTZ builds. Probably that has to be an

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Alvaro Herrera
Here's a final version that I intend to push shortly, to have time before EOB today to handle any fallout. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 085650a174ff080f578bb289d3707173aaf4f07b Mon Sep 17

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Alvaro Herrera
On 2019-Feb-15, Tom Lane wrote: > Alvaro Herrera writes: > > Ah, I understand it now: > > https://stackoverflow.com/questions/25683690/confusion-about-bsr-and-lzcnt/43443701#43443701 > > if you call LZCNT/TZCNT on a CPU that doesn't support it, it won't raise > > SIGILL or anything ... it'll just

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Tom Lane
Alvaro Herrera writes: > I noticed in Compiler Explorer that some (ancient?) Power cpus > implement instruction "popcntb", and GCC support for those uses > -mpopcntb switch enabling __builtin_popcount() to use it. I added the > switch to configure.in but I'm not sure how well that will work ...

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Tom Lane
Alvaro Herrera writes: > Ah, I understand it now: > https://stackoverflow.com/questions/25683690/confusion-about-bsr-and-lzcnt/43443701#43443701 > if you call LZCNT/TZCNT on a CPU that doesn't support it, it won't raise > SIGILL or anything ... it'll just silently compute the wrong result. > That'

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Alvaro Herrera
On 2019-Feb-14, Tom Lane wrote: > Then the correct test to see if we want to build pg_popcount.c (BTW, > please pick a less generic name for that) and the choose function > is whether we have *both* HAVE__BUILTIN_POPCOUNT and nonempty > CFLAGS_POPCNT. I used pg_bitutils_sse42.c to host the specia

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Alvaro Herrera
On 2019-Feb-15, Alvaro Herrera wrote: > On 2019-Feb-15, Kyotaro HORIGUCHI wrote: > > And even worse lzcntx is accidentially "fallback"s to bsrx on > > unsupported CPUs, which leads to bogus results. > > __builtin_clzll(8) = 3, which should be 60. > > I'm not sure I understand this point. Are yo

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-15 Thread Alvaro Herrera
On 2019-Feb-15, Kyotaro HORIGUCHI wrote: > I understand that the behavior of __builtin_c[tl]z(0) is > undefined from the reason, they convert to bs[rf]. So if we use > these builtins, additional check is required. > > https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html Okay -- the functions c

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Kyotaro HORIGUCHI
At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane wrote in <822.1550180...@sss.pgh.pa.us> > Andres Freund writes: > > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote: > >> Hah, I just realized you have to add -mlzcnt in order for these builtins > >> to use the lzcnt instructions. It goes from som

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Alvaro Herrera
On 2019-Feb-14, Tom Lane wrote: > I think we need a clean test for __builtin_popcount(), and to be willing > to use it if available, independently of -mpopcnt. Then separately we > should test to see if -mpopcnt works, probably with the same > infrastructure we use for checking for other compiler

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Tom Lane
Alvaro Herrera writes: > That leads me to the attached patch. It creates a new file > pg_popcount.c which is the only one compiled with -mpopcnt (if > available); if there's no compiler switch to enable POPCNT, we just > don't compile the file. I'm not sure that's kosher -- in particular I'm > n

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Alvaro Herrera
On 2019-Feb-14, Tom Lane wrote: > static inline int > pg_clz(...) Hmm, I missed this bit. So we put all these functions in the header, as in the attached. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services commit

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Alvaro Herrera
On 2019-Feb-14, Tom Lane wrote: > I'd bet a fair amount of money that we'd be better off *not* using > lzcnt, even if available, because then we could just expose things > along this line: > > static inline int > pg_clz(...) > { > #ifdef HAVE__BUILTIN_CLZ > return __builtin_clz(x); > #else >

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Tom Lane
Andres Freund writes: > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote: >> Hah, I just realized you have to add -mlzcnt in order for these builtins >> to use the lzcnt instructions. It goes from something like >> >> bsrq %rax, %rax >> xorq $63, %rax > I'm confused how this is a general coun

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Andres Freund
Hi, On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote: > On 2019-Feb-14, Tom Lane wrote: > > > Some further thoughts here ... > > > > Does the "lzcnt" runtime probe actually do anything useful? > > On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz > > and __builtin_ctz compil

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Tom Lane
Alvaro Herrera writes: > Hah, I just realized you have to add -mlzcnt in order for these builtins > to use the lzcnt instructions. It goes from something like > bsrq%rax, %rax > xorq$63, %rax > to > lzcntq %rax, %rax > Significant? I'd bet a fair amount of money tha

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-14 Thread Alvaro Herrera
On 2019-Feb-14, Tom Lane wrote: > Some further thoughts here ... > > Does the "lzcnt" runtime probe actually do anything useful? > On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz > and __builtin_ctz compile to sequences involving bsrq and bsfq > regardless of -mpopcnt. It's

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Gavin Flower writes: > From my memory of reading of K&R many moons ago, it said that C only > guarantees that the lengths are such that > byte <= half word <= word <= long Indeed. > But I don't recall ever seeing a long less than 32 bits! I'm not sure offhand what C89 said, but C99 requires "s

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Gavin Flower
On 14/02/2019 11:17, Alvaro Herrera wrote: On 2019-Feb-13, Alvaro Herrera wrote: It definitely is ... plans have changed from using IndexOnly scans to Seqscans, which is likely fallout from the visibilitymap_count() change. I think the problem here is that "unsigned long" is 32 bits in this ma

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Some further thoughts here ... Does the "lzcnt" runtime probe actually do anything useful? On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz and __builtin_ctz compile to sequences involving bsrq and bsfq regardless of -mpopcnt. It's fairly hard to see how lzcnt would buy anythi

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Thomas Munro writes: > On Thu, Feb 14, 2019 at 4:38 PM Tom Lane wrote: >> I'd be inclined to rip out all of the run-time-detection logic here; >> I doubt any of it is buying anything that's worth the price of an >> indirect call. > No view on that but apparently there were Intel Atom and AMD C c

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Thomas Munro
On Thu, Feb 14, 2019 at 4:38 PM Tom Lane wrote: > And, while I'm complaining: why the devil is use of the compiler builtins > gated by HAVE__GET_CPUID? This is unbelievably Intel-centric, because > it prevents use of the builtins on other architectures. If the builtin > exists, we should use it,

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
And, while I'm complaining: why the devil is use of the compiler builtins gated by HAVE__GET_CPUID? This is unbelievably Intel-centric, because it prevents use of the builtins on other architectures. If the builtin exists, we should use it, full stop. There's no reason to expect that it would be

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-13, Tom Lane wrote: > Alvaro Herrera writes: > > and we have defined pg_popcount64() like this: > > > static int > > pg_popcount64_sse42(uint64 word) > > { > > return __builtin_popcountl(word); > > } > > That is clearly completely broken. Pushed my proposed fix, which includes

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
... btw, why is pg_popcount casting away the const from its pointer argument? regards, tom lane

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Alvaro Herrera writes: > and we have defined pg_popcount64() like this: > static int > pg_popcount64_sse42(uint64 word) > { > return __builtin_popcountl(word); > } That is clearly completely broken. > If that's correct, then I think we need something like this patch. But > it makes me w

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-13, Alvaro Herrera wrote: > It definitely is ... plans have changed from using IndexOnly scans to > Seqscans, which is likely fallout from the visibilitymap_count() change. I think the problem here is that "unsigned long" is 32 bits in this machine: checking whether long int is 64 b

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera writes: >> Lapwing's latest failure looks like yours rather than mine now? (the >> previous two were mine) Alvaro> It definitely is ... plans have changed from using IndexOnly Alvaro> scans to Seqscans, which is likely fallout from the Alvaro> visibilitymap_

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-13, Andrew Gierth wrote: > > "Alvaro" == Alvaro Herrera writes: > > Alvaro> I've pushed this now. Let's see what the buildfarm has to say > Alvaro> about it. > > Lapwing's latest failure looks like yours rather than mine now? (the > previous two were mine) It definitely is ..

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera writes: Alvaro> I've pushed this now. Let's see what the buildfarm has to say Alvaro> about it. Lapwing's latest failure looks like yours rather than mine now? (the previous two were mine) -- Andrew (irc:RhodiumToad)

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth writes: Andrew> IA64 is working fine as far as I can see (specifically, anole Andrew> is passing); it's ICC on x86_64 that broke (fulmar). And I know what's wrong on fulmar now, so that'll be fixed shortly. -- Andrew (irc:RhodiumToad)

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-13, Andres Freund wrote: > Hi, > > On February 13, 2019 8:40:14 PM GMT+01:00, Tom Lane > wrote: > >Alvaro Herrera writes: > >> I've pushed this now. Let's see what the buildfarm has to say about > >it. > > > >It's likely to be hard to tell, given the amount of pink from the Ryu >

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Andrew Gierth
> "Andres" == Andres Freund writes: >> It's likely to be hard to tell, given the amount of pink from the >> Ryu patch. If Andrew is not planning to clean that up PDQ, Besides crake (x-version), fulmar (icc) and lorikeet (cygwin), I hope the rest of the known failures should pass on the nex

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Andres Freund writes: > I'd assume that breaking bit counting would cause distinct enough damage > (compile time or crashes). That's not to say that reverting ryu shouldn't be > considered (although I'm not that bothered by cross version, ia64 and cygwin > failures, especially because the latt

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Andres Freund
Hi, On February 13, 2019 8:40:14 PM GMT+01:00, Tom Lane wrote: >Alvaro Herrera writes: >> I've pushed this now. Let's see what the buildfarm has to say about >it. > >It's likely to be hard to tell, given the amount of pink from the Ryu >patch. If Andrew is not planning to clean that up PDQ, I'

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Tom Lane
Alvaro Herrera writes: > I've pushed this now. Let's see what the buildfarm has to say about it. It's likely to be hard to tell, given the amount of pink from the Ryu patch. If Andrew is not planning to clean that up PDQ, I'd suggest reverting that patch pending having some repairs for it.

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-13 Thread Alvaro Herrera
On 2019-Feb-04, David Rowley wrote: > On Fri, 1 Feb 2019 at 11:45, Alvaro Herrera wrote: > > > > I only have cosmetic suggestions for this patch. For one thing, I think > > the .c file should be in src/port and its header should be in > > src/include/port/, right beside the likes of pg_bswap.h a

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-04 Thread David Rowley
Thanks for looking at this. On Fri, 1 Feb 2019 at 11:45, Alvaro Herrera wrote: > > I only have cosmetic suggestions for this patch. For one thing, I think > the .c file should be in src/port and its header should be in > src/include/port/, right beside the likes of pg_bswap.h and pg_crc32c.h. I

Re: Using POPCNT and other advanced bit manipulation instructions

2019-02-03 Thread Michael Paquier
On Thu, Jan 31, 2019 at 07:45:02PM -0300, Alvaro Herrera wrote: > Other than those minor changes, I think we should just get this pushed > and see what the buildfarm thinks. In the words of a famous PG hacker: > if a platform ain't in the buildfarm, we don't support it. Moved to next CF, waiting

Re: Using POPCNT and other advanced bit manipulation instructions

2019-01-31 Thread Alvaro Herrera
I only have cosmetic suggestions for this patch. For one thing, I think the .c file should be in src/port and its header should be in src/include/port/, right beside the likes of pg_bswap.h and pg_crc32c.h. For another, I think the arrangement of all those "ifdef HAVE_THIS_OR_THAT" in the bitutils

Re: Using POPCNT and other advanced bit manipulation instructions

2019-01-08 Thread Dmitry Dolgov
> On Fri, Jan 4, 2019 at 1:38 PM David Rowley > wrote: > > On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > I've checked for Clang 6, it turns out that indeed it generates popcnt > > without > > any macro, but only in one place for bloom_prop_bits_set. After looking

Re: Using POPCNT and other advanced bit manipulation instructions

2019-01-04 Thread David Rowley
On Thu, 20 Dec 2018 at 23:59, Jose Luis Tallon wrote: > IMVHO: Please do not disregard potential optimization by the compiler > around those calls.. o_0 That might explain the reduced performance > improvement observed. It was a speedup that I measured. Did you see something else? > > What I'm

Re: Using POPCNT and other advanced bit manipulation instructions

2019-01-04 Thread David Rowley
Thanks for looking at this. On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthali...@gmail.com> wrote: > I've checked for Clang 6, it turns out that indeed it generates popcnt without > any macro, but only in one place for bloom_prop_bits_set. After looking at > this > function it seems that it w

Re: Using POPCNT and other advanced bit manipulation instructions

2018-12-20 Thread Jose Luis Tallon
On 20/12/18 6:53, David Rowley wrote: Back in 2016 [1] there was some discussion about using the POPCNT instruction to improve the performance of counting the number of bits set in a word. Improving this helps various cases, such as bms_num_members and also things like counting the allvisible an

Re: Using POPCNT and other advanced bit manipulation instructions

2018-12-20 Thread Dmitry Dolgov
> On Thu, Dec 20, 2018 at 6:53 AM David Rowley > wrote: > > Thomas mentions in [1], to get the GCC to use the POPCNT instruction, > we must pass -mpopcnt in the build flags. After doing a bit of > research, I found [2] which mentions that some compilers have some > pattern matching code to allow

Re: Using POPCNT and other advanced bit manipulation instructions

2018-12-20 Thread David Rowley
On Thu, 20 Dec 2018 at 20:17, Gavin Flower wrote: > Looking at the normalized standard deviations, the patched results have > a higher than 5% chance of being better simply by chance. I suspect > that you have made an improvement, but the statistics are not convincing. Yeah, I'd hoped that I cou

Re: Using POPCNT and other advanced bit manipulation instructions

2018-12-19 Thread Gavin Flower
On 20/12/2018 18:53, David Rowley wrote [...] Patched: postgres=# analyze t1; Time: 680.833 ms Time: 699.976 ms Time: 695.608 ms Time: 676.007 ms Time: 693.487 ms Time: 726.982 ms Time: 677.835 ms Time: 688.426 ms Master: postgres=# analyze t1; Time: 721.837 ms Time: 756.035 ms Time: 734.545 m