Hi,
On 2019-02-14 16:45:38 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions. It goes from something like
> >>
> >> bsrq %r
Alvaro Herrera writes:
> Here's a final version that I intend to push shortly, to have time
> before EOB today to handle any fallout.
I think this is likely to result in a lot of complaints about
rightmost_one_pos[] being unreferenced, in non-HAVE__BUILTIN_CTZ
builds. Probably that has to be an
Here's a final version that I intend to push shortly, to have time
before EOB today to handle any fallout.
--
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 085650a174ff080f578bb289d3707173aaf4f07b Mon Sep 17
On 2019-Feb-15, Tom Lane wrote:
> Alvaro Herrera writes:
> > Ah, I understand it now:
> > https://stackoverflow.com/questions/25683690/confusion-about-bsr-and-lzcnt/43443701#43443701
> > if you call LZCNT/TZCNT on a CPU that doesn't support it, it won't raise
> > SIGILL or anything ... it'll just
Alvaro Herrera writes:
> I noticed in Compiler Explorer that some (ancient?) Power cpus
> implement instruction "popcntb", and GCC support for those uses
> -mpopcntb switch enabling __builtin_popcount() to use it. I added the
> switch to configure.in but I'm not sure how well that will work ...
Alvaro Herrera writes:
> Ah, I understand it now:
> https://stackoverflow.com/questions/25683690/confusion-about-bsr-and-lzcnt/43443701#43443701
> if you call LZCNT/TZCNT on a CPU that doesn't support it, it won't raise
> SIGILL or anything ... it'll just silently compute the wrong result.
> That'
On 2019-Feb-14, Tom Lane wrote:
> Then the correct test to see if we want to build pg_popcount.c (BTW,
> please pick a less generic name for that) and the choose function
> is whether we have *both* HAVE__BUILTIN_POPCOUNT and nonempty
> CFLAGS_POPCNT.
I used pg_bitutils_sse42.c to host the specia
On 2019-Feb-15, Alvaro Herrera wrote:
> On 2019-Feb-15, Kyotaro HORIGUCHI wrote:
> > And even worse lzcntx is accidentially "fallback"s to bsrx on
> > unsupported CPUs, which leads to bogus results.
> > __builtin_clzll(8) = 3, which should be 60.
>
> I'm not sure I understand this point. Are yo
On 2019-Feb-15, Kyotaro HORIGUCHI wrote:
> I understand that the behavior of __builtin_c[tl]z(0) is
> undefined from the reason, they convert to bs[rf]. So if we use
> these builtins, additional check is required.
>
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Okay -- the functions c
At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane wrote in
<822.1550180...@sss.pgh.pa.us>
> Andres Freund writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions. It goes from som
On 2019-Feb-14, Tom Lane wrote:
> I think we need a clean test for __builtin_popcount(), and to be willing
> to use it if available, independently of -mpopcnt. Then separately we
> should test to see if -mpopcnt works, probably with the same
> infrastructure we use for checking for other compiler
Alvaro Herrera writes:
> That leads me to the attached patch. It creates a new file
> pg_popcount.c which is the only one compiled with -mpopcnt (if
> available); if there's no compiler switch to enable POPCNT, we just
> don't compile the file. I'm not sure that's kosher -- in particular I'm
> n
On 2019-Feb-14, Tom Lane wrote:
> static inline int
> pg_clz(...)
Hmm, I missed this bit. So we put all these functions in the header, as
in the attached.
--
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit
On 2019-Feb-14, Tom Lane wrote:
> I'd bet a fair amount of money that we'd be better off *not* using
> lzcnt, even if available, because then we could just expose things
> along this line:
>
> static inline int
> pg_clz(...)
> {
> #ifdef HAVE__BUILTIN_CLZ
> return __builtin_clz(x);
> #else
>
Andres Freund writes:
> On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
>> Hah, I just realized you have to add -mlzcnt in order for these builtins
>> to use the lzcnt instructions. It goes from something like
>>
>> bsrq %rax, %rax
>> xorq $63, %rax
> I'm confused how this is a general coun
Hi,
On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> On 2019-Feb-14, Tom Lane wrote:
>
> > Some further thoughts here ...
> >
> > Does the "lzcnt" runtime probe actually do anything useful?
> > On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz
> > and __builtin_ctz compil
Alvaro Herrera writes:
> Hah, I just realized you have to add -mlzcnt in order for these builtins
> to use the lzcnt instructions. It goes from something like
> bsrq%rax, %rax
> xorq$63, %rax
> to
> lzcntq %rax, %rax
> Significant?
I'd bet a fair amount of money tha
On 2019-Feb-14, Tom Lane wrote:
> Some further thoughts here ...
>
> Does the "lzcnt" runtime probe actually do anything useful?
> On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz
> and __builtin_ctz compile to sequences involving bsrq and bsfq
> regardless of -mpopcnt. It's
Gavin Flower writes:
> From my memory of reading of K&R many moons ago, it said that C only
> guarantees that the lengths are such that
> byte <= half word <= word <= long
Indeed.
> But I don't recall ever seeing a long less than 32 bits!
I'm not sure offhand what C89 said, but C99 requires "s
On 14/02/2019 11:17, Alvaro Herrera wrote:
On 2019-Feb-13, Alvaro Herrera wrote:
It definitely is ... plans have changed from using IndexOnly scans to
Seqscans, which is likely fallout from the visibilitymap_count() change.
I think the problem here is that "unsigned long" is 32 bits in this
ma
Some further thoughts here ...
Does the "lzcnt" runtime probe actually do anything useful?
On the x86_64 compilers I tried (gcc 8.2.1 and 4.4.7), __builtin_clz
and __builtin_ctz compile to sequences involving bsrq and bsfq
regardless of -mpopcnt. It's fairly hard to see how lzcnt would
buy anythi
Thomas Munro writes:
> On Thu, Feb 14, 2019 at 4:38 PM Tom Lane wrote:
>> I'd be inclined to rip out all of the run-time-detection logic here;
>> I doubt any of it is buying anything that's worth the price of an
>> indirect call.
> No view on that but apparently there were Intel Atom and AMD C c
On Thu, Feb 14, 2019 at 4:38 PM Tom Lane wrote:
> And, while I'm complaining: why the devil is use of the compiler builtins
> gated by HAVE__GET_CPUID? This is unbelievably Intel-centric, because
> it prevents use of the builtins on other architectures. If the builtin
> exists, we should use it,
And, while I'm complaining: why the devil is use of the compiler builtins
gated by HAVE__GET_CPUID? This is unbelievably Intel-centric, because
it prevents use of the builtins on other architectures. If the builtin
exists, we should use it, full stop. There's no reason to expect that it
would be
On 2019-Feb-13, Tom Lane wrote:
> Alvaro Herrera writes:
> > and we have defined pg_popcount64() like this:
>
> > static int
> > pg_popcount64_sse42(uint64 word)
> > {
> > return __builtin_popcountl(word);
> > }
>
> That is clearly completely broken.
Pushed my proposed fix, which includes
... btw, why is pg_popcount casting away the const from its pointer
argument?
regards, tom lane
Alvaro Herrera writes:
> and we have defined pg_popcount64() like this:
> static int
> pg_popcount64_sse42(uint64 word)
> {
> return __builtin_popcountl(word);
> }
That is clearly completely broken.
> If that's correct, then I think we need something like this patch. But
> it makes me w
On 2019-Feb-13, Alvaro Herrera wrote:
> It definitely is ... plans have changed from using IndexOnly scans to
> Seqscans, which is likely fallout from the visibilitymap_count() change.
I think the problem here is that "unsigned long" is 32 bits in this
machine:
checking whether long int is 64 b
> "Alvaro" == Alvaro Herrera writes:
>> Lapwing's latest failure looks like yours rather than mine now? (the
>> previous two were mine)
Alvaro> It definitely is ... plans have changed from using IndexOnly
Alvaro> scans to Seqscans, which is likely fallout from the
Alvaro> visibilitymap_
On 2019-Feb-13, Andrew Gierth wrote:
> > "Alvaro" == Alvaro Herrera writes:
>
> Alvaro> I've pushed this now. Let's see what the buildfarm has to say
> Alvaro> about it.
>
> Lapwing's latest failure looks like yours rather than mine now? (the
> previous two were mine)
It definitely is ..
> "Alvaro" == Alvaro Herrera writes:
Alvaro> I've pushed this now. Let's see what the buildfarm has to say
Alvaro> about it.
Lapwing's latest failure looks like yours rather than mine now? (the
previous two were mine)
--
Andrew (irc:RhodiumToad)
> "Andrew" == Andrew Gierth writes:
Andrew> IA64 is working fine as far as I can see (specifically, anole
Andrew> is passing); it's ICC on x86_64 that broke (fulmar).
And I know what's wrong on fulmar now, so that'll be fixed shortly.
--
Andrew (irc:RhodiumToad)
On 2019-Feb-13, Andres Freund wrote:
> Hi,
>
> On February 13, 2019 8:40:14 PM GMT+01:00, Tom Lane
> wrote:
> >Alvaro Herrera writes:
> >> I've pushed this now. Let's see what the buildfarm has to say about
> >it.
> >
> >It's likely to be hard to tell, given the amount of pink from the Ryu
>
> "Andres" == Andres Freund writes:
>> It's likely to be hard to tell, given the amount of pink from the
>> Ryu patch. If Andrew is not planning to clean that up PDQ,
Besides crake (x-version), fulmar (icc) and lorikeet (cygwin), I hope
the rest of the known failures should pass on the nex
Andres Freund writes:
> I'd assume that breaking bit counting would cause distinct enough damage
> (compile time or crashes). That's not to say that reverting ryu shouldn't be
> considered (although I'm not that bothered by cross version, ia64 and cygwin
> failures, especially because the latt
Hi,
On February 13, 2019 8:40:14 PM GMT+01:00, Tom Lane wrote:
>Alvaro Herrera writes:
>> I've pushed this now. Let's see what the buildfarm has to say about
>it.
>
>It's likely to be hard to tell, given the amount of pink from the Ryu
>patch. If Andrew is not planning to clean that up PDQ, I'
Alvaro Herrera writes:
> I've pushed this now. Let's see what the buildfarm has to say about it.
It's likely to be hard to tell, given the amount of pink from the Ryu
patch. If Andrew is not planning to clean that up PDQ, I'd suggest
reverting that patch pending having some repairs for it.
On 2019-Feb-04, David Rowley wrote:
> On Fri, 1 Feb 2019 at 11:45, Alvaro Herrera wrote:
> >
> > I only have cosmetic suggestions for this patch. For one thing, I think
> > the .c file should be in src/port and its header should be in
> > src/include/port/, right beside the likes of pg_bswap.h a
Thanks for looking at this.
On Fri, 1 Feb 2019 at 11:45, Alvaro Herrera wrote:
>
> I only have cosmetic suggestions for this patch. For one thing, I think
> the .c file should be in src/port and its header should be in
> src/include/port/, right beside the likes of pg_bswap.h and pg_crc32c.h.
I
On Thu, Jan 31, 2019 at 07:45:02PM -0300, Alvaro Herrera wrote:
> Other than those minor changes, I think we should just get this pushed
> and see what the buildfarm thinks. In the words of a famous PG hacker:
> if a platform ain't in the buildfarm, we don't support it.
Moved to next CF, waiting
I only have cosmetic suggestions for this patch. For one thing, I think
the .c file should be in src/port and its header should be in
src/include/port/, right beside the likes of pg_bswap.h and pg_crc32c.h.
For another, I think the arrangement of all those "ifdef
HAVE_THIS_OR_THAT" in the bitutils
> On Fri, Jan 4, 2019 at 1:38 PM David Rowley
> wrote:
>
> On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > I've checked for Clang 6, it turns out that indeed it generates popcnt
> > without
> > any macro, but only in one place for bloom_prop_bits_set. After looking
On Thu, 20 Dec 2018 at 23:59, Jose Luis Tallon
wrote:
> IMVHO: Please do not disregard potential optimization by the compiler
> around those calls.. o_0 That might explain the reduced performance
> improvement observed.
It was a speedup that I measured. Did you see something else?
> > What I'm
Thanks for looking at this.
On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've checked for Clang 6, it turns out that indeed it generates popcnt without
> any macro, but only in one place for bloom_prop_bits_set. After looking at
> this
> function it seems that it w
On 20/12/18 6:53, David Rowley wrote:
Back in 2016 [1] there was some discussion about using the POPCNT
instruction to improve the performance of counting the number of bits
set in a word. Improving this helps various cases, such as
bms_num_members and also things like counting the allvisible an
> On Thu, Dec 20, 2018 at 6:53 AM David Rowley
> wrote:
>
> Thomas mentions in [1], to get the GCC to use the POPCNT instruction,
> we must pass -mpopcnt in the build flags. After doing a bit of
> research, I found [2] which mentions that some compilers have some
> pattern matching code to allow
On Thu, 20 Dec 2018 at 20:17, Gavin Flower
wrote:
> Looking at the normalized standard deviations, the patched results have
> a higher than 5% chance of being better simply by chance. I suspect
> that you have made an improvement, but the statistics are not convincing.
Yeah, I'd hoped that I cou
On 20/12/2018 18:53, David Rowley wrote
[...]
Patched:
postgres=# analyze t1;
Time: 680.833 ms
Time: 699.976 ms
Time: 695.608 ms
Time: 676.007 ms
Time: 693.487 ms
Time: 726.982 ms
Time: 677.835 ms
Time: 688.426 ms
Master:
postgres=# analyze t1;
Time: 721.837 ms
Time: 756.035 ms
Time: 734.545 m
48 matches
Mail list logo