Re: [HACKERS] Re: Abbreviated keys for Numeric
On Sat, Jan 31, 2015 at 7:07 PM, Peter Geoghegan wrote:
> I don't want to get bogged down on this - the numeric abbreviation
> patch *is* still much more compelling - but maybe abbreviation of
> float8 isn't a red herring after all.

I'm completely on-board with doing something about numeric. I think it might be pretty foolish to try to do anything about any data type the CPU has hard-wired knowledge of. We're basically betting that we can do better in software than they did in hardware, and even if that happens to be true on some systems under some circumstances, it leaves us in a poor position to leverage future improvements to the silicon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
On Mon, Jan 26, 2015 at 3:35 PM, Peter Geoghegan wrote:
> I am not seriously suggesting pursuing abbreviation for float8 in the
> near term - numeric is clearly what we should concentrate on. It's
> interesting that abbreviation of float8 could potentially make sense,
> though.

Note that in the IEEE 754 standard, the exponent does not have a sign. Rather, an exponent bias is subtracted from it (127 for single precision floats, and 1023 for double precision floats). This, and the bit sequence of the mantissa, allows floats to be compared and sorted correctly even when interpreting them as integers. The exception is NaN, but then we have an exception to that exception.

This is a really old idea, actually. I first saw it in a paper written in the 1960s, long before math coprocessors became standard. Haven't really thrashed this out enough, but offhand I guess it would work.

The other problem is that positive IEEE floating-point numbers sort like integers with the same bits, and negative IEEE floating-point numbers sort in the reverse order of integers with the same bits. So we'd probably end up with an encoding scheme that accounted for that, and forget about tie-breakers (or have a NOOP "return 0" tie-breaker). An example of the problem:

postgres=# create table foo (a float8);
CREATE TABLE
postgres=# insert into foo values (1), (2), (3), (-1), (-2), (-3);
INSERT 0 6
postgres=# select * from foo order by a;
 a
 -1
 -2
 -3
 1
 2
 3
(6 rows)

The reason that this conversion usually doesn't occur in library sorting routines is that it only helps significantly on x86, has additional memory overhead, and ordinarily requires that we convert back when we're done sorting.

The cost/benefit analysis for tuplesort would be much more favorable than a generic float sorting case, given that we pretty much have datum1 storage as a sunk cost anyway, and given that we don't need to convert back the datum1 representation, and given that the encoding process would be dirt cheap and occur at a time when we were likely totally bottlenecked on memory bandwidth.

I don't want to get bogged down on this - the numeric abbreviation patch *is* still much more compelling - but maybe abbreviation of float8 isn't a red herring after all.

--
Peter Geoghegan
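The encoding scheme described above can be sketched roughly as follows. This is an illustration only, not code from any patch in this thread: the function name float8_sort_key is hypothetical, and the choices to fold -0.0 into +0.0 and to place every NaN at the top of the order are assumptions made here so that the key order agrees with float8's documented ordering (NaN equal to itself and greater than all other values).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "this sketch assumes a 64-bit IEEE 754 double");

/*
 * Hypothetical helper: map a double to a uint64_t whose unsigned integer
 * order matches the numeric order of the input.
 */
static uint64_t
float8_sort_key(double d)
{
    uint64_t bits;

    /* Assumption: all NaNs collapse to the largest possible key. */
    if (isnan(d))
        return UINT64_MAX;

    /* -0.0 == 0.0 is true, so this folds negative zero into positive zero. */
    if (d == 0.0)
        d = 0.0;

    memcpy(&bits, &d, sizeof(bits));

    /*
     * Negative values sort in reverse as integers, so flip every bit.
     * Non-negative values only need the sign bit set, which places them
     * above all negative values.
     */
    if (bits & UINT64_C(0x8000000000000000))
        return ~bits;
    return bits | UINT64_C(0x8000000000000000);
}
```

With keys like these, comparing two values is a plain unsigned integer comparison, and no tie-breaker is needed because equal keys imply equal values under the assumptions stated above.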
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 28/01/15 06:29, Andrew Gierth wrote:
> "Peter" == Peter Geoghegan writes:
>
> Peter> What I find particularly interesting about this patch is that it
> Peter> makes sorting numerics significantly faster than even sorting
> Peter> float8 values,
>
> Played some more with this. Testing on some different gcc versions showed that the results were not consistent between versions; the latest I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8 as slightly slower; the difference was all in the time of the float8 case, the time for numeric was virtually the same.
>
> For one specific test query, taking the best time of multiple runs,
>
> float8:  gcc4.7 = 980ms, gcc4.9 = 833ms
> numeric: gcc4.7 = 940ms, gcc4.9 = 920ms
> (vs. 650ms for bigint on either version)
>
> So honestly I think abbreviation for float8 is a complete red herring.
>
> Also, I couldn't get any detectable benefit from inlining DatumGetFloat8, though I may have to play more with that to be certain (above tests did not have any float8-related modifications at all, just the datum and numeric abbrevs patches).

Since gcc5.0 is due to be released in less than 3 months, it might be worth testing with that.

Cheers,
Gavin
Re: [HACKERS] Re: Abbreviated keys for Numeric
> "Peter" == Peter Geoghegan writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

Played some more with this. Testing on some different gcc versions showed that the results were not consistent between versions; the latest I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8 as slightly slower; the difference was all in the time of the float8 case, the time for numeric was virtually the same.

For one specific test query, taking the best time of multiple runs,

float8:  gcc4.7 = 980ms, gcc4.9 = 833ms
numeric: gcc4.7 = 940ms, gcc4.9 = 920ms
(vs. 650ms for bigint on either version)

So honestly I think abbreviation for float8 is a complete red herring.

Also, I couldn't get any detectable benefit from inlining DatumGetFloat8, though I may have to play more with that to be certain (above tests did not have any float8-related modifications at all, just the datum and numeric abbrevs patches).

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 27/01/15 00:51, Andres Freund wrote:
> On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:
> > On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth wrote:
> > > Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations.
> >
> > Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure.
>
> I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined.

IIRC DatumGetFloat8 was quite visible in perf when I was writing the array version of width_bucket. It was one of the motivations for making a special float8 version, since not having to call it had a significant effect. Sadly I don't remember anymore whether it was the function call itself or the conversion.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:
> On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth wrote:
> > Obvious overheads in float8 comparison include having to check for NaN,
> > and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
> > a store/load to memory rather than just using a register. Looking at
> > those might be more beneficial than messing with abbreviations.
>
> Aren't there issues with the alignment of double precision floating
> point numbers on x86, too? Maybe my information there is at least
> partially obsolete. But it seems we'd have to control for this to be
> sure.

I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined.

Greetings,

Andres Freund

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
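For readers who want to see what an inlined conversion of this kind could look like, here is a minimal sketch. It is not PostgreSQL's actual DatumGetFloat8 definition: Datum64, datum64_get_float8 and float8_get_datum64 are hypothetical stand-ins, and the sketch assumes a 64-bit platform where a float8 fits in the datum by value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit pass-by-value datum. */
typedef uint64_t Datum64;

static_assert(sizeof(double) == sizeof(Datum64),
              "this sketch assumes float8 fits in the datum by value");

/* Reinterpret the datum's bits as a double, with no function call overhead. */
static inline double
datum64_get_float8(Datum64 x)
{
    double result;

    memcpy(&result, &x, sizeof(result));
    return result;
}

/* The reverse direction: store a double's bits in a datum. */
static inline Datum64
float8_get_datum64(double d)
{
    Datum64 result;

    memcpy(&result, &d, sizeof(result));
    return result;
}
```

Using memcpy keeps the type punning well defined in C. Whether a given compiler reduces it to a register move or to the store/load pair described above depends on the compiler and flags, so it is worth checking the generated assembly rather than assuming.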
Re: [HACKERS] Re: Abbreviated keys for Numeric
On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth wrote:
> Obvious overheads in float8 comparison include having to check for NaN,
> and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
> a store/load to memory rather than just using a register. Looking at
> those might be more beneficial than messing with abbreviations.

Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure.

I am not seriously suggesting pursuing abbreviation for float8 in the near term - numeric is clearly what we should concentrate on. It's interesting that abbreviation of float8 could potentially make sense, though.

--
Peter Geoghegan
Re: [HACKERS] Re: Abbreviated keys for Numeric
> "Peter" == Peter Geoghegan writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

I get a much smaller difference there than you do.

Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations.

--
Andrew (irc:RhodiumToad)
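To make the NaN overhead concrete, a comparator that gives NaN a defined place in the sort order has to test for it before the ordinary comparisons. The sketch below is an illustration modeled on the documented float8 ordering (NaN equal to NaN and greater than every other value); the name float8_cmp_sketch is hypothetical and this is not a copy of the PostgreSQL source.

```c
#include <math.h>

/*
 * Hypothetical three-way comparator for doubles.
 * Returns -1, 0 or 1. NaN compares equal to NaN and greater than
 * every non-NaN value, so the result is a total order usable for sorting.
 */
static int
float8_cmp_sketch(double a, double b)
{
    /* The NaN checks are the extra work a plain integer comparison avoids. */
    if (isnan(a))
    {
        if (isnan(b))
            return 0;
        return 1;
    }
    else if (isnan(b))
        return -1;

    if (a > b)
        return 1;
    if (a < b)
        return -1;
    return 0;
}
```

Each call may execute up to two NaN tests plus two floating-point comparisons, where a bigint comparison needs only the latter pair on integers.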
[HACKERS] Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Mon, Jan 26, 2015 at 8:43 AM, Andrew Gierth wrote:
> Another spinoff from the abbreviation discussion. Peter Geoghegan
> suggested on IRC that numeric would benefit from abbreviation, and
> indeed it does (in some cases by a factor of about 6-7x or more, because
> numeric comparison is no speed demon).

Cool.

What I find particularly interesting about this patch is that it makes sorting numerics significantly faster than even sorting float8 values, at least some of the time, even though the latter has generic SortSupport (for fmgr elision). Example:

postgres=# create table foo as select x::float8 x, x::numeric y from (select random() * 1000 x from generate_series(1,100) a) b;
SELECT 100

This query takes about 525ms after repeated executions:

select * from (select * from foo order by x offset 10) i;

However, this query takes about 412ms:

select * from (select * from foo order by y offset 10) i;

There is probably a good case to be made for float8 abbreviation support. Just as well that your datum abbreviation patch doesn't imply that pass-by-value types cannot be abbreviated across the board (it only implies that abbreviation of pass-by-value types is not supported in the datum sort case). :-)

Anyway, the second query above (the one with the numeric ORDER BY column) is enormously faster than the same query executed against master's tip. That takes about 1720ms following repeated executions. So at least that case is over 4x faster, suggesting that abbreviation support for numeric is well worthwhile.

So I'm signed up to review this one too.

--
Peter Geoghegan
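For readers unfamiliar with the general idea of abbreviated keys, the sketch below shows the pattern on strings rather than on numeric, since the numeric encoding used by the patch is not described in this thread. Every name here (SortItem, abbreviate, compare_items) is hypothetical. The point is only the shape of the technique: a cheap fixed-width key is compared first, and the expensive authoritative comparison runs only when the keys tie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sort entry: a fixed-width key plus the full value. */
typedef struct SortItem
{
    uint64_t    abbrev;     /* cheap-to-compare abbreviated key */
    const char *full;       /* authoritative value, compared only on ties */
} SortItem;

/*
 * Pack the first 8 bytes of the string into an integer, most significant
 * byte first, padding short strings with zero bytes. Unsigned comparison
 * of two such keys then agrees with strcmp() on the same prefixes.
 */
static uint64_t
abbreviate(const char *s)
{
    uint64_t    key = 0;
    size_t      len;

    assert(s != NULL);
    len = strlen(s);

    for (size_t i = 0; i < sizeof(key); i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        key = (key << 8) | c;
    }
    return key;
}

/* Compare abbreviated keys first; fall back to the full comparison on a tie. */
static int
compare_items(const SortItem *a, const SortItem *b)
{
    if (a->abbrev < b->abbrev)
        return -1;
    if (a->abbrev > b->abbrev)
        return 1;
    return strcmp(a->full, b->full);
}
```

The scheme is only correct if unequal abbreviated keys never contradict the authoritative comparator, which is why the key is built so that its integer order matches the byte order strcmp() uses.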