Re: [HACKERS] Improving avg performance for numeric

2013-11-16 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 [ numeric-optimize-v8.patch ]

I've committed this with some adjustments.  Aside from cosmetic
changes and documentation/comment improvements:

* I don't agree at all with the claim that aggtransspace is only useful
for INTERNAL transition data.  There are lots of cases with regular
SQL types where the planner doesn't have a good idea of how large the
transition value will be, and with the current code it tends to guess
on the small side (frequently 32 bytes :-().  It seems to me to be
useful to give users a knob to twiddle there.  Consider for example
an aggregate that uses an integer array as transition state; the author
of the aggregate might know that there are usually hundreds of entries
in the array, and wish to dial up aggtransspace to prevent the planner
from optimistically choosing hash aggregation.

As committed, I just took out the restriction in CREATE AGGREGATE
altogether; it will let you set SSPACE for any transition datatype.
The planner will ignore it for pass-by-value types, where the behavior
is known, but otherwise honor it.  It's possible that we should put in
a restriction to INTERNAL plus pass-by-reference types, or even INTERNAL
plus pass-by-reference variable-length types, but I can't get excited
about that.  The error message would be too confusing I think.
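To illustrate the knob described above, a hypothetical aggregate over an integer-array transition state could hint its expected state size roughly like this (a sketch of the CREATE AGGREGATE syntax with the new SSPACE parameter; the aggregate and transition-function names are made up for the example):

```sql
-- Hypothetical aggregate whose transition state is an integer array
-- that typically holds hundreds of elements.  SSPACE tells the planner
-- to assume ~4 kB per group instead of its default small guess, making
-- it less likely to choose hash aggregation when the states would not
-- actually fit in work_mem.
CREATE AGGREGATE my_array_agg (integer) (
    SFUNC = my_array_accum,      -- assumed transition function
    STYPE = integer[],           -- ordinary SQL transition type
    SSPACE = 4096,               -- expected transition state size in bytes
    INITCOND = '{}'
);
```

SSPACE is ignored for pass-by-value transition types, as noted above, since their size is already known.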

* There was a stray else added to clauses.c that disabled space
accounting for polymorphic aggregates.

* I simplified the summing code in do_numeric_accum; the copying and
reallocating it was doing was not only unnecessary but contained
unpleasant assumptions about what you can do with a NumericVar.  This
probably makes the committed patch a bit faster than what was submitted,
but I didn't try to benchmark the change.

* I made sure all the functions accessed their input state pointer with
 state = PG_ARGISNULL(0) ? NULL : (NumericAggState *) PG_GETARG_POINTER(0);
There were places that omitted the ARGISNULL test, and thus only happened
to work if the uninitialized Datum value they got was zero.  While that
might chance to be true in the current state of the code, it's not an
assumption you're supposed to make.

* numeric sum/avg failed to check for NaN inputs.

* I fixed incorrect tests in the code added to pg_dump.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving avg performance for numeric

2013-10-14 Thread Tomas Vondra
On 24.9.2013 17:57, Pavel Stehule wrote:
 
 
 
 2013/9/24 Robert Haas robertmh...@gmail.com
 
 On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Seems ready for committer to me. I'll wait a few days for others
 to comment, and then I'll update the commitfest page.
 
 Some thoughts:
 
 The documentation doesn't reflect the restriction to type internal. 
 On a related note, why restrict this to type internal?
 
 
 
 Now, for almost all types Postgres estimates the size of the state
 value well. Only arrays with unknown size can be different. If we
 enable this value for all types, then users can specify bad values
 for scalar built-in types. My next argument is simple: I don't see a
 good use case for customizing this value for types other than the
 internal type.
 
 I have no strong position here - I prefer tying it to the internal
 type due to its slightly higher robustness.

I share this opinion. I was not able to come up with a single use case
benefiting from allowing this for types other than internal. Seems like
a footgun to me, with no potential benefit.

So +1 to keeping the patch 'type internal only' from me.

With the formatting issues now fixed, I believe the patch is ready for
committer (and someone already switched it to that state).

Tomas




Re: [HACKERS] Improving avg performance for numeric

2013-09-24 Thread Robert Haas
On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Seems ready for committer to me. I'll wait a few days for others to
 comment, and then I'll update the commitfest page.

Some thoughts:

The documentation doesn't reflect the restriction to type internal.
On a related note, why restrict this to type internal?

Formatting fixes are needed:

+   if (aggtransspace > 0)
+   {
+   costs->transitionSpace += aggtransspace;
+   }

Project style is not to use curly-braces for single statements.  Also,
the changes to numeric.c add blank lines before and after function
header comments, which is not the usual style.

!   if (state == NULL)
!   PG_RETURN_NULL();
!   else
!   PG_RETURN_POINTER(state);

I think this should just say PG_RETURN_POINTER(state).  PG_RETURN_NULL
is for returning an SQL NULL, not (void *) 0.  Is there some reason
why we need an SQL NULL here, rather than a NULL pointer?

On the whole this looks fairly solid on a first read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improving avg performance for numeric

2013-09-23 Thread Tomas Vondra
On 23 September 2013, 18:18, Pavel Stehule wrote:
 Hello


 2013/9/22 Tomas Vondra t...@fuzzy.cz

 Hi,

 I've reviewed the v6 of the numeric optimize patch
 (http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=ekfsboa...@mail.gmail.com),
 as Pavel did some hacking on the patch and asked me to do the review.

 The patch seems fine to me, the following comments are mostly
 nitpicking:

 1) Applies cleanly to the HEAD (although only by patch and not git
 apply).

 2) I think we should use "estimate" instead of "approximation" in the
 docs, it seems more correct / natural to me (but maybe I'm wrong on this
 one).

 3) state_data_size does not make much sense to me - it should be
 state_size. This probably comes from the state_data_type, but that's
 ('state' + 'data type') and by replacing the second part with 'size'
 you'll get state_size.


 This name is consistent with the previous field state_data_type - I
 expected it to mean 'state data' + 'type'. I am not a native speaker,
 so my position is not strong, but at this moment I think
 state_data_size makes sense. In this case both variants make sense -
 'state data' + 'type' or 'state' + 'data type'.

OK, let's leave this up to a native speaker.


 8) The records in pg_aggregate.c are using either 0 (for fixed-length)
 or
 128. This seems slightly excessive to me. What is the reasoning behind
 this? Is that because of the two NumericVar fields?


 NumericAggState has 96 bytes - but you have to add space for the
 digits of the included numeric values (held in NumericVar) -- so that
 is another 16 + 16 = 128. I am not able to specify exactly how many
 digits will be used - 16 bytes is just a good-enough estimate - it is
 not used for memory allocation, it is used for some planner magic.

OK, makes sense.

I've made some basic tests on a 40M table with random numerics, for
example this query:

   select avg(val), sum(val), var_pop(val) from numeric_test ;

takes ~57 seconds on current master, but only ~26 seconds with the v7
patch. Granted, in practice the improvements won't be as good because of
I/O costs etc., but it's a nice gain.
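A rough reproduction of that benchmark setup could look like the following (the column definition and value range are assumptions - the message does not show how the table was built; only the table name, row count, and query come from the text):

```sql
-- Generate ~40M random numerics and run the aggregates measured above.
CREATE TABLE numeric_test AS
    SELECT (random() * 1000)::numeric(8,2) AS val
    FROM generate_series(1, 40000000) s;

\timing on
SELECT avg(val), sum(val), var_pop(val) FROM numeric_test;
```

Absolute timings will of course depend on hardware and on whether the table fits in shared buffers.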

Seems ready for committer to me. I'll wait a few days for others to
comment, and then I'll update the commitfest page.

regards
Tomas





Re: [HACKERS] Improving avg performance for numeric

2013-09-22 Thread Tomas Vondra

Hi,

I've reviewed the v6 of the numeric optimize patch
(http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=ekfsboa...@mail.gmail.com),
as Pavel did some hacking on the patch and asked me to do the review.

The patch seems fine to me, the following comments are mostly nitpicking:

1) Applies cleanly to the HEAD (although only by patch and not git
apply).

2) I think we should use "estimate" instead of "approximation" in the
docs, it seems more correct / natural to me (but maybe I'm wrong on this
one).

3) state_data_size does not make much sense to me - it should be
state_size. This probably comes from the state_data_type, but that's
('state' + 'data type') and by replacing the second part with 'size'
you'll get state_size.

4) currently the state size may be specified for all data types, no
matter if their size is fixed or variable. Wouldn't it be safer to allow
this option only for variable-length data types? Right now you can
specify stype=int and sspace=200 which does not make much sense to me.
We can always relax the restrictions if needed, the opposite is much
more difficult.

5) in numeric.c, there are no comments for
  - fields of the NumericAggState (some are obvious, some are not)
  - multiple functions are missing the initial comments (e.g.
numeric_accum, do_numeric_accum)

6) I think the first variable in do_numeric_accum is rather useless,
it's trivial to remove it - just use state->first instead and move
the assignment to the proper branch (ok, this is nitpicking).

7) I think the error message in makeNumericAggState should be something 
like "This must not be called in non-aggregate context!" or something 
like that.


8) The records in pg_aggregate.c are using either 0 (for fixed-length) 
or 128. This seems slightly excessive to me. What is the reasoning 
behind this? Is that because of the two NumericVar fields?


regards
Tomas




Re: [HACKERS] Improving avg performance for numeric

2013-09-04 Thread Peter Eisentraut
On 7/8/13 10:05 AM, Pavel Stehule wrote:
 I am testing your code, and it increases the speed of sum by about 24%
 over the original implementation.

This patch needs to be rebased (and/or the later version registered in
the commit fest).





Re: [HACKERS] Improving avg performance for numeric

2013-09-04 Thread Pavel Stehule
2013/9/4 Peter Eisentraut pete...@gmx.net

 On 7/8/13 10:05 AM, Pavel Stehule wrote:
  I am testing your code, and it increases the speed of sum by about 24%
  over the original implementation.

 This patch needs to be rebased (and/or the later version registered in
 the commit fest).


I updated the commitfest info

Regards

Pavel


Re: [HACKERS] Improving avg performance for numeric

2013-09-04 Thread Peter Eisentraut
On 9/4/13 2:26 PM, Pavel Stehule wrote:
 
 
 
 2013/9/4 Peter Eisentraut pete...@gmx.net
 
 On 7/8/13 10:05 AM, Pavel Stehule wrote:
  I am testing your code, and it increases the speed of sum by about 24%
  over the original implementation.
 
 This patch needs to be rebased (and/or the later version registered in
 the commit fest).
 
 
 I updated the commitfest info

The new patch also needs to be rebased.







Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Hadi Moshayedi
Hello,

 int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
 cleaner code
 numeric sum 6490 ms (7224 ms) --  10% faster
 numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegvs whenever I
tried to run an aggregate function. gdb was telling me that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point; after the
reboot the sigsegvs went away. I want to look into this and find the
reason, as I think I have missed something here. Any thoughts about why
this would happen?

--Hadi




Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Pavel Stehule
2013/8/29 Hadi Moshayedi h...@moshayedi.net

 Hello,

  int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error
 ..
  cleaner code
  numeric sum 6490 ms (7224 ms) --  10% faster
  numeric avg 6487 ms (12023 ms) -- 46% faster

 I also got very similar results.

 On the other hand, initially I was receiving sigsegvs whenever I
 tried to run an aggregate function. gdb was telling me that this
 was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
 find the reason, I had to reboot my computer at some point; after the
 reboot the sigsegvs went away. I want to look into this and find the
 reason, as I think I have missed something here. Any thoughts about why
 this would happen?


I found and fixed a few bugs. There was an issue with empty sets. I
didn't find any other issues.

Regards

Pavel



 --Hadi



Re: [HACKERS] Improving avg performance for numeric

2013-08-26 Thread Pavel Stehule
Hello

here is a rebased patch. Hadi, can you please verify this version?

Regards

Pavel

p.s. Performance tests

postgres=# create table foo(a int, b float, c double precision, d numeric,
gr int);
CREATE TABLE
postgres=#
postgres=# insert into foo select 1, 2.0, 3.0, 3.14, random()*1 from
generate_series(1,1000);

postgres=# \d foo
  Table "public.foo"
 Column |   Type   | Modifiers
+--+---
 a  | integer  |
 b  | double precision |
 c  | double precision |
 d  | numeric  |
 gr | integer  |


set work_mem to '2MB';

postgres=# show debug_assertions;
 debug_assertions
--
 off
(1 row)



postgres=# explain (analyze, timing off) select sum(a) from foo;
QUERY
PLAN
---
 Aggregate  (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1
loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=4) (actual
rows=1000 loops=1)
 Total runtime: 1210.321 ms (1195.117 ms) -- patched (original)
(3 rows)

Time: 1210.709 ms
postgres=# explain (analyze, timing off) select sum(a) from foo group by gr;
QUERY
PLAN
---
 HashAggregate  (cost=21.87..233431.71 rows=9984 width=8) (actual
rows=10001 loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=8) (actual
rows=1000 loops=1)
 Total runtime: 2923.987 ms (2952.292 ms)
(3 rows)

Time: 2924.384 ms

postgres=# explain (analyze, timing off) select avg(a) from foo;
QUERY
PLAN
---
 Aggregate  (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1
loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=4) (actual
rows=1000 loops=1)
 Total runtime: 1331.627 ms (1312.140 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(a) from foo group by gr;
QUERY
PLAN
---
 HashAggregate  (cost=21.87..233456.67 rows=9984 width=8) (actual
rows=10001 loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=8) (actual
rows=1000 loops=1)
 Total runtime: 3139.296 ms (3079.479 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo;
QUERY
PLAN
---
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=8) (actual
rows=1000 loops=1)
 Total runtime: 1327.841 ms (1339.214 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo group by gr;
 QUERY
PLAN

 HashAggregate  (cost=21.87..233431.71 rows=9984 width=12) (actual
rows=10001 loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=12)
(actual rows=1000 loops=1)
 Total runtime: 3047.893 ms (3095.591 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo;
QUERY
PLAN
---
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=8) (actual
rows=1000 loops=1)
 Total runtime: 1454.665 ms (1471.413 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo group by gr;
 QUERY
PLAN

 HashAggregate  (cost=21.87..233456.67 rows=9984 width=12) (actual
rows=10001 loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=12)
(actual rows=1000 loops=1)
 Total runtime: 3282.838 ms (3187.157 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo;
QUERY
PLAN
---
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1
loops=1)
   -  Seq Scan on foo  (cost=0.00..183332.58 rows=858 width=8) (actual
rows=1000 loops=1)
 Total runtime: 1348.555 ms (1364.585 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo group by gr;
 

Re: [HACKERS] Improving avg performance for numeric

2013-07-08 Thread Pavel Stehule
2013/7/8 Hadi Moshayedi h...@moshayedi.net:
 I am attaching the updated patch, which also fixes a bug that
 caused one of the regression tests to fail.

 I'll subscribe this patch to the commitfest in the next hour.

 Can you please review the patch?

sure, :)

Pavel


 Thanks,
-- Hadi




Re: [HACKERS] Improving avg performance for numeric

2013-07-08 Thread Josh Berkus
On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
 I am attaching the updated patch, which also fixes a bug that
 caused one of the regression tests to fail.
 
 I'll subscribe this patch to the commitfest in the next hour.
 
 Can you please review the patch?

I'm afraid that, since this patch wasn't included anywhere near the
first week of the CommitFest, I can't possibly include it in the June
commitfest now.  Accordingly, I have moved it to the September
commitfest.  Hopefully someone can look at it before then.

Sorry for missing this in my patch sweep at the beginning of the CF.
Searching email for patches is, at best, inexact.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Improving avg performance for numeric

2013-07-08 Thread Pavel Stehule
Hello

2013/7/8 Josh Berkus j...@agliodbs.com:
 On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
 I am attaching the updated patch, which also fixes a bug that
 caused one of the regression tests to fail.

 I'll subscribe this patch to the commitfest in the next hour.

 Can you please review the patch?

 I'm afraid that, since this patch wasn't included anywhere near the
 first week of the CommitFest, I can't possibly include it in the June
 commitfest now.  Accordingly, I have moved it to the September
 commitfest.  Hopefully someone can look at it before then.

 Sorry for missing this in my patch sweep at the beginning of the CF.
 Searching email for patches is, at best, inexact.


sure, it should be in the September CF. It is a relatively simple patch
without global impact. But I like it; it increases the speed of
sum(numeric) by about 25% and avg(numeric) by about 50%.

Regards

Pavel

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com




Re: [HACKERS] Improving avg performance for numeric

2013-07-08 Thread Josh Berkus
Pavel,

 
 sure, it should be in the September CF. It is a relatively simple patch
 without global impact. But I like it; it increases the speed of
 sum(numeric) by about 25% and avg(numeric) by about 50%.

Do you think you could give this a review after CF1 ends, but before
September?  I hate to make Hadi wait just because I didn't see his patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Improving avg performance for numeric

2013-07-08 Thread Pavel Stehule
2013/7/8 Josh Berkus j...@agliodbs.com:
 Pavel,


  sure, it should be in the September CF. It is a relatively simple patch
  without global impact. But I like it; it increases the speed of
  sum(numeric) by about 25% and avg(numeric) by about 50%.

 Do you think you could give this a review after CF1 ends, but before
 September?  I hate to make Hadi wait just because I didn't see his patch.

yes, I can.

Regards

Pavel



 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com




Re: [HACKERS] Improving avg performance for numeric

2013-07-03 Thread Pavel Stehule
Hello

2013/3/20 Hadi Moshayedi h...@moshayedi.net:
 Hi Tom,

 Tom Lane t...@sss.pgh.pa.us wrote:
 After thinking about that for awhile: if we pursue this type of
 optimization, what would probably be appropriate is to add an aggregate
 property (stored in pg_aggregate) that allows direct specification of
 the size that the planner should assume for the aggregate's transition
 value.  We were getting away with a hardwired assumption of 8K for
 internal because the existing aggregates that used that transtype all
 had similar properties, but it was always really a band-aid not a proper
 solution.  A per-aggregate override could be useful in other cases too.

 Cool.

 I created a patch which adds an aggregate property to pg_aggregate, so
 the transition space is can be overridden. This patch doesn't contain
 the numeric optimizations. It uses 0 (meaning not-set) for all
 existing aggregates.

 I manual-tested it a bit, by changing this value for aggregates and
 observing the changes in plan. I also updated some docs and pg_dump.

 Does this look like something along the lines of what you meant?

please, can you submit your patch to the next commitfest?

I tested this patch, and it increases performance by about 20%, which
is interesting. Moreover, it allows more comfortable custom aggregates
for custom types with better hash agg support.

Regards

Pavel


 Thanks,
   -- Hadi




Re: [HACKERS] Improving avg performance for numeric

2013-03-20 Thread Pavel Stehule
Hello

2013/3/19 Tom Lane t...@sss.pgh.pa.us:
 I wrote:
 [ looks at patch... ]  Oh, I see what's affecting the plan: you changed
 the aggtranstypes to internal for a bunch of aggregates.  That's not
 very good, because right now the planner takes that to mean that the
 aggregate could eat a lot of space.  We don't want that to happen for
 these aggregates, I think.

 After thinking about that for awhile: if we pursue this type of
 optimization, what would probably be appropriate is to add an aggregate
 property (stored in pg_aggregate) that allows direct specification of
 the size that the planner should assume for the aggregate's transition
 value.  We were getting away with a hardwired assumption of 8K for
 internal because the existing aggregates that used that transtype all
 had similar properties, but it was always really a band-aid not a proper
 solution.  A per-aggregate override could be useful in other cases too.

 This was looking like 9.4 material already, but adding such a property
 would definitely put it over the top of what we could think about
 squeezing into 9.3, IMO.


Postgres is not an in-memory OLAP database, but a lot of companies use
it for OLAP queries due to pg's comfortable usage. This feature can be
very interesting for these users - it can introduce an interesting
speedup at a relatively low price.

Regards

Pavel

 regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-20 Thread Hadi Moshayedi
Hi Tom,

Tom Lane t...@sss.pgh.pa.us wrote:
 After thinking about that for awhile: if we pursue this type of
 optimization, what would probably be appropriate is to add an aggregate
 property (stored in pg_aggregate) that allows direct specification of
 the size that the planner should assume for the aggregate's transition
 value.  We were getting away with a hardwired assumption of 8K for
 internal because the existing aggregates that used that transtype all
 had similar properties, but it was always really a band-aid not a proper
 solution.  A per-aggregate override could be useful in other cases too.

Cool.

I created a patch which adds an aggregate property to pg_aggregate, so
the transition space can be overridden. This patch doesn't contain
the numeric optimizations. It uses 0 (meaning not-set) for all
existing aggregates.

I manual-tested it a bit, by changing this value for aggregates and
observing the changes in plan. I also updated some docs and pg_dump.

Does this look like something along the lines of what you meant?
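For reference, the per-aggregate property proposed here ended up as a catalog column, so the override can be inspected directly (the column name aggtransspace is as eventually committed; the query is an illustrative sketch, not from the patch):

```sql
-- Inspect the planner's per-aggregate transition-space hint.
-- 0 means "not set": the planner falls back to its built-in guess.
SELECT p.proname, a.aggtransspace
FROM pg_aggregate a
JOIN pg_proc p ON p.oid = a.aggfnoid
WHERE p.proname IN ('sum', 'avg')
ORDER BY p.proname;
```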

Thanks,
  -- Hadi


aggregate-transspace.patch
Description: Binary data



Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Hadi Moshayedi
Hello,

I updated the patch by taking ideas from your patch, and unifying the
transition struct and update function for different aggregates. The speed
of avg improved even more. It now has 60% better performance than the
current committed version.

This patch optimizes numeric/int8 sum, avg, stddev_pop, stddev_samp,
var_pop, var_samp.

I also noticed that this patch makes the matview test fail. It seems that
it just changes the ordering of rows for queries like "SELECT * FROM tv;".
Does this seem like a bug in my patch, or should we add ORDER BY clauses
to this test to make it more deterministic?

I also agree with you that changing the sum functions to use preallocated
buffers will allow even more optimization. I'll try to see if I can find a
simple way to do this.
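The unified transition state described above boils down to keeping a few running values per group - essentially a count, a running sum, and a running sum of squares - from which sum, avg and the variance/stddev family can all be derived in the final function. As a sketch of the underlying math only (the real implementation is a C struct updated per row, not a query):

```sql
-- One pass over the data yields everything the six aggregates need:
--   n = count(d), s = sum(d), s2 = sum(d*d)
--   avg     = s / n
--   var_pop = s2/n - (s/n)^2      (and var_samp, stddev follow similarly)
SELECT count(d)                                 AS n,
       sum(d)                                   AS s,
       sum(d) / count(d)                        AS avg,
       sum(d*d)/count(d) - (sum(d)/count(d))^2  AS var_pop
FROM avg_test;
```

Note the naive s2/n - (s/n)^2 form can lose precision with floating point; with arbitrary-precision numeric accumulation, as here, that is not a concern.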

Thanks,
  -- Hadi

On Mon, Mar 18, 2013 at 5:25 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 I played with sum(numeric) optimization

 Now it is based on the generic numeric_add function - this code is
 relatively old - now we can design aggregates with internal transition
 buffers, and probably we can do this work more effectively.

 I just removed useless palloc/free operations and I got 30% better
 performance! My patch is ugly - because I used the generic add_var
 function. Because sum, avg and almost all aggregate functions are
 limited by the speed of the sum calculation, I think we need new
 numeric routines optimized for calculating sums that use only
 preallocated buffers. The speed of numeric is more important now,
 because there are more and more warehouses where the CPU is the
 bottleneck.

 Regards

 Pavel


 2013/3/18 Hadi Moshayedi h...@moshayedi.net:
  Hi Pavel,
 
  Thanks a lot for your feedback.
 
   I'll work more on this patch this week, and will send a more complete
   patch later this week.
 
  I'll also try to see how much is the speed up of this method for other
  types.
 
  Thanks,
-- Hadi
 
 
  On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule pavel.steh...@gmail.com
 
  wrote:
 
  2013/3/16 Hadi Moshayedi h...@moshayedi.net:
   Revisiting:
   http://www.postgresql.org/message-id/45661be7.4050...@paradise.net.nz
  
    I think the reasons why the numeric average was slow were:
 (1) Using Numeric for count, which is slower than int8 to increment,
 (2) Constructing/deconstructing arrays at each transition step.
  
   This is also discussed at:
  
  
 http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
  
    So, I think we can improve the speed of numeric average by keeping the
    transition state as a struct in the aggregate context, and just passing
    the pointer to that struct from/to the aggregate transition function.
  
   The attached patch uses this method.
  
   I tested it using the data generated using:
    CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
    generate_series(1, 1000) s;
  
    After applying this patch, run time of SELECT avg(d) FROM avg_test;
    improves from 10.701 seconds to 5.204 seconds, which seems to be a
    huge improvement.
  
    I think we may also be able to use a similar method to improve
    performance of some other numeric aggregates (like stddev). But I
    want to know your feedback first.
  
   Is this worth working on?
 
   I checked this patch and it has an interesting speedup - and this
   method should not be limited to the numeric type only
 
  Pavel
 
  
   Thanks,
 -- Hadi
  
  
  
  
 
 



numeric-optimize-v2.patch
Description: Binary data



Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Kevin Grittner
Hadi Moshayedi h...@moshayedi.net wrote:

 I updated the patch by taking ideas from your patch, and unifying
 the transition struct and update function for different
 aggregates. The speed of avg improved even more. It now has 60%
 better performance than the current committed version.

Outstanding!

 I also noticed that this patch makes the matview test fail. It seems
 that it just changes the ordering of rows for queries like
 "SELECT * FROM tv;". Does this seem like a bug in my patch, or
 should we add ORDER BY clauses to this test to make it more
 deterministic?

I added some ORDER BY clauses.  That is probably a good thing
anyway for purposes of code coverage.  Does that fix it for you?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Pavel Stehule
2013/3/19 Kevin Grittner kgri...@ymail.com:
 Hadi Moshayedi h...@moshayedi.net wrote:

 I updated the patch by taking ideas from your patch, and unifying
 the transition struct and update function for different
 aggregates. The speed of avg improved even more. It now has 60%
 better performance than the current committed version.

 Outstanding!

I did some tests à la OLAP queries and I think a ~40% speedup for
queries with AVG is realistic, depending on other conditions.

But there are a lot of situations when data are in shared buffers or
filesystem memory, and then this patch can deliver a significant
speedup - and it could probably be even better with a better algorithm
for summing two numeric numbers in the aggregate.

Regards

Pavel


 I also noticed that this patch makes the matview test fail. It seems
 that it just changes the ordering of rows for queries like
 "SELECT * FROM tv;". Does this seem like a bug in my patch, or
 should we add ORDER BY clauses to this test to make it more
 deterministic?

 I added some ORDER BY clauses.  That is probably a good thing
 anyway for purposes of code coverage.  Does that fix it for you?

 --
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Hadi Moshayedi
I am not sure how this works, but I also changed numeric sum(), and the
views in question had a numeric sum() column. Can that have any impact?

I am going to dig deeper to see why this happens.

On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Kevin Grittner kgri...@ymail.com writes:
  Hadi Moshayedi h...@moshayedi.net wrote:
  I also noticed that this patch makes matview test fail. It seems
  that it just changes the ordering of rows for queries like
  SELECT * FROM tv;. Does this seem like a bug in my patch, or
  should we add ORDER BY clauses to this test to make it more
  deterministic?

  I added some ORDER BY clauses.  That is probably a good thing
  anyway for purposes of code coverage.  Does that fix it for you?

 Uh, what?  Fooling around with the implementation of avg() should surely
 not change any planning decisions.

 regards, tom lane



Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Tom Lane
[ please do not top-reply ]

Hadi Moshayedi h...@moshayedi.net writes:
 On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Uh, what?  Fooling around with the implementation of avg() should surely
 not change any planning decisions.

 I am not sure how this works, but I also changed numeric sum(), and the
 views in question had a numeric sum() column. Can that have any impact?

[ looks at patch... ]  Oh, I see what's affecting the plan: you changed
the aggtranstypes to internal for a bunch of aggregates.  That's not
very good, because right now the planner takes that to mean that the
aggregate could eat a lot of space.  We don't want that to happen for
these aggregates, I think.

regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Tom Lane
I wrote:
 [ looks at patch... ]  Oh, I see what's affecting the plan: you changed
 the aggtranstypes to internal for a bunch of aggregates.  That's not
 very good, because right now the planner takes that to mean that the
 aggregate could eat a lot of space.  We don't want that to happen for
 these aggregates, I think.

After thinking about that for awhile: if we pursue this type of
optimization, what would probably be appropriate is to add an aggregate
property (stored in pg_aggregate) that allows direct specification of
the size that the planner should assume for the aggregate's transition
value.  We were getting away with a hardwired assumption of 8K for
internal because the existing aggregates that used that transtype all
had similar properties, but it was always really a band-aid not a proper
solution.  A per-aggregate override could be useful in other cases too.
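As a concrete illustration of what such a knob could look like (this uses the SSPACE spelling that was eventually committed for 9.4; the aggregate itself is hypothetical):

```sql
-- Hypothetical array-accumulating aggregate whose transition value is
-- usually a few hundred entries; SSPACE tells the planner to assume
-- that size instead of its default small guess.
CREATE AGGREGATE array_accum (anyelement) (
    SFUNC = array_append,
    STYPE = anyarray,
    INITCOND = '{}',
    SSPACE = 2048    -- planner assumes ~2 kB per transition value
);
```

With a larger assumed state size, the planner is less likely to optimistically choose hash aggregation for a GROUP BY with many groups.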

This was looking like 9.4 material already, but adding such a property
would definitely put it over the top of what we could think about
squeezing into 9.3, IMO.

regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-18 Thread Pavel Stehule
2013/3/16 Hadi Moshayedi h...@moshayedi.net:
 Revisiting:
 http://www.postgresql.org/message-id/45661be7.4050...@paradise.net.nz

 I think the reasons the numeric average was slow were:
   (1) Using Numeric for count, which is slower than int8 to increment,
   (2) Constructing/deconstructing arrays at each transition step.

 This is also discussed at:
 http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count

 So, I think we can improve the speed of the numeric average by keeping the
 transition state as a struct in the aggregate context, and just passing the
 pointer to that struct from/to the aggregate transition function.

 The attached patch uses this method.

 I tested it using data generated with:
 CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
 generate_series(1, 1000) s;

 After applying this patch, run time of SELECT avg(d) FROM avg_test;
 improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
 improvement.

 I think we may also be able to use a similar method to improve performance
 of some other numeric aggregates (like stddev). But I want to know your
 feedback first.

 Is this worth working on?

I checked this patch and it gives an interesting speedup - and the
benefit of this method should not be limited to the numeric type only.

Pavel


 Thanks,
   -- Hadi








Re: [HACKERS] Improving avg performance for numeric

2013-03-18 Thread Hadi Moshayedi
Hi Pavel,

Thanks a lot for your feedback.

I'll work more on this patch this week, and will send a more complete patch
later this week.

I'll also try to see how much is the speed up of this method for other
types.

Thanks,
  -- Hadi

On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 2013/3/16 Hadi Moshayedi h...@moshayedi.net:
  Revisiting:
  http://www.postgresql.org/message-id/45661be7.4050...@paradise.net.nz
 
  I think the reasons the numeric average was slow were:
(1) Using Numeric for count, which is slower than int8 to increment,
(2) Constructing/deconstructing arrays at each transition step.
 
  This is also discussed at:
 
 http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
 
  So, I think we can improve the speed of the numeric average by keeping the
  transition state as a struct in the aggregate context, and just passing the
  pointer to that struct from/to the aggregate transition function.
 
  The attached patch uses this method.
 
  I tested it using data generated with:
  CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
  generate_series(1, 1000) s;
 
  After applying this patch, run time of SELECT avg(d) FROM avg_test;
  improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
  improvement.
 
  I think we may also be able to use a similar method to improve
 performance
  of some other numeric aggregates (like stddev). But I want to know your
  feedback first.
 
  Is this worth working on?

 I checked this patch and it gives an interesting speedup - and the
 benefit of this method should not be limited to the numeric type only.

 Pavel

 
  Thanks,
-- Hadi
 
 
 
 



Re: [HACKERS] Improving avg performance for numeric

2013-03-18 Thread Pavel Stehule
Hello

I played with optimizing sum(numeric).

It is currently based on the generic numeric_add function - this code is
relatively old. Now that we can design aggregates with internal transition
buffers, we can probably do this work more effectively.

I just removed useless palloc/free operations and got 30% better
performance! My patch is ugly, because I used the generic add_var
function. Since sum, avg, and almost all aggregate functions are limited
by the speed of the sum calculation, I think we need new numeric routines
optimized for computing sums that use only preallocated buffers. The
speed of numeric matters more now, because there are more and more data
warehouses where the CPU is the bottleneck.

Regards

Pavel


2013/3/18 Hadi Moshayedi h...@moshayedi.net:
 Hi Pavel,

 Thanks a lot for your feedback.

 I'll work more on this patch this week, and will send a more complete patch
 later this week.

 I'll also try to see how much is the speed up of this method for other
 types.

 Thanks,
   -- Hadi


 On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 2013/3/16 Hadi Moshayedi h...@moshayedi.net:
  Revisiting:
  http://www.postgresql.org/message-id/45661be7.4050...@paradise.net.nz
 
  I think the reasons the numeric average was slow were:
(1) Using Numeric for count, which is slower than int8 to increment,
(2) Constructing/deconstructing arrays at each transition step.
 
  This is also discussed at:
 
  http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
 
  So, I think we can improve the speed of the numeric average by keeping the
  transition state as a struct in the aggregate context, and just passing the
  pointer to that struct from/to the aggregate transition function.
 
  The attached patch uses this method.
 
  I tested it using data generated with:
  CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
  generate_series(1, 1000) s;
 
  After applying this patch, run time of SELECT avg(d) FROM avg_test;
  improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
  improvement.
 
  I think we may also be able to use a similar method to improve
  performance
  of some other numeric aggregates (like stddev). But I want to know your
  feedback first.
 
  Is this worth working on?

 I checked this patch and it gives an interesting speedup - and the
 benefit of this method should not be limited to the numeric type only.

 Pavel

 
  Thanks,
-- Hadi
 
 
 
 




numeric-sum-optimize.patch
Description: Binary data



Re: [HACKERS] Improving avg performance for numeric

2013-03-15 Thread Pavel Stehule
Hello

2013/3/16 Hadi Moshayedi h...@moshayedi.net:
 Revisiting:
 http://www.postgresql.org/message-id/45661be7.4050...@paradise.net.nz

 I think the reasons the numeric average was slow were:
   (1) Using Numeric for count, which is slower than int8 to increment,
   (2) Constructing/deconstructing arrays at each transition step.

 This is also discussed at:
 http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count

 So, I think we can improve the speed of the numeric average by keeping the
 transition state as a struct in the aggregate context, and just passing the
 pointer to that struct from/to the aggregate transition function.

 The attached patch uses this method.

 I tested it using data generated with:
 CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
 generate_series(1, 1000) s;

 After applying this patch, run time of SELECT avg(d) FROM avg_test;
 improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
 improvement.

 I think we may also be able to use a similar method to improve performance
 of some other numeric aggregates (like stddev). But I want to know your
 feedback first.

 Is this worth working on?

nice

+1

Regards

Pavel



 Thanks,
   -- Hadi





