Re: [SQL] Aggregate query for multiple records
Scott Gerhardt <[EMAIL PROTECTED]> writes:
> Hello, I am new to the list; my apologies if this question is beyond the scope or
> charter of this list.
>
> My question is:
> What is the best method to perform an aggregate query to calculate sum() values
> for each distinct wid as in the example below, but for all wid's (not
> just WHERE wid='01/1-6-1-30w1/0')?
>
> Also, performance-wise, would it be better to build a function for this query?
> The table has 9 million records and these aggregate queries take hours.
The "top n" type query (or in this case "first n" or "last n" but it's the
same thing) is actually very tricky to do in standard SQL. The best solution
seen here for postgres is to use arrays and custom aggregate functions.
The following is based on a previous answer from Tom Lane to a similar
question. (I thought I already posted this for you on pgsql-general but the
list archives are down and you don't seem to have seen it, so I'm resending
it)
It allows you to do the whole query with a single sort for the grouping and
the ordering by date together. You would have to use it with something like:
SELECT wid, sum_first_6(oil)
  FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS sub
 GROUP BY wid;
If you pump up sort_mem enough (you can do it within the session, for the
single connection, using "SET sort_mem"), it should be pretty fast.
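A minimal sketch of that session-local setting, assuming a 7.x server where
sort_mem is measured in kilobytes (the 64000 figure is purely illustrative,
not a recommendation):

  -- Raise per-sort memory for this connection only; other sessions are unaffected.
  SET sort_mem = 64000;   -- roughly 64 MB, illustrative value only

  -- Then run the grouped aggregate, which needs one big sort on (wid, "date").
  SELECT wid, sum_first_6(oil)
    FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS sub
   GROUP BY wid;

  -- RESET sort_mem;      -- optionally return to the server default afterwards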
I think it's the best you're going to get. If you're absolutely sure the data
is physically stored in chronological order -- which I would only feel
comfortable with if you've never done any updates or deletes, only inserts
and perhaps occasional truncates -- then you might be able to get by without
ordering and convince it to do a hash aggregate. That would be the optimal
result, no sorts at all. But it would be hard to make sure it would always
work.
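If you want to check whether the planner actually picks a hash aggregate once
the inner ORDER BY is dropped, a quick look at the plan might be enough (a
sketch only; whether you really get a HashAggregate depends on your version,
statistics and settings):

  -- Without the inner sort the planner is free to hash-aggregate by wid.
  EXPLAIN
  SELECT wid, sum_first_6(oil)
    FROM prd_data
   GROUP BY wid;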
test=> create or replace function first_6_accum (integer[], integer) returns integer[]
language sql immutable as 'select case when array_upper($1,1)>=6 then $1 else $1||$2
end';
CREATE FUNCTION
test=> create function sum_6(integer[]) returns integer immutable language sql as
'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum,
stype=integer[],initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE
test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union
select 3 union select 4 union select 5 union select 6 union select 7 union select 8)
as x order by i desc) as x;
 sum_first_6
-------------
          33
(1 row)
test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union
select 3 union select 4 union select 5 union select 6 union select 7 union select 8)
as x order by i asc) as x;
 sum_first_6
-------------
          21
(1 row)
This can easily be switched around to make it "last_6" and you can write
functions to handle 6 records or 9 records. And all of these could be combined
in a single query, so you only have to do the sort once.
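If you would rather keep the sort ascending and still sum the most recent six
rows, one possible state function is a sliding window (a sketch; the names
last_6_accum and sum_last_6 are just illustrative, and sum_6 is reused from
above):

  -- Keep a sliding window: once the array holds six entries, drop the oldest
  -- ($1[2:6]) and append the newest value.
  create function last_6_accum (integer[], integer) returns integer[]
    language sql immutable as
    'select case when array_upper($1,1) >= 6
                 then $1[2:6] || $2
                 else $1 || $2 end';

  create aggregate sum_last_6 (basetype=integer, sfunc=last_6_accum,
    stype=integer[], initcond='{}', finalfunc=sum_6);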
Unfortunately you cannot make aggregate functions that take multiple
parameters, nor can you pass extra parameters to the state function. So you'll
have to create a separate set of functions for each variant.
Also, you'll have to change it to use reals.
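For reference, a sketch of the real-typed variant (same logic as above, only
the declared types change; the _real suffixes are just illustrative names):

  create or replace function first_6_accum_real (real[], real) returns real[]
    language sql immutable as
    'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';

  create function sum_6_real (real[]) returns real
    language sql immutable as
    'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';

  create aggregate sum_first_6_real (basetype=real, sfunc=first_6_accum_real,
    stype=real[], initcond='{}', finalfunc=sum_6_real);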
--
greg
Re: [SQL] backup of a specific schema
Kenneth Gonsalves wrote:
| On Saturday 28 August 2004 07:46 am, Gaetano Mendola wrote:
|> Kenneth Gonsalves wrote:
|>> On Friday 27 August 2004 01:17 pm, Michalis Kabrianis wrote:
|>>> Kenneth Gonsalves wrote:
|>>>> hi, is there any way to backup/restore a specific schema in a database?
|>>> Hi, isn't this what pg_dump --schema=SCHEMA does?
|>> thanx - was looking in the wrong place in the manual
|> Which one?
| backup/restore

Well, that chapter speaks about pg_dump; didn't you have the curiosity to look
at the complete options for that command? :-)

Regards
Gaetano Mendola
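For anyone following along, a minimal usage sketch of the option mentioned
above (database and schema names are placeholders):

  # Dump only one schema of the database to a plain SQL file ...
  pg_dump --schema=some_schema some_db > some_schema.sql

  # ... and restore it later by feeding that file back to psql.
  psql some_db < some_schema.sql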
Re: [SQL] from PG_DUMP to CVS
At 02:02 AM 28/08/2004, Josh Berkus wrote:
>some-dbname/some-schema/TABLES/sometable.sql
>some-dbname/some-schema/VIEWS/someview.sql
>some-dbname/some-schema/FUNCTIONS/somefunction-param{codes}.sql
>some-dbname/some-schema/TYPES/sometype.sql
>some-dbname/some-schema/OPERATORS/OPsomeoperator.sql
>
>In this last, all dependent objects of, for example, a table (rules, triggers,
>indexes, etc.) would be rolled up into one file. It's this last version
>that I personally favor.
Not sure I like it from the PoV of easily seeing what has changed.
Similarly:
some-dbname/some-schema/TABLES/sometable/create.sql
some-dbname/some-schema/TABLES/sometable/indexes.sql
some-dbname/some-schema/TABLES/sometable/constraints.sql
combines all indexes into one file. It also has the disadvantage of being
impossible to construct from an existing dump file.
I'd like to be able to construct the structure from the information stored
in a dump file, without parsing SQL. OTOH, it might be nice to add some
more information to the dump file.
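As one concrete source of that information, the table of contents of a
custom-format archive can already be listed without parsing any SQL (a
sketch; the database name is a placeholder, and the exact TOC line format
varies by version):

  # Make a custom-format dump, then list its table of contents.
  # Each TOC entry carries the object's type (TABLE, VIEW, FUNCTION, ...),
  # its schema and its name, which is roughly what the directory layout needs.
  pg_dump -Fc some-dbname > some-dbname.dump
  pg_restore -l some-dbname.dump > some-dbname.toc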
Philip Warner
Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 03 5330 3172
http://www.rhyme.com.au
PGP key available upon request, and from pgp.mit.edu:11371
Re: [SQL] Aggregate query for multiple records
Greg Stark <[EMAIL PROTECTED]> writes:
> [ nice example snipped ]
> ... Also, you'll have to change it to use reals.
That part, at least, can be worked around as of 7.4: use polymorphic
functions. You can declare the functions and aggregate as working on
anyelement/anyarray, and then they will automatically work on any
datatype that has a + operator.
regression=# create or replace function first_6_accum (anyarray,anyelement) returns
anyarray
regression-# language sql immutable as 'select case when array_upper($1,1)>=6 then $1
else $1||$2 end';
CREATE FUNCTION
regression=# create function sum_6(anyarray) returns anyelement immutable language sql
as 'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
regression=# create aggregate sum_first_6 (basetype=anyelement, sfunc=first_6_accum,
stype=anyarray,initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE
regression=# select sum_first_6(i) from (select i from (select 1 as i union select 2
union select 3 union select 4 union select 5 union select 6 union select 7 union
select 8) as x order by i desc) as x;
 sum_first_6
-------------
          33
(1 row)
regression=# select sum_first_6(i) from (select i from (select 1.1 as i union select 2
union select 3 union select 4 union select 5 union select 6 union select 7.7 union
select 8) as x order by i desc) as x;
 sum_first_6
-------------
        33.7
(1 row)
regression=#
regards, tom lane
Re: [SQL] Aggregate query for multiple records
[ Tom Lane's polymorphic first_6_accum / sum_first_6 example quoted in full;
  snipped, identical to the previous message ]
An alternate solution I'm thinking of is to add a column to hold a
"total_months" value that could be used to simplify and speed up queries
(i.e. first month of oil production = 1, second = 2, etc.). That way I can
select the first 6 months using "where months_prod <= 6", or any month
interval for that matter.
The following query, suggested by another list member (thanks Josh Berkus),
to populate the "total_months" column sort of works, but it doesn't handle
the year wrapping: it adds 88 when the year wraps (see output below).
UPDATE prd_data_test
   SET months_prod = prd_data_test."date" - prd2."date" + 1
  FROM prd_data_test prd2
 WHERE prd_data_test.wid = prd2.wid
   AND prd2."date" = ( SELECT "date" FROM prd_data_test prd3
                        WHERE prd3.wid = prd2.wid
                        ORDER BY "date" LIMIT 1 );
The results are:
SEM=# select * from prd_data_test order by wid, date limit 20;
  date  | hours |  oil  | gas  | water | pwid |       wid       | year | month_prd | months_prod
--------+-------+-------+------+-------+------+-----------------+------+-----------+-------------
 196505 |   480 | 194.3 | 10.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           1
 196506 |   600 | 279.4 | 13.1 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           2
 196507 |   744 | 288.1 |  4.5 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           3
 196508 |   720 | 234.6 |  9.4 |   2.9 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           4
 196509 |   648 | 208.2 | 12.5 |     6 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           5
 196510 |   744 | 209.8 | 15.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           6
 196511 |   720 | 180.5 | 13.9 |  27.7 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           7
 196512 |   744 | 227.4 | 22.8 |   5.2 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           8
 196601 |   744 | 230.3 | 22.7 |    10 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          97
 196602 |   672 | 173.2 | 16.5 |    17 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          98
 196603 |   744 | 197.2 | 18.7 |   9.2 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          99
 196604 |   720 | 168.1 | 14.1 |     3 |    1 | 01/1-6-1-30w1/0 | 1966 |           |         100
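One way the wrap could be avoided (a sketch only, assuming "date" really is
the integer YYYYMM value shown in the table description below) is to convert
each date to an absolute month count before subtracting:

  -- Months since the well's first production month, counting that month as 1.
  -- "date" / 100 extracts the year, "date" % 100 the month, for a YYYYMM integer.
  UPDATE prd_data_test
     SET months_prod = (prd_data_test."date" / 100) * 12 + prd_data_test."date" % 100
                     - ((prd2."date" / 100) * 12 + prd2."date" % 100) + 1
    FROM prd_data_test prd2
   WHERE prd_data_test.wid = prd2.wid
     AND prd2."date" = ( SELECT min("date") FROM prd_data_test prd3
                          WHERE prd3.wid = prd2.wid );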
Table description:

           Table "prd_data"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 date   | integer               |
 hours  | real                  |
 oil    | real                  |
 gas    | real                  |
 water  | real                  |
 pwid   | integer               |
 wid    | character varying(20) |
 year   | smallint              |
Indexes: wid_index6
Actual table (prd_data), 9 million records:

  date  | hours |  oil  | gas  | water | pwid |       wid       | year
--------+-------+-------+------+-------+------+-----------------+------
 196507 |   360 | 159.4 | 11.3 |  40.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965
 196509 |   360 | 171.1 | 11.4 |  50.4 |  413 | 01/1-1-1-31w1/0 | 1965
 196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
 196512 |   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196511 |   720 |   184 | 17.6 |  78.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966
 196612 |   744 |    86 | 12.8 |  36.1 |  413 | 01/1-1-1-31w1/0 | 1966
 196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
 196601 |   744 | 191.6 | 22.6 |  50.7 |
Re: [SQL] backup of a specific schema
On Saturday 28 August 2004 06:18 pm, Gaetano Mendola wrote:
> | backup/restore
>
> Well, that chapter speaks about pg_dump; didn't you have the curiosity
> to look at the complete options for that command? :-)

Yes. So I looked in the book from which I was learning Postgres. It gave
about three options and didn't say there were more, so I didn't look further.
I thought maybe it had to be done programmatically, which is why I posted
the query.
--
regards
kg
http://www.onlineindianhotels.net - fastest hotel search website in the world
http://www.ootygolfclub.org
Re: [SQL] Aggregate query for multiple records
Scott,

> Unfortunately, your revised query works like a charm except for the
> fact that prd_data."date" - prd2."date" + 1 gives incorrect values when
> the year wraps; see the output below. I need to conditionally subtract
> 88 from the date or use an incrementing count() function instead of
> date math to get the correct values.

Oh, that column is text, not a serial number. Well, it's your fault for
using weird custom data types; you figure it out.

> I'm also wondering if using the date functions of Postgres would be
> helpful since the date column does have the year (YYYY) and month (MM)
> parts.

Well, you'd want to convert the column to a timestamp, and then you could
compute months. Or you could break it into separate integer "year" and
"month" columns and do the same thing.

--
Josh Berkus
Aglio Database Solutions
San Francisco
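A minimal sketch of the timestamp route Josh describes, assuming "date" holds
the integer YYYYMM value shown earlier in the thread (the literal dates below
are just sample values from Scott's data):

  -- Convert the YYYYMM integer to a real date, then count elapsed months with age().
  -- Applied to the column, the conversion would be to_date("date"::text, 'YYYYMM').
  SELECT date_part('year',  age(to_date('196601', 'YYYYMM'), to_date('196505', 'YYYYMM'))) * 12
       + date_part('month', age(to_date('196601', 'YYYYMM'), to_date('196505', 'YYYYMM'))) + 1
         AS months_prod;
  -- Should yield 9: January 1966 is the ninth production month after May 1965.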
