Re: [SQL] Aggregate query for multiple records
Scott Gerhardt <[EMAIL PROTECTED]> writes:
> Hello, I am new to the list; my apologies if this question is beyond the scope or
> charter of this list.
>
> My question is:
> What is the best method to perform an aggregate query to calculate sum() values
> for each distinct wid as in the example below, but for all wid's (not
> just WHERE wid='01/1-6-1-30w1/0')?
>
> Also, performance-wise, would it be better to build a function for this query?
> The table has 9 million records and these aggregate queries take hours.
The "top n" type query (or in this case "first n" or "last n" but it's the
same thing) is actually very tricky to do in standard SQL. The best solution
seen here for postgres is to use arrays and custom aggregate functions.
The following is based on a previous answer from Tom Lane to a similar
question. (I thought I already posted this for you on pgsql-general but the
list archives are down and you don't seem to have seen it, so I'm resending
it)
It allows you to do the whole query with a single sort for the grouping and
the ordering by date together. You would have to use it with something like:
SELECT wid, sum_first_6(oil)
  FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS sub
 GROUP BY wid;
If you pump up sort_mem enough (you can do it within the session, for the
single connection, using "SET sort_mem"), it should be pretty fast.
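A minimal sketch of that session-local setting, assuming a 7.x server where
sort_mem is measured in kilobytes (the 64000 figure is purely illustrative,
not a recommendation):

  -- Raise per-sort memory for this connection only; other sessions are unaffected.
  SET sort_mem = 64000;   -- roughly 64 MB, illustrative value only

  -- Then run the grouped aggregate, which needs one big sort on (wid, "date").
  SELECT wid, sum_first_6(oil)
    FROM (SELECT wid, oil FROM prd_data ORDER BY wid, "date") AS sub
   GROUP BY wid;

  -- RESET sort_mem;      -- optionally return to the server default afterwards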
I think it's the best you're going to get. If you're absolutely sure the data
is physically stored in chronological order -- which I would only feel
comfortable with if you've never done any updates or deletes, only inserts
and perhaps occasional truncates -- then you might be able to get by without
ordering and convince it to do a hash aggregate. That would be the optimal
result, no sorts at all. But it would be hard to make sure it would always
work.
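If you want to check whether the planner actually picks a hash aggregate once
the inner ORDER BY is dropped, a quick look at the plan might be enough (a
sketch only; whether you really get a HashAggregate depends on your version,
statistics and settings):

  -- Without the inner sort the planner is free to hash-aggregate by wid.
  EXPLAIN
  SELECT wid, sum_first_6(oil)
    FROM prd_data
   GROUP BY wid;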
test=> create or replace function first_6_accum (integer[], integer) returns integer[]
language sql immutable as 'select case when array_upper($1,1)>=6 then $1 else $1||$2
end';
CREATE FUNCTION
test=> create function sum_6(integer[]) returns integer immutable language sql as
'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum,
stype=integer[],initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE
test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union
select 3 union select 4 union select 5 union select 6 union select 7 union select 8)
as x order by i desc) as x;
 sum_first_6
-------------
          33
(1 row)
test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union
select 3 union select 4 union select 5 union select 6 union select 7 union select 8)
as x order by i asc) as x;
 sum_first_6
-------------
          21
(1 row)
This can easily be switched around to make it "last_6" and you can write
functions to handle 6 records or 9 records. And all of these could be combined
in a single query, so you only have to do the sort once.
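If you would rather keep the sort ascending and still sum the most recent six
rows, one possible state function is a sliding window (a sketch; the names
last_6_accum and sum_last_6 are just illustrative, and sum_6 is reused from
above):

  -- Keep a sliding window: once the array holds six entries, drop the oldest
  -- ($1[2:6]) and append the newest value.
  create function last_6_accum (integer[], integer) returns integer[]
    language sql immutable as
    'select case when array_upper($1,1) >= 6
                 then $1[2:6] || $2
                 else $1 || $2 end';

  create aggregate sum_last_6 (basetype=integer, sfunc=last_6_accum,
    stype=integer[], initcond='{}', finalfunc=sum_6);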
Unfortunately you cannot make aggregate functions that take multiple
parameters, nor can you pass extra parameters to the state function. So you'll
have to create a separate set of functions for each variant.
Also, you'll have to change it to use reals.
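For reference, a sketch of the real-typed variant (same logic as above, only
the declared types change; the _real suffixes are just illustrative names):

  create or replace function first_6_accum_real (real[], real) returns real[]
    language sql immutable as
    'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';

  create function sum_6_real (real[]) returns real
    language sql immutable as
    'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';

  create aggregate sum_first_6_real (basetype=real, sfunc=first_6_accum_real,
    stype=real[], initcond='{}', finalfunc=sum_6_real);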
--
greg
Re: [SQL] backup of a specific schema
Kenneth Gonsalves wrote:
| On Saturday 28 August 2004 07:46 am, Gaetano Mendola wrote:
|> Kenneth Gonsalves wrote:
|>> On Friday 27 August 2004 01:17 pm, Michalis Kabrianis wrote:
|>>> Kenneth Gonsalves wrote:
|>>>> hi, is there any way to backup/restore a specific schema in a database?
|>>> Hi, isn't this what pg_dump --schema=SCHEMA does?
|>> thanx - was looking in the wrong place in the manual
|> Which one?
| backup/restore

Well, that chapter speaks about pg_dump; didn't you have the curiosity to look
at the complete options for that command? :-)

Regards
Gaetano Mendola
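For anyone following along, a minimal usage sketch of the option mentioned
above (database and schema names are placeholders):

  # Dump only one schema of the database to a plain SQL file ...
  pg_dump --schema=some_schema some_db > some_schema.sql

  # ... and restore it later by feeding that file back to psql.
  psql some_db < some_schema.sql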
Re: [SQL] from PG_DUMP to CVS
At 02:02 AM 28/08/2004, Josh Berkus wrote:
>some-dbname/some-schema/TABLES/sometable.sql
>some-dbname/some-schema/VIEWS/someview.sql
>some-dbname/some-schema/FUNCTIONS/somefunction-param{codes}.sql
>some-dbname/some-schema/TYPES/sometype.sql
>some-dbname/some-schema/OPERATORS/OPsomeoperator.sql
>
>In this last, all dependent objects of, for example, a table (rules, triggers,
>indexes, etc.) would be rolled up into one file. It's this last version
>that I personally favor.
Not sure I like it from the PoV of easily seeing what has changed.
Similarly:
some-dbname/some-schema/TABLES/sometable/create.sql
some-dbname/some-schema/TABLES/sometable/indexes.sql
some-dbname/some-schema/TABLES/sometable/constraints.sql
combines all indexes into one file. It also has the disadvantage of being
impossible to construct from an existing dump file.
I'd like to be able to construct the structure from the information stored
in a dump file, without parsing SQL. OTOH, it might be nice to add some
more information to the dump file.
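As one concrete source of that information, the table of contents of a
custom-format archive can already be listed without parsing any SQL (a
sketch; the database name is a placeholder, and the exact TOC line format
varies by version):

  # Make a custom-format dump, then list its table of contents.
  # Each TOC entry carries the object's type (TABLE, VIEW, FUNCTION, ...),
  # its schema and its name, which is roughly what the directory layout needs.
  pg_dump -Fc some-dbname > some-dbname.dump
  pg_restore -l some-dbname.dump > some-dbname.toc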
Philip Warner
Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 03 5330 3172
http://www.rhyme.com.au
PGP key available upon request, and from pgp.mit.edu:11371
Re: [SQL] Aggregate query for multiple records
Greg Stark <[EMAIL PROTECTED]> writes:
> [ nice example snipped ]
> ... Also, you'll have to change it to use reals.
That part, at least, can be worked around as of 7.4: use polymorphic
functions. You can declare the functions and aggregate as working on
anyelement/anyarray, and then they will automatically work on any
datatype that has a + operator.
regression=# create or replace function first_6_accum (anyarray,anyelement) returns
anyarray
regression-# language sql immutable as 'select case when array_upper($1,1)>=6 then $1
else $1||$2 end';
CREATE FUNCTION
regression=# create function sum_6(anyarray) returns anyelement immutable language sql
as 'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';
CREATE FUNCTION
regression=# create aggregate sum_first_6 (basetype=anyelement, sfunc=first_6_accum,
stype=anyarray,initcond='{}',finalfunc=sum_6);
CREATE AGGREGATE
regression=# select sum_first_6(i) from (select i from (select 1 as i union select 2
union select 3 union select 4 union select 5 union select 6 union select 7 union
select 8) as x order by i desc) as x;
 sum_first_6
-------------
          33
(1 row)
regression=# select sum_first_6(i) from (select i from (select 1.1 as i union select 2
union select 3 union select 4 union select 5 union select 6 union select 7.7 union
select 8) as x order by i desc) as x;
 sum_first_6
-------------
        33.7
(1 row)
regression=#
regards, tom lane
Re: [SQL] Aggregate query for multiple records
[ Tom Lane's polymorphic first_6_accum / sum_first_6 example quoted in full;
  snipped, identical to the previous message ]
An alternate solution I'm thinking of is to add a column to hold a
"total_months" value that could be used to simplify and speed up queries
(i.e. first month of oil production = 1, second = 2, etc.). That way I can
select the first 6 months using "where months_prod <= 6", or any month
interval for that matter.
The following query, suggested by another list member (thanks Josh Berkus),
to populate the "total_months" column sort of works, but it doesn't handle
the year wrapping: it adds 88 when the year wraps (see output below).
UPDATE prd_data_test
   SET months_prod = prd_data_test."date" - prd2."date" + 1
  FROM prd_data_test prd2
 WHERE prd_data_test.wid = prd2.wid
   AND prd2."date" = ( SELECT "date" FROM prd_data_test prd3
                        WHERE prd3.wid = prd2.wid
                        ORDER BY "date" LIMIT 1 );
The results are:
SEM=# select * from prd_data_test order by wid, date limit 20;
  date  | hours |  oil  | gas  | water | pwid |       wid       | year | month_prd | months_prod
--------+-------+-------+------+-------+------+-----------------+------+-----------+-------------
 196505 |   480 | 194.3 | 10.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           1
 196506 |   600 | 279.4 | 13.1 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           2
 196507 |   744 | 288.1 |  4.5 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           3
 196508 |   720 | 234.6 |  9.4 |   2.9 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           4
 196509 |   648 | 208.2 | 12.5 |     6 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           5
 196510 |   744 | 209.8 | 15.3 |     0 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           6
 196511 |   720 | 180.5 | 13.9 |  27.7 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           7
 196512 |   744 | 227.4 | 22.8 |   5.2 |    1 | 01/1-6-1-30w1/0 | 1965 |           |           8
 196601 |   744 | 230.3 | 22.7 |    10 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          97
 196602 |   672 | 173.2 | 16.5 |    17 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          98
 196603 |   744 | 197.2 | 18.7 |   9.2 |    1 | 01/1-6-1-30w1/0 | 1966 |           |          99
 196604 |   720 | 168.1 | 14.1 |     3 |    1 | 01/1-6-1-30w1/0 | 1966 |           |         100
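One way the wrap could be avoided (a sketch only, assuming "date" really is
the integer YYYYMM value shown in the table description below) is to convert
each date to an absolute month count before subtracting:

  -- Months since the well's first production month, counting that month as 1.
  -- "date" / 100 extracts the year, "date" % 100 the month, for a YYYYMM integer.
  UPDATE prd_data_test
     SET months_prod = (prd_data_test."date" / 100) * 12 + prd_data_test."date" % 100
                     - ((prd2."date" / 100) * 12 + prd2."date" % 100) + 1
    FROM prd_data_test prd2
   WHERE prd_data_test.wid = prd2.wid
     AND prd2."date" = ( SELECT min("date") FROM prd_data_test prd3
                          WHERE prd3.wid = prd2.wid );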
Table description:

           Table "prd_data"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 date   | integer               |
 hours  | real                  |
 oil    | real                  |
 gas    | real                  |
 water  | real                  |
 pwid   | integer               |
 wid    | character varying(20) |
 year   | smallint              |
Indexes: wid_index6
Actual table (prd_data), 9 million records:

  date  | hours |  oil  | gas  | water | pwid |       wid       | year
--------+-------+-------+------+-------+------+-----------------+------
 196507 |   360 | 159.4 | 11.3 |  40.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965
 196509 |   360 | 171.1 | 11.4 |  50.4 |  413 | 01/1-1-1-31w1/0 | 1965
 196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
 196512 |   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196511 |   720 |   184 | 17.6 |  78.9 |  413 | 01/1-1-1-31w1/0 | 1965
 196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966
 196612 |   744 |    86 | 12.8 |  36.1 |  413 | 01/1-1-1-31w1/0 | 1966
 196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
 196601 |   744 | 191.6 | 22.6 |  50.7 |
Re: [SQL] backup of a specific schema
On Saturday 28 August 2004 06:18 pm, Gaetano Mendola wrote:
> | backup/restore
>
> Well, that chapter speaks about pg_dump; didn't you have the curiosity
> to look at the complete options for that command? :-)

Yes. So I looked in the book from which I was learning Postgres. It gave
about three options and didn't say there were more, so I didn't look further.
I thought maybe it had to be done programmatically, which is why I posted
the query.
--
regards
kg
http://www.onlineindianhotels.net - fastest hotel search website in the world
http://www.ootygolfclub.org
Re: [SQL] Aggregate query for multiple records
Scott,

> Unfortunately, your revised query works like a charm except for the
> fact that prd_data."date" - prd2."date" + 1 gives incorrect values when
> the year wraps; see the output below. I need to conditionally subtract
> 88 from the date or use an incrementing count() function instead of
> date math to get the correct values.

Oh, that column is text, not a serial number. Well, it's your fault for
using weird custom data types; you figure it out.

> I'm also wondering if using the date functions of Postgres would be
> helpful since the date column does have the year (YYYY) and month (MM)
> parts.

Well, you'd want to convert the column to a timestamp, and then you could
compute months. Or you could break it into separate integer "year" and
"month" columns and do the same thing.

--
Josh Berkus
Aglio Database Solutions
San Francisco
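A minimal sketch of the timestamp route Josh describes, assuming "date" holds
the integer YYYYMM value shown earlier in the thread (the literal dates below
are just sample values from Scott's data):

  -- Convert the YYYYMM integer to a real date, then count elapsed months with age().
  -- Applied to the column, the conversion would be to_date("date"::text, 'YYYYMM').
  SELECT date_part('year',  age(to_date('196601', 'YYYYMM'), to_date('196505', 'YYYYMM'))) * 12
       + date_part('month', age(to_date('196601', 'YYYYMM'), to_date('196505', 'YYYYMM'))) + 1
         AS months_prod;
  -- Should yield 9: January 1966 is the ninth production month after May 1965.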
