Re: [PERFORM] Optimising queries involving unions

2005-05-27 Thread Marc Mamin
Hi,

I'm using a workaround for this kind of issue:


consider:

select A from
   (select B from T1 where C
    union
    select B from T2 where C
    union
    select B from T3 where C
   ) foo
where D


in your case:

SELECT u.txt
  FROM  (
SELECT l.id, l.txt FROM largetable1 l, smalltable t WHERE t.id = l.id   AND
t.foo = 'bar'
UNION ALL
SELECT l.id, l.txt FROM largetable2 l, smalltable t WHERE t.id = l.id   AND
t.foo = 'bar'
) u
  



and

select A from foo where C and D

(A, B, C, D being whatever you want; C and D may also include GROUP BY,
ORDER BY, ...)

The first version will be handled correctly by the optimiser, whereas with the
second version Postgres will first build the whole UNION and then run the
query on it.




I have large tables with identical structure, one per day.
Instead of defining a view over all the tables,
I'm using functions that distribute my query across all the tables.

The only issue is that I need to define a type that matches the result
structure, and each return type needs its own function.


Example:
(The first parameter is a schema name; the next four correspond to A, B, C,
D.)





-
create type T_i2_vc1 as (int_1 int, int_2 int, vc_1 varchar);

CREATE OR REPLACE FUNCTION
vq_T_i2_vc1(varchar, varchar, varchar, varchar, varchar) RETURNS setof T_i2_vc1
AS $$
DECLARE
    result  T_i2_vc1%rowtype;
    mviews  RECORD;
    sql     varchar;
    counter int;
BEGIN
    counter := 1;

    -- loop on all daily tables
    FOR mviews IN SELECT DISTINCT this_day, plainday FROM daylist
                  ORDER BY plainday DESC LOOP

        IF counter = 1 THEN
            sql := 'SELECT ' || mviews.this_day || ' AS plainday, ' || $2 ||
                   ' FROM ' || $3 || '_' || mviews.plainday ||
                   ' WHERE ' || $4;
        ELSE
            sql := sql || ' UNION ALL SELECT ' || mviews.this_day ||
                   ' AS plainday, ' || $2 ||
                   ' FROM ' || $3 || '_' || mviews.plainday ||
                   ' WHERE ' || $4;
        END IF;

        counter := counter + 1;
    END LOOP;

    sql := 'SELECT ' || $1 || ' FROM (' || sql || ') foo ' || $5;

    FOR result IN EXECUTE sql LOOP
        RETURN NEXT result;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
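A call might look like this (the values are purely illustrative; note that
although the text above says the first parameter is a schema name, the
function body uses $1 as the outer select list, so this sketch follows the
body: $1 = A, $2 = B, $3 = table-name prefix, $4 = C, $5 = D):

```sql
-- Hypothetical invocation of the dispatch function above,
-- assuming daily tables named mytable_<plainday>.
SELECT * FROM vq_T_i2_vc1(
    'plainday, int_1, vc_1',   -- A: outer select list (must match T_i2_vc1)
    'int_1, vc_1',             -- B: columns taken from each daily table
    'mytable',                 -- table-name prefix
    'int_1 > 0',               -- C: per-table WHERE clause
    'ORDER BY plainday DESC'   -- D: applied to the combined result
);
```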



Note: in your case the function should have a further parameter to join
largetable(n) to smalltable in the subqueries.

HTH,

Marc





 I've got a query that I think the query optimiser should be able
 to work its magic on, but it doesn't!  I've had a look around and
 asked on the IRC channel, and found that the current code doesn't
 attempt to optimise what I'm asking it to do at the moment.
 Here's a bad example:
 
   SELECT u.txt
   FROM smalltable t, (
 SELECT id, txt FROM largetable1
 UNION ALL
 SELECT id, txt FROM largetable2) u
   WHERE t.id = u.id
 AND t.foo = 'bar';
 
 I was hoping that smalltable would get moved up into the union,
 but it doesn't at the moment and the database does a LOT of extra
 work.  In this case, I can manually do quite a couple of transforms
 to move things around and it does the right thing:
 
   SELECT txt
   FROM (
 SELECT l.id as lid, r.id as rid, r.foo, l.txt
   FROM largetable1 l, smalltable r
 UNION ALL
 SELECT l.id as lid, r.id as rid, r.foo, l.txt
   FROM largetable2 l, smalltable r) u
   WHERE foo = 'bar'
 AND lid = rid;
 
 The optimiser is intelligent enough to move the where clauses up
 into the union and end up with a reasonably optimal query.
 Unfortunately, in real life, the query is much larger and reorganising
 everything manually isn't really feasible!


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] slow queries, possibly disk io

2005-05-27 Thread Josh Close
On 5/26/05, Christopher Kings-Lynne [EMAIL PROTECTED] wrote:
  I have some queries that have significantly slowed down in the last
  couple of days. They've gone from 10 seconds to over 2 minutes.
 
  The cpu has never gone over 35% in the servers lifetime, but the load
  average is over 8.0 right now. I'm assuming this is probably due to
  disk io.
 
 You sure it's not a severe lack of vacuuming that's the problem?
 

It's vacuumed hourly. If it needs to be more often than that, I could do it, I
guess. But from everything I've been told, hourly should be enough.

-Josh



[PERFORM] slow queries, possibly disk io

2005-05-27 Thread Josh Close
 A few mandatory questions:

 1. Do you vacuum your db on a regular basis? :)

It's vacuumed once every hour. The table sizes and data are constantly changing.


 2. Perhaps statistics for tables in question are out of date, did you
 try alter table set statistics?

No I haven't. What would that do for me?
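(For context on what that would do, a sketch with hypothetical table and
column names: raising a column's statistics target makes the next ANALYZE
sample more values for that column, which can give the planner better row
estimates on skewed data.)

```sql
-- Hypothetical example: raise the sampling target for a skewed column,
-- then refresh the statistics so the planner sees the new histogram.
ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 100;
ANALYZE mytable;
```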


 3. explain analyze of the slow query?

Here is the function that is run:

CREATE OR REPLACE FUNCTION adaption.funmsgspermin()
  RETURNS int4 AS
'
DECLARE
this_rServerIds  RECORD;
this_sQuery  TEXT;
this_iMsgsPerMin INT;
this_rNumSentRECORD;

BEGIN
this_iMsgsPerMin := 0;
FOR this_rServerIds IN
SELECT iId
FROM adaption.tblServers
LOOP
this_sQuery := \'
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
WHERE tStamp > now() - interval \'\'5 mins\'\';
\';
FOR this_rNumSent IN EXECUTE this_sQuery LOOP
this_iMsgsPerMin := this_iMsgsPerMin + this_rNumSent.iNumSent;
END LOOP;
END LOOP;

this_iMsgsPerMin := this_iMsgsPerMin / 5;

RETURN this_iMsgsPerMin;
END;
'
LANGUAGE 'plpgsql' VOLATILE;

Here is the explain analyze of one loops of the sum:

Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
time=14649.602..14649.604 rows=1 loops=1)
  ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
Filter: (tstamp > (now() - '00:05:00'::interval))
Total runtime: 14649.709 ms


 4. if you for some reason cannot give explain analyze, please try to
 describe the type of query (what kind of join(s)) and amount of data
 found in the tables.

 2 minutes from 10 seconds is a huge leap, and it may mean that
 PostgreSQL for some reason is not planning as well as it could.
 Throwing more RAM at the problem can help, but it would be better
 to hint the planner to do the right thing.  It may be a good time to
 play with planner variables. :)

Is there any documentation on planner vars? And how would I throw more
ram at it? It has 2 gigs right now. How do I know if postgres is using
that?

-Josh



[PERFORM] postgresql-8.0.1 performance tuning

2005-05-27 Thread Martin Fandel




Hi @ all,

I'm trying to tune my postgresql db, but I don't know if the values are
set right.

I use the following environment for the postgres-db:

# Hardware 
cpu: 2x P4 3Ghz 
ram: 1024MB DDR 266Mhz

partitions:
/dev/sda3 23G 9,6G 13G 44% /
/dev/sda1 11G 156M 9,9G 2% /var
/dev/sdb1 69G 13G 57G 19% /var/lib/pgsql

/dev/sda is in raid 1 (2x 35GB / 1upm / sca)
/dev/sdb is in raid 10 (4x 35GB / 1upm / sca)
# /Hardware 

# Config 
/etc/sysctl.conf:
kernel.shmall = 786432000
kernel.shmmax = 786432000

/etc/fstab:
/dev/sdb1 /var/lib/pgsql reiserfs acl,user_xattr,noatime,data="" 1 2

/var/lib/pgsql/data/postgresql.conf
superuser_reserved_connections = 2
shared_buffers = 3000
work_mem = 131072
maintenance_work_mem = 131072
max_stack_depth = 2048
max_fsm_pages = 2
max_fsm_relations = 1000
max_files_per_process = 1000
vacuum_cost_delay = 10
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
bgwriter_delay = 200
bgwriter_percent = 1
bgwriter_maxpages = 100
fsync = true
wal_sync_method = fsync
wal_buffers = 64
commit_delay = 0
commit_siblings = 5
checkpoint_segments = 256
checkpoint_timeout = 900
checkpoint_warning = 30
effective_cache_size = 1
random_page_cost = 4
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.001
cpu_operator_cost = 0.0025
geqo = true
geqo_threshold = 12
geqo_effort = 5
geqo_pool_size = 0
geqo_generations = 0
geqo_selection_bias = 2.0
deadlock_timeout = 1000
max_locks_per_transaction = 64
# /Config 

# Transactions 
we have about 115-300 transactions/min in about 65 tables.
# /Transactions 

I'm really new at using Postgres, so I need some guidance for setting these
parameters in the postgresql and system config. I can't find standard
calculations for this. :/ The PostgreSQL documentation doesn't help me to
choose the best values.

The database must be highly available. I configured rsync to sync the complete
/var/lib/pgsql directory to my hot standby. On the hot standby I will make the
dumps of the database, to take that load off the master db.

In my tests the synchronisation works fine. I synchronised the whole directory
and restarted the database on the hot standby. While restarting, PostgreSQL
replayed the old (not yet archived) WALs, and the database on my hot standby
was consistent. Is this solution recommended? Or must I use archived WALs with
real filesystem snapshots?

best regards,

Martin Fandel




Re: [PERFORM] slow queries, possibly disk io

2005-05-27 Thread Tom Lane
Josh Close [EMAIL PROTECTED] writes:
 this_sQuery := \'
 SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
 FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
 WHERE tStamp > now() - interval \'\'5 mins\'\';
 \';

 Here is the explain analyze of one loops of the sum:

 Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
 time=14649.602..14649.604 rows=1 loops=1)
   ->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
 width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
 Filter: (tstamp > (now() - '00:05:00'::interval))
 Total runtime: 14649.709 ms

I think you really want that seqscan to be an indexscan, instead.
I'm betting this is PG 7.4.something?  If so, probably the only
way to make it happen is to simplify the now() expression to a constant:

SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
\'\\\'\';

because pre-8.0 the planner won't realize that the inequality is
selective enough to favor an indexscan, unless it's comparing to
a simple constant.

(BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
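On 8.0, the same dynamic query can be built inside a dollar-quoted function
body with quote_literal(), avoiding the nested backslash escapes entirely; a
sketch, assuming the surrounding function keeps the structure posted earlier
in this thread:

```sql
-- Inside a dollar-quoted plpgsql body, no \' escaping is needed.
-- quote_literal() safely wraps the precomputed timestamp as a constant,
-- so the planner sees a simple comparison value.
this_sQuery := 'SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
                FROM adaption.tblBatchHistory_' || this_rServerIds.iId || '
                WHERE tStamp > ' ||
               quote_literal( (now() - interval '5 mins')::text );
```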

regards, tom lane



Re: [PERFORM] slow queries, possibly disk io

2005-05-27 Thread Josh Close
 I think you really want that seqscan to be an indexscan, instead.
 I'm betting this is PG 7.4.something?  If so, probably the only
 way to make it happen is to simplify the now() expression to a constant:
 
 SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
 FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
 WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
 \'\\\'\';

The dollar quoting thing would be a lot easier, but I can't get this to work.
I'm using a db manager where I can just use ' instead of \'. How would
it look for that? In other words, it doesn't have the CREATE OR REPLACE
FUNCTION ... AS ' --stuff ' LANGUAGE 'plpgsql'; it just has the
actual function body. Makes things a little easier. I'm getting an error at
or near 5.

 
 because pre-8.0 the planner won't realize that the inequality is
 selective enough to favor an indexscan, unless it's comparing to
 a simple constant.
 
 (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
 
 regards, tom lane
 


-- 
-Josh



Re: [PERFORM] slow queries, possibly disk io

2005-05-27 Thread Josh Close
Doing the query

explain
SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
FROM adaption.tblBatchHistory_1
WHERE tStamp > ( now() - interval '5 mins' )::text

gives me this:

Aggregate  (cost=32138.33..32138.33 rows=1 width=4)
  ->  Seq Scan on tblbatchhistory_1  (cost=0.00..31996.10 rows=56891 width=4)
Filter: ((tstamp)::text > ((now() - '00:05:00'::interval))::text)

Still not an index scan.

On 5/27/05, Tom Lane [EMAIL PROTECTED] wrote:
 Josh Close [EMAIL PROTECTED] writes:
  this_sQuery := \'
  SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
  FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
   WHERE tStamp > now() - interval \'\'5 mins\'\';
  \';
 
  Here is the explain analyze of one loops of the sum:
 
  Aggregate  (cost=31038.04..31038.04 rows=1 width=4) (actual
  time=14649.602..14649.604 rows=1 loops=1)
->  Seq Scan on tblbatchhistory_1  (cost=0.00..30907.03 rows=52401
  width=4) (actual time=6339.223..14648.433 rows=919 loops=1)
  Filter: (tstamp > (now() - '00:05:00'::interval))
  Total runtime: 14649.709 ms
 
 I think you really want that seqscan to be an indexscan, instead.
 I'm betting this is PG 7.4.something?  If so, probably the only
 way to make it happen is to simplify the now() expression to a constant:
 
 SELECT COALESCE( SUM( iNumSent ), 0 ) AS iNumSent
 FROM adaption.tblBatchHistory_\' || this_rServerIds.iId || \'
 WHERE tStamp > \\\'' || (now() - interval \'5 mins\')::text ||
 \'\\\'\';
 
 because pre-8.0 the planner won't realize that the inequality is
 selective enough to favor an indexscan, unless it's comparing to
 a simple constant.
 
 (BTW, 8.0's dollar quoting makes this sort of thing a lot less painful)
 
 regards, tom lane
 


-- 
-Josh



[PERFORM] OID vs overall system performances on high load databases.

2005-05-27 Thread Eric Lauzon
What is the effect of having around 500 inserts/updates/deletes on two to
eight tables in a time frame of 2 minutes, 24h a day, when you have OIDs
enabled, versus the same setup when you don't have OIDs?

That deployment is done on a Postgres installation with 8 to 9 databases, each
having those 2 to 8 high-load tables with OIDs enabled.

Would the OID column slow down table scans when you have over 20 million rows?

And would the cost of maintaining the OID column inside those high-load
tables, when no OID reference is used for data seeking, be costly for Postgres
resources?



Eric Lauzon
[Recherche & Développement]
Above Sécurité / Above Security
Tél  : (450) 430-8166
Fax : (450) 430-1858 



[PERFORM] Redundant indexes?

2005-05-27 Thread Jeffrey Tenny
Would I be correct in assuming that the following two indexes are 
completely redundant except for the fact that one complains about 
uniqueness constraint violations and the other does not?


Or is there a legitimate use for having BOTH indexes?

I'm trying to figure out if it's okay to delete the non-unique index.
(I have a bunch of tables suffering this malady from some problematic 
application code).


Table public.erf
 Column |  Type   | Modifiers
+-+---
 rid| integer | not null
 cid| integer | not null
Indexes: erf_rid_key unique btree (rid),
 erf_rid_idx btree (rid)

Index public.erf_rid_idx
 Column |  Type
+-
 rid| integer
btree, for table public.erf

Index public.erf_rid_key
 Column |  Type
+-
 rid| integer
unique, btree, for table public.erf





Re: [PERFORM] Redundant indexes?

2005-05-27 Thread Tom Lane
Jeffrey Tenny [EMAIL PROTECTED] writes:
 Would I be correct in assuming that the following two indexes are 
 completely redundant except for the fact that one complains about 
 uniqueness constraint violations and the other does not?

Yup ...

regards, tom lane



Re: [PERFORM] OID vs overall system performances on high load databases.

2005-05-27 Thread Andrew McMillan
On Fri, 2005-05-27 at 13:05 -0400, Eric Lauzon wrote:
 What is the effect of having around 500
 inserts/updates/deletes on two to eight tables in a time frame of 2
 minutes, 24h a day, when you have OIDs enabled versus the same setup
 when you don't have OIDs?
 
 That deployment is done on a Postgres installation with 8 to 9
 databases, each having those 2 to 8 high-load tables with OIDs enabled.
 
 Would the OID column slow down table scans when you have over 20
 million rows?
 
 Would the cost of maintaining the OID column inside those high-load
 tables, when no OID reference is used for data seeking, be costly for
 Postgres resources?

The OID column is an extra few bytes on each row.  If you don't have any
use for it (and let's face it: most of us don't), then create your
tables without OID.

The amount of impact that it makes will depend on what the general row
size is.  If they are rows with a couple of integers then the size of an
OID column will be a significant portion of the size of each row, and
removing it will make the physical on-disk data size significantly
smaller.  If the size of the average row is (e.g.) 2k then the OID will
only be a very small fraction of the data, and removing it will only
make a small difference.
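The proportions can be sketched with a quick back-of-envelope calculation,
assuming a roughly 24-byte heap tuple header and a 4-byte OID (both figures
are illustrative assumptions, not exact numbers for any particular PostgreSQL
version):

```python
def oid_overhead_fraction(payload_bytes, header_bytes=24, oid_bytes=4):
    """Rough fraction of an on-disk row taken up by the OID column.

    header_bytes and oid_bytes are illustrative assumptions, not exact
    figures for any particular PostgreSQL version.
    """
    row_with_oid = header_bytes + oid_bytes + payload_bytes
    return oid_bytes / row_with_oid

# A narrow row (two 4-byte integers) vs. a wide ~2 kB row:
narrow = oid_overhead_fraction(8)     # two ints of payload
wide = oid_overhead_fraction(2048)    # ~2 kB of payload
print(f"narrow row: {narrow:.1%}, wide row: {wide:.1%}")
```

Under those assumptions the OID is on the order of 10% of a two-integer row
but a fraction of a percent of a 2 kB row, which matches the point above.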

Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
 ... I want a COLOR T.V. and a VIBRATING BED!!!
-


