Re: [PERFORM] Tuning queries on large database

2004-08-06 Thread Gaetano Mendola
Valerie Schneider DSI/DEV wrote:
> Hi,
> I have a performance problem on a PG database, and I don't
> know how to improve it. I have two questions: one about the storage
> of data, one about tuning queries. If possible!
> My job is to compare Oracle and Postgres. All our operational databases
> have been running under Oracle for about fifteen years. Now I am trying
> to replace Oracle with Postgres.
Show us the EXPLAIN ANALYZE output for your queries.
Regards
Gaetano Mendola



Re: [GENERAL] [PERFORM] Tuning queries on large database

2004-08-05 Thread Mark Kirkwood
I am guessing that Oracle can satisfy Q4 entirely via index access, 
whereas Pg has to visit the table as well.

Having said that, a few partial indexes on data.num_poste (say 10 or so)
may be worth trying out; they won't help the table access, but they could
lower the index cost. If you combine this with loading the data in
num_poste order (or running CLUSTER), you may get closer to Oracle's time
for this query. A rough sketch follows.
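Something like the following could work (index names and ranges are illustrative, untested against this schema):

-- Partial indexes covering slices of num_poste: each one is much smaller
-- than the full pk_data index, so scanning a slice costs less. About ten
-- of these would cover the 1000 values of num_poste.
CREATE INDEX i_data_poste_000 ON data (num_poste)
    WHERE num_poste BETWEEN 0 AND 99;
CREATE INDEX i_data_poste_100 ON data (num_poste)
    WHERE num_poste BETWEEN 100 AND 199;
-- ... eight more of the same shape ...

-- Rewrite the table in pk_data order so rows for one num_poste are
-- physically adjacent (CLUSTER syntax as of PostgreSQL 7.4), then
-- refresh the planner statistics.
CLUSTER pk_data ON data;
ANALYZE data;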

regards
Mark
Valerie Schneider DSI/DEV wrote:
> For my different queries it's better, but still slower than Oracle:
>
>      Oracle   PG yesterday (numeric)   PG today (integer/real)
> Q4   28s      17m20s                   6m47s
>
> Q4 : bench=> explain analyze select 'Q4',count(*) from data
> where num_poste between 600 and 625;
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=14086174.57..14086174.57 rows=1 width=0)
>             (actual time=428235.024..428235.025 rows=1 loops=1)
>    ->  Index Scan using pk_data on data
>          (cost=0.00..14076910.99 rows=3705431 width=0)
>          (actual time=45.283..424634.826 rows=3252938 loops=1)
>          Index Cond: ((num_poste >= 600) AND (num_poste <= 625))
>  Total runtime: 428235.224 ms
> (4 rows)

> Thanks to all, Valerie.


[PERFORM] Tuning queries on large database

2004-08-04 Thread Valerie Schneider DSI/DEV
Hi,

I have a performance problem on a PG database, and I don't
know how to improve it. I have two questions: one about the storage
of data, one about tuning queries. If possible!

My job is to compare Oracle and Postgres. All our operational databases
have been running under Oracle for about fifteen years. Now I am trying
to replace Oracle with Postgres.

I have a test platform under Linux (a Dell server, 4 GB RAM, dual
processor, Red Hat Linux 9 (2.4.20-31.9)) with 2 databases: one with
Oracle (V8i or V9i, it's much the same), one with PG (7.4.2). Both
databases have the same structure and the same content, about 100 GB
each. I developed some benchmarks representative of our use of the
databases. My problem is that I have tables (relations) with more than
100 million rows, where each row has about 160 fields and an average
size of 256 bytes.

For Oracle I have an SGA size of 500 MB.
For PG I have a postgresql.conf as follows:
max_connections = 1500
shared_buffers = 30000
sort_mem = 50000
effective_cache_size = 200000
and default values for the other parameters.

I have a table named data which looks like this :
bench=> \d data
              Table "public.data"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 num_poste  | numeric(9,0)                | not null
 dat        | timestamp without time zone | not null
 datrecu    | timestamp without time zone | not null
 rr1        | numeric(5,1)                |
 qrr1       | numeric(2,0)                |
 ...        | (all remaining fields are numeric)
Indexes:
    "pk_data" primary key, btree (num_poste, dat)
    "i_data_dat" btree (dat)

It contains 1000 different values of num_poste, and for each one
125000 different values of dat (1 row per hour, over 15 years).

I ran a VACUUM ANALYZE on the table.

bench=> select * from tailledb ;
 schema | relfilenode |  table   |   index    |  reltuples  | size (KB)
--------+-------------+----------+------------+-------------+-----------
 public |   125615917 | data     |            | 1.25113e+08 |  72312040
 public |   251139049 | data     | i_data_dat | 1.25113e+08 |   2744400
 public |   250870177 | data     | pk_data    | 1.25113e+08 |   4395480

My first remark is that the table takes a lot of space on disk, about
70 GB, instead of 35 GB with Oracle.
125 000 000 rows x 256 bytes = about 32 GB. That estimate is not far off
for Oracle. What about for PG? How is the data stored?


The different queries of the bench are simple queries (no joins,
sub-queries, ...) and all of them use indexes (I ran EXPLAIN on each one
to be sure); they are sketched just below:
Q1 select_court     : access to about 700 rows    : 1 num_poste and 1 month
   (using PK : num_poste = p1 and dat between p2 and p3)
Q2 select_moy       : access to about 7000 rows   : 10 num_poste and 1 month
   (using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
Q3 select_long      : about 250 000 rows          : 2 num_poste
   (using PK : num_poste in (p1, p1+2))
Q4 select_tres_long : about 3 million rows        : 25 num_poste
   (using PK : num_poste between p1 and p1+25)
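With illustrative parameter values (taking p1 = 600 and a one-month dat range; the count(*) shape matches the Q4 EXPLAIN ANALYZE posted elsewhere in this thread), the four queries would look roughly like:

-- Q1 (select_court): 1 num_poste, 1 month (~700 rows)
SELECT 'Q1', count(*) FROM data
 WHERE num_poste = 600
   AND dat BETWEEN '2000-01-01' AND '2000-01-31';

-- Q2 (select_moy): 10 num_poste, 1 month (~7000 rows)
SELECT 'Q2', count(*) FROM data
 WHERE num_poste BETWEEN 600 AND 610
   AND dat BETWEEN '2000-01-01' AND '2000-01-31';

-- Q3 (select_long): 2 num_poste (~250 000 rows)
SELECT 'Q3', count(*) FROM data
 WHERE num_poste IN (600, 602);

-- Q4 (select_tres_long): 25 num_poste (~3 million rows)
SELECT 'Q4', count(*) FROM data
 WHERE num_poste BETWEEN 600 AND 625;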

The result is that the short queries (Q1 and Q2) run in a few seconds
on both Oracle and PG. The difference becomes important with
Q3 :  8s     with Oracle
     80s     with PG
and worse again with Q4 : 28s    with Oracle
                          17m20s with PG!

Of course, when I run 100 or 1000 parallel queries such as Q3 or Q4,
it becomes a disaster!
I can't understand these results. The way the queries are executed
should be the same, I think. I've read the recommended articles on the
PG site. I tried with a table containing 30 million rows; the results
are similar.

What can I do?

Thanks for your help!


*  The views expressed are strictly personal and do not     *
*  commit METEO-FRANCE in any way.                          *

* Valerie SCHNEIDER              Tel : +33 (0)5 61 07 81 91 *
* METEO-FRANCE / DSI/DEV         Fax : +33 (0)5 61 07 81 09 *
* 42, avenue G. Coriolis         Email : [EMAIL PROTECTED]  *
* 31057 TOULOUSE Cedex - FRANCE  http://www.meteo.fr        *




Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Rod Taylor
On Wed, 2004-08-04 at 08:44, Valerie Schneider DSI/DEV wrote:
> Hi,
>
> I have a performance problem on a PG database, and I don't
> know how to improve it. I have two questions: one about the storage
> of data, one about tuning queries. If possible!
>
> My job is to compare Oracle and Postgres. All our operational databases
> have been running under Oracle for about fifteen years. Now I am trying
> to replace Oracle with Postgres.

You may find that some additional hardware is required -- it can be
purchased out of the Oracle license budget :)

> My first remark is that the table takes a lot of space on disk, about
> 70 GB, instead of 35 GB with Oracle.
> 125 000 000 rows x 256 bytes = about 32 GB. That estimate is not far
> off for Oracle. What about for PG? How is the data stored?

This is due to the datatypes you've selected. PostgreSQL does not convert
NUMERIC into a more appropriate integer format behind the scenes, nor
will it use the faster math routines when the value is an integer. It
currently assumes that if you've asked for numeric rather than integer
or float, you are dealing with either large numbers or high-precision
math.

Changing most of your columns to integer plus a CHECK constraint (where
necessary) will give you a large speed boost and reduce the disk
requirements a little.
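A minimal sketch of that conversion, reusing the column definitions from the \d output earlier in the thread (only the first few of the ~160 columns are shown, and the CHECK bounds are illustrative):

-- New table with native types replacing NUMERIC; CHECK constraints keep
-- the old precision limits where they still matter.
CREATE TABLE data_new (
    num_poste integer NOT NULL,                         -- was numeric(9,0)
    dat       timestamp without time zone NOT NULL,
    datrecu   timestamp without time zone NOT NULL,
    rr1       real,                                     -- was numeric(5,1)
    qrr1      integer CHECK (qrr1 BETWEEN -99 AND 99)   -- was numeric(2,0)
    -- ... the remaining numeric columns converted the same way ...
);

INSERT INTO data_new (num_poste, dat, datrecu, rr1, qrr1)
    SELECT num_poste, dat, datrecu, rr1, qrr1 FROM data;

-- Recreate the primary key on the new table, then swap the tables.
ALTER TABLE data_new
    ADD CONSTRAINT data_new_pkey PRIMARY KEY (num_poste, dat);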

> The different queries of the bench are simple queries (no joins,
> sub-queries, ...) and all of them use indexes (I ran EXPLAIN on each
> one to be sure):

Care to send us the EXPLAIN ANALYZE output for each of the 4 queries
after you've improved the datatype selection?

-- 
Rod Taylor rbt [at] rbt [dot] ca

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Pierre-Frédéric Caillaud

> That estimate is not far off for Oracle. What about for PG? How is the
> data stored?
I agree about the datatype issue: smallint, integer, bigint... and add a
constraint where needed.

Also, the physical order of the records in the table is very important.
As you seem to have a very large, static population in your table, you
should insert it ordered by your favourite selection index (it looks
like that is num_poste).
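For instance, a sketch of such an ordered reload (the staging table name is hypothetical):

-- Rebuild the table with rows physically ordered by the selection index,
-- so that range scans on num_poste touch far fewer pages.
CREATE TABLE data_sorted (LIKE data);
INSERT INTO data_sorted
    SELECT * FROM data ORDER BY num_poste, dat;
-- then recreate the indexes on data_sorted and swap it in for data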

Also, you have a lot of static data that pollutes your table. Why not
create two tables: one for the current year, and one for all the past
years? Use a view to present them merged, as sketched below.
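A sketch of that split (table and view names are hypothetical):

-- Static history in one table, the moving current year in another; a
-- UNION ALL view presents them as a single relation for the queries.
CREATE TABLE data_hist    (LIKE data);   -- past years, loaded once
CREATE TABLE data_current (LIKE data);   -- current year, receives inserts

CREATE VIEW data_all AS
    SELECT * FROM data_hist
    UNION ALL
    SELECT * FROM data_current;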



Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Pierre-Frédéric Caillaud
You often compute sums. Why not use separate tables to cache those sums
by month, by poste, by whatever you need?

A rule on insert into the big table can keep the cache tables up to
date, along these lines:
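A sketch with hypothetical names; it assumes a summary row already exists for each (num_poste, month) pair:

-- Pre-aggregated sums per station and month.
CREATE TABLE monthly_sum (
    num_poste integer NOT NULL,
    month     timestamp without time zone NOT NULL,
    rr1_sum   real NOT NULL DEFAULT 0,
    PRIMARY KEY (num_poste, month)
);

-- Every insert into the big table also bumps the matching cached sum.
CREATE RULE data_sum_upd AS ON INSERT TO data
    DO ALSO UPDATE monthly_sum
       SET rr1_sum = rr1_sum + COALESCE(NEW.rr1, 0)
     WHERE num_poste = NEW.num_poste
       AND month = date_trunc('month', NEW.dat);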


FW: [PERFORM] Tuning queries on large database

2004-08-04 Thread Merlin Moncure
[forwarded to performance] 
> The result is that the short queries (Q1 and Q2) run in a few seconds
> on both Oracle and PG. The difference becomes important with
> Q3 :  8s     with Oracle
>      80s     with PG
> and worse again with Q4 : 28s    with Oracle
>                           17m20s with PG!
>
> Of course, when I run 100 or 1000 parallel queries such as Q3 or Q4,
> it becomes a disaster!
> I can't understand these results. The way the queries are executed
> should be the same, I think. I've read the recommended articles on the
> PG site. I tried with a table containing 30 million rows; the results
> are similar.


I don't trust the Oracle numbers. Let's look at Q4: it returns 3 million
rows. Using your figure of 256 bytes per row, you are asking for a
result set of 3M * 256 bytes = roughly 750 MB! This data has to be
gathered from disk, assembled, and sent over the network.

I don't know Oracle, but it probably has some 'smart' result set that
uses a cursor behind the scenes to do the fetching.

With a 3M-row result set, you need to strongly consider using cursors.
Try experimenting with the same query (Q4) declared as a cursor,
fetching the data in 10k blocks in a loop (FETCH 10000), and watch the
numbers fly.
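A minimal psql sketch of that loop (the cursor name and range values are illustrative):

-- Cursors must live inside a transaction block; each FETCH pulls one
-- chunk instead of materializing all ~3M rows in a single response.
BEGIN;
DECLARE q4_cur CURSOR FOR
    SELECT * FROM data WHERE num_poste BETWEEN 600 AND 625;
FETCH 10000 FROM q4_cur;   -- repeat until it returns no rows
CLOSE q4_cur;
COMMIT;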

Merlin

