Hi everyone,
I'm developping a web decisonnal
application based on
-Red Hat 3 ES
-Postgresql 8.0.1
-Dell poweredge 2850, Ram 2Gb, 2
procs, 3 Ghz, 1Mb cache and 4 disks ext3 10,000 r/mn
I am alone in the box and there is
not any crontab.
I have 2 databases (A and B) with
exactly the same schemas:
-one main table called "aggregate"
having no indexes and supporting only SELECT statements (loaded one
time a month with a new bundle of datas). Row size # 200 bytes (50
columns of type char(x) or integer)
-and several small 'reference'
tables not shown by the following example for clarity reasons.
-Database A : aggregate contains
2,300,000 records ( 500 Mb)
-Database B : aggregate contains
9,000,000 records ( 2 Gb)
There is no index on the aggregate
table since the criterias, their number and their scope are freely
choosen by the customers.
The query :
select sum(ca)
from aggregate
where (issue_date >= '2004-01' and issue_date <=
'2004-02' );
takes 5s on database A ( 5mn30s* the
first time, probably to fill the cache)
and 21mn* on database B (whatever it is the first time or not).
explain shows sequential scan of
course:
Aggregate
(cost=655711.85..655711.85 rows=1 width=4)
-> Seq Scan on "aggregate" (cost=0.00..647411.70 rows=3320060
width=4)
Filter: ((issue_date >= '2004-01'::bpchar) AND (issue_date
<= '2004-02'::bpchar))
*Here is the 'top' display for these
response times:
91 processes: 90 sleeping, 1
running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0,0% 0,0% 0,2% 0,1% 0,0% 48,6% 51,0%
cpu00 0,0% 0,0% 0,0% 0,0% 0,0% 0,0% 100,0%
cpu01 0,0% 0,0% 1,0% 0,0% 0,0% 99,0% 0,0%
cpu02 0,0% 0,0% 0,0% 0,5% 0,0% 0,0% 99,5%
cpu03 0,0% 0,0% 0,0% 0,0% 0,0% 95,5% 4,5%
Mem: 2061424k av, 2043944k used, 17480k free, 0k shrd,
6104k buff
1551692k actv, 172496k in_d, 30452k in_c
Swap: 2096440k av, 0k used, 2096440k free
1792852k cached
PID USER PRI NI SIZE RSS
SHARE STAT %CPU %MEM TIME CPU COMMAND
21983 postgres 20 0 9312 9312 8272 D 0,2 0,4 0:00 1 postmaster
1 root 15 0 488 488 432 S 0,0 0,0 0:06 2 init
2 root RT 0 0 0 0 SW 0,0 0,0 0:00 0
migration/0
For the 5s response time, the 'top'
command shows 0% iowait and 25% cpu.
- I guess this is a cache issue but
how can I manage/control it ?
Is Postgres managing it's own cache or does it use the OS cache ?
- Is using the cache is a good approach?
It does not seem to work for large databases : I tryed several
different values for postgres.conf and /proc/sys/kernel/shmmax without
detecting any response time enhancement (For example : shared_buffers =
190000 , sort_mem = 4096 , effective_cache_size = 37000
and kernel/shmmax=1200000000 )
Do I have to upgrade the RAM to 6Gb or/and buy faster HD (of
what type?) ?
Moreover, a query on database B will destroy the cache
previously build for database A, increasing the response time for the
next query on database A. And I have in fact 15 databases !
- In my case, what should be the best parameters combination
between postgres.conf and /proc/sys/kernel/shmmax ?
- is there a way to reduce the size of the "aggregate" table
files (1Gb + 1Gb + 1 Gb + 0.8Gb = 3.8Gb for the "aggregate" table
instead of 2Gb = 200 * 9,000,000 records) by playing with the data
types or others parameters (fillfactor ?).
Vacuum (even full) seems to be useless since the aggregate table
supports only 'copy aggregate from' and 'select'.
- is it possible to define a sort of RAM filesystem (as
it exists in DOS/Windows) which I could create and populate my
databases into ? ...since the databases does not support updates for
this application.
Sorry for my naive questions and my poor english but any help or
advise will be greatly appreciated !
Patrick Vedrines
PS (maybe of interest for some users like me) :
I created a partition on a new similar disk but on the last
cylinders (near the periphery) and copied the database B into it: the
response time is 25% faster (i.e. 15mn instead of 21mn). But 15 mn is
still too long for my customers (5 mn would be nice).