You could try writing a plpgsql function to generate the data set.
Or you could use your existing data set.

Creating 3.5 million rows with a stored procedure is probably slow.
It would likely be better and faster to use a random() plus
generate_series() trick. That way others can reproduce the test, and
generating the data set is faster.
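For example, a minimal sketch of that trick (table and column values
here are made up for illustration, not your real schema):

    create table test_dok as
        select i as dokumnr,
               date '2008-01-01' + (random() * 365)::int as kuupaev
          from generate_series(1, 3500000) as s(i);

    create table test_rid as
        select i as id,
               (random() * 3500000)::int + 1 as dokumnr,
               'P' || ((random() * 10000)::int)::text as toode
          from generate_series(1, 10000000) as s(i);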
By the way, a simple way to de-bloat your big table without blocking
would be this (sketched in SQL after the list):
- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by perhaps)
- create indexes
- rename new into old
- commit
If this is just a reporting database where you insert a batch of new data
every day, for instance, that's very easy to do. If it's OLTP, then, no.
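In SQL, that copy-and-swap could look roughly like this (index names are
illustrative; every index and constraint of the old table has to be
recreated on the new one):

    begin;
    -- assumes all inserts and updates are stopped, as noted above
    create table dok_new (like dok including defaults);
    insert into dok_new select * from dok order by kuupaev;
    create index dok_new_kuupaev_idx on dok_new (kuupaev);
    -- ... recreate the remaining indexes and constraints ...
    alter table dok rename to dok_old;
    alter table dok_new rename to dok;
    commit;
    -- drop table dok_old;  -- once everything checks out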
Those are the orders (dok) and order_products (rid) tables.
I ran VACUUM FULL ANALYZE VERBOSE last night.
The database is now 4832 MB, including 1 GB of bloated pg_shdepend
indexes.
I set max_fsm_pages = 150000 and restarted the server.
The query below and other queries are still too slow:
set search_path to firma2,public;
explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
"Aggregate (cost=181795.13..181795.14 rows=1 width=0) (actual
time=23678.265..23678.268 rows=1 loops=1)"
" -> Nested Loop (cost=73999.44..181733.74 rows=24555 width=0) (actual
time=18459.230..23598.956 rows=21476 loops=1)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1
width=24) (actual time=0.134..0.145 rows=1 loops=1)"
" Index Cond: ('X05'::bpchar = toode)"
" -> Hash Join (cost=73999.44..181482.18 rows=24555 width=24)
(actual time=18459.076..23441.098 rows=21476 loops=1)"
" Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
" -> Bitmap Heap Scan on rid (cost=4082.88..101779.03
rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)"
" Recheck Cond: (toode = 'X05'::bpchar)"
" -> Bitmap Index Scan on rid_toode_idx
(cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634
rows=278183 loops=1)"
" Index Cond: (toode = 'X05'::bpchar)"
" -> Hash (cost=69195.13..69195.13 rows=112573 width=4)
(actual time=8894.465..8894.465 rows=109890 loops=1)"
" -> Bitmap Heap Scan on dok (cost=1492.00..69195.13
rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177
rows=110484 loops=1)"
" Index Cond: (kuupaev >=
'2008-09-01'::date)"
"Total runtime: 23678.790 ms"
Here is a list of untried recommendations from this thread:
1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx
- In 8.1.4 this form of CLUSTER causes a syntax error; no idea what
syntax to use.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
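If I remember the pre-8.3 grammar correctly, the argument order is
reversed there (8.3 changed it to CLUSTER table USING index), so on
8.1.4 it should be something like the following - but please verify
against the CLUSTER page of the 8.1 docs, and note that CLUSTER takes
an exclusive lock while it rewrites the table:

    cluster rid_toode_pkey on rid;
    cluster dok_kuupaev_idx on dok;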
2. Change the CHAR(20) product index to an int index by adding an update
trigger.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
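A rough, untested sketch of that idea (the toode_id column, the
sequence, and the trigger names are all made up for illustration;
assumes plpgsql is installed):

    alter table toode add column toode_id integer;
    create sequence toode_id_seq;
    update toode set toode_id = nextval('toode_id_seq');
    alter table toode alter column toode_id
        set default nextval('toode_id_seq');
    create unique index toode_toode_id_uidx on toode (toode_id);

    alter table rid add column toode_id integer;
    update rid set toode_id = t.toode_id
      from toode t where rid.toode = t.toode;
    create index rid_toode_id_idx on rid (toode_id);

    -- keep rid.toode_id in sync on insert/update
    create or replace function rid_fill_toode_id() returns trigger as $$
    begin
        select t.toode_id into new.toode_id
          from toode t where t.toode = new.toode;
        return new;
    end;
    $$ language plpgsql;

    create trigger rid_toode_id_trg
        before insert or update on rid
        for each row execute procedure rid_fill_toode_id();

Queries would then join on the 4-byte toode_id instead of the CHAR(20)
code.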
3. Denormalize the sale date into the order_products table by adding an
update trigger.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
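An untested sketch of that denormalization (trigger and index names are
illustrative; a companion trigger on dok would also be needed if
kuupaev can change after insert):

    alter table rid add column kuupaev date;
    update rid set kuupaev = d.kuupaev
      from dok d where rid.dokumnr = d.dokumnr;
    create index rid_toode_kuupaev_idx on rid (toode, kuupaev);

    create or replace function rid_fill_kuupaev() returns trigger as $$
    begin
        select d.kuupaev into new.kuupaev
          from dok d where d.dokumnr = new.dokumnr;
        return new;
    end;
    $$ language plpgsql;

    create trigger rid_kuupaev_trg
        before insert or update on rid
        for each row execute procedure rid_fill_kuupaev();

The slow query could then skip the join entirely:

    select count(*) from rid
     where toode = 'X05' and kuupaev >= '2008-09-01';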
4. Check the performance of the RAID: does it leverage NCQ appropriately
when running queries in parallel?
No idea how.
5. Materialized views. I need date granularity, so it must be possible
to sum a single day's sales.
http://www.pgcon.org/2008/schedule/events/69.en.html
Seems to require a major application rewrite; no idea how.
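Since 8.1 has no built-in materialized views, the usual substitute is a
summary table rebuilt periodically (e.g. nightly from cron); a minimal
hand-rolled sketch, with all names illustrative:

    create table sales_summary (
        toode   char(20) not null,
        kuupaev date     not null,
        cnt     bigint   not null,
        primary key (toode, kuupaev)
    );

    -- full rebuild; cheap enough if run off-hours
    truncate sales_summary;
    insert into sales_summary (toode, kuupaev, cnt)
        select r.toode, d.kuupaev, count(*)
          from dok d join rid r using (dokumnr)
         group by r.toode, d.kuupaev;

The reporting query then reads only the summary:

    select sum(cnt) from sales_summary
     where toode = 'X05' and kuupaev >= '2008-09-01';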
Approaches which probably will not change performance:
6. Upgrade to 8.4 or to 8.3.5
7. Run the server in standalone mode and reindex the bloated 1 GB
pg_shdepend indexes.
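If I read the 8.1 REINDEX docs correctly, shared catalogs such as
pg_shdepend can only be reindexed from a standalone backend started
with -P (ignore system indexes); roughly like this, with the data
directory and database name as placeholders - verify against the docs
before trying:

    # after stopping the postmaster:
    postgres -D /var/lib/pgsql/data -P yourdb
    backend> REINDEX TABLE pg_shdepend;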
8. Tune some postgresql.conf parameters:
work_mem = 512
I'd consider increasing this value a little - 0.5 MB seems too low to me
(but not necessarily).
effective_cache_size = 70000
Well, your server has 2 GB of RAM, and it's usually recommended to set
this value to about 60-70% of your RAM; the current 70000 pages (roughly
547 MB, about 27%) seems quite low.
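In 8.1, work_mem is in KB and effective_cache_size is in 8 KB pages, so
a starting point might look like this (values are illustrative, not
tuned for this workload):

    work_mem = 8192                 # 8 MB per sort/hash operation
    effective_cache_size = 175000   # ~1.4 GB, in 8 KB pages
    max_fsm_pages = 150000          # must stay above the 89664 required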
Data size is nearly the same as RAM size. It is an unpleasant surprise
that queries take so long.
What should I do next?
Andrus.
rank        oid  relation                                   size
   1      40926  firma2.rid                              1737 MB
   2      40595  firma2.dok                              1632 MB
   3       1214  pg_catalog.pg_shdepend                  1235 MB
   4       1232  pg_catalog.pg_shdepend_depender_index    795 MB
   7       1233  pg_catalog.pg_shdepend_reference_index   439 MB
   8      44299  firma2.rid_toode_idx                     298 MB
   9      44286  firma2.dok_tasudok_idx                   245 MB
  10   19103791  firma2.rid_toode_pattern_idx             202 MB
  11      44283  firma2.dok_klient_idx                    160 MB
  12      44298  firma2.rid_inpdokumnr_idx                148 MB
  13      44297  firma2.rid_dokumnr_idx                   132 MB
  14      43573  firma2.rid_pkey                          130 MB
  17      40556  pg_toast.pg_toast_40552                  112 MB
  18      44288  firma2.dok_tasumata_idx                  103 MB
  19      44289  firma2.dok_tellimus_idx                  101 MB
  20      44284  firma2.dok_krdokumnr_idx                 101 MB
  21      44285  firma2.dok_kuupaev_idx                    94 MB
  22   19076304  firma2.rid_rtellimus_idx                  90 MB
  24      44282  firma2.dok_dokumnr_idx                    74 MB
  25      43479  firma2.dok_pkey                           74 MB
  26   18663923  firma2.dok_yksus_pattern_idx              65 MB
  27   18801591  firma2.dok_sihtyksus_pattern_idx          64 MB
  32   18774881  firma2.dok_doktyyp                        47 MB
Output from VACUUM FULL:

INFO:  free space map contains 14353 pages in 314 relations
DETAIL:  A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE:  number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages"
to a value over 89664.

Query returned successfully with no result in 10513335 ms.