You could try writing a plpgsql function to generate the data set.
Or you could use your existing data set.

Creating 3.5 million rows with a stored procedure is probably slow.
It would likely be better and faster to use a random() plus
generate_series() trick. That way others can reproduce the test, and
generating the data set is faster.
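For example, a minimal sketch of that trick (table and column values
here are made up for illustration, not your real schema):

    create table test_dok as
        select i as dokumnr,
               date '2008-01-01' + (random() * 365)::int as kuupaev
          from generate_series(1, 3500000) as s(i);

    create table test_rid as
        select i as id,
               (random() * 3500000)::int + 1 as dokumnr,
               'P' || ((random() * 10000)::int)::text as toode
          from generate_series(1, 10000000) as s(i);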
By the way, a simple way to de-bloat your big table without blocking
would be this (sketched in SQL after the list):
- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by perhaps)
- create indexes
- rename new into old
- commit
If this is just a reporting database where you insert a batch of new data
every day, for instance, that's very easy to do. If it's OLTP, then, no.
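In SQL, that copy-and-swap could look roughly like this (index names are
illustrative; every index and constraint of the old table has to be
recreated on the new one):

    begin;
    -- assumes all inserts and updates are stopped, as noted above
    create table dok_new (like dok including defaults);
    insert into dok_new select * from dok order by kuupaev;
    create index dok_new_kuupaev_idx on dok_new (kuupaev);
    -- ... recreate the remaining indexes and constraints ...
    alter table dok rename to dok_old;
    alter table dok_new rename to dok;
    commit;
    -- drop table dok_old;  -- once everything checks out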
Those are the orders (dok) and order_products (rid) tables.
I ran VACUUM FULL ANALYZE VERBOSE last night.
The database is now 4832 MB, including 1 GB of bloated pg_shdepend
indexes.
I set max_fsm_pages = 150000 and restarted the server.
The query below and other queries are still too slow:
set search_path to firma2,public;
explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
"Aggregate (cost=181795.13..181795.14 rows=1 width=0) (actual
time=23678.265..23678.268 rows=1 loops=1)"
" -> Nested Loop (cost=73999.44..181733.74 rows=24555 width=0) (actual
time=18459.230..23598.956 rows=21476 loops=1)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1
width=24) (actual time=0.134..0.145 rows=1 loops=1)"
" Index Cond: ('X05'::bpchar = toode)"
" -> Hash Join (cost=73999.44..181482.18 rows=24555 width=24)
(actual time=18459.076..23441.098 rows=21476 loops=1)"
" Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
" -> Bitmap Heap Scan on rid (cost=4082.88..101779.03
rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)"
" Recheck Cond: (toode = 'X05'::bpchar)"
" -> Bitmap Index Scan on rid_toode_idx
(cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634
rows=278183 loops=1)"
" Index Cond: (toode = 'X05'::bpchar)"
" -> Hash (cost=69195.13..69195.13 rows=112573 width=4)
(actual time=8894.465..8894.465 rows=109890 loops=1)"
" -> Bitmap Heap Scan on dok (cost=1492.00..69195.13
rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177
rows=110484 loops=1)"
" Index Cond: (kuupaev >=
'2008-09-01'::date)"
"Total runtime: 23678.790 ms"
Here is a list of untried recommendations from this thread:
1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx
- In 8.1.4 this form of CLUSTER causes a syntax error; no idea what
syntax to use.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
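If I remember the pre-8.3 grammar correctly, the argument order is
reversed there (8.3 changed it to CLUSTER table USING index), so on
8.1.4 it should be something like the following - but please verify
against the CLUSTER page of the 8.1 docs, and note that CLUSTER takes
an exclusive lock while it rewrites the table:

    cluster rid_toode_pkey on rid;
    cluster dok_kuupaev_idx on dok;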
2. Change the CHAR(20) product index to an int index by adding an update
trigger.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
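A rough, untested sketch of that idea (the toode_id column, the
sequence, and the trigger names are all made up for illustration;
assumes plpgsql is installed):

    alter table toode add column toode_id integer;
    create sequence toode_id_seq;
    update toode set toode_id = nextval('toode_id_seq');
    alter table toode alter column toode_id
        set default nextval('toode_id_seq');
    create unique index toode_toode_id_uidx on toode (toode_id);

    alter table rid add column toode_id integer;
    update rid set toode_id = t.toode_id
      from toode t where rid.toode = t.toode;
    create index rid_toode_id_idx on rid (toode_id);

    -- keep rid.toode_id in sync on insert/update
    create or replace function rid_fill_toode_id() returns trigger as $$
    begin
        select t.toode_id into new.toode_id
          from toode t where t.toode = new.toode;
        return new;
    end;
    $$ language plpgsql;

    create trigger rid_toode_id_trg
        before insert or update on rid
        for each row execute procedure rid_fill_toode_id();

Queries would then join on the 4-byte toode_id instead of the CHAR(20)
code.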
3. Denormalize the sale date into the order_products table by adding an
update trigger.
Risky to try on the production server. Requires creating a test case
with randomly distributed product ids to measure the difference.
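An untested sketch of that denormalization (trigger and index names are
illustrative; a companion trigger on dok would also be needed if
kuupaev can change after insert):

    alter table rid add column kuupaev date;
    update rid set kuupaev = d.kuupaev
      from dok d where rid.dokumnr = d.dokumnr;
    create index rid_toode_kuupaev_idx on rid (toode, kuupaev);

    create or replace function rid_fill_kuupaev() returns trigger as $$
    begin
        select d.kuupaev into new.kuupaev
          from dok d where d.dokumnr = new.dokumnr;
        return new;
    end;
    $$ language plpgsql;

    create trigger rid_kuupaev_trg
        before insert or update on rid
        for each row execute procedure rid_fill_kuupaev();

The slow query could then skip the join entirely:

    select count(*) from rid
     where toode = 'X05' and kuupaev >= '2008-09-01';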
4. Check the performance of the RAID: does it leverage NCQ appropriately
when running queries in parallel?
No idea how.
5. Materialized views. I need date granularity, so it must be possible
to sum a single day's sales.
http://www.pgcon.org/2008/schedule/events/69.en.html
Seems to require a major application rewrite; no idea how.
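Since 8.1 has no built-in materialized views, the usual substitute is a
summary table rebuilt periodically (e.g. nightly from cron); a minimal
hand-rolled sketch, with all names illustrative:

    create table sales_summary (
        toode   char(20) not null,
        kuupaev date     not null,
        cnt     bigint   not null,
        primary key (toode, kuupaev)
    );

    -- full rebuild; cheap enough if run off-hours
    truncate sales_summary;
    insert into sales_summary (toode, kuupaev, cnt)
        select r.toode, d.kuupaev, count(*)
          from dok d join rid r using (dokumnr)
         group by r.toode, d.kuupaev;

The reporting query then reads only the summary:

    select sum(cnt) from sales_summary
     where toode = 'X05' and kuupaev >= '2008-09-01';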
Approaches which probably will not change performance:
6. Upgrade to 8.4 or to 8.3.5
7. Run the server in standalone mode and reindex the bloated 1 GB
pg_shdepend indexes.
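If I read the 8.1 REINDEX docs correctly, shared catalogs such as
pg_shdepend can only be reindexed from a standalone backend started
with -P (ignore system indexes); roughly like this, with the data
directory and database name as placeholders - verify against the docs
before trying:

    # after stopping the postmaster:
    postgres -D /var/lib/pgsql/data -P yourdb
    backend> REINDEX TABLE pg_shdepend;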
8. Tune some postgresql.conf parameters:
work_mem = 512
I'd consider increasing this value a little - 0.5 MB seems too low to me
(but not necessarily).
effective_cache_size = 70000
Well, your server has 2 GB of RAM, and it's usually recommended to set
this value to about 60-70% of your RAM; the current 70000 pages (roughly
547 MB, about 27%) seems quite low.
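In 8.1, work_mem is in KB and effective_cache_size is in 8 KB pages, so
a starting point might look like this (values are illustrative, not
tuned for this workload):

    work_mem = 8192                 # 8 MB per sort/hash operation
    effective_cache_size = 175000   # ~1.4 GB, in 8 KB pages
    max_fsm_pages = 150000          # must stay above the 89664 required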
Data size is nearly the same as RAM size. It is an unpleasant surprise
that queries take so long.
What should I do next?
Andrus.
rank        oid  relation                                   size
   1      40926  firma2.rid                              1737 MB
   2      40595  firma2.dok                              1632 MB
   3       1214  pg_catalog.pg_shdepend                  1235 MB
   4       1232  pg_catalog.pg_shdepend_depender_index    795 MB
   7       1233  pg_catalog.pg_shdepend_reference_index   439 MB
   8      44299  firma2.rid_toode_idx                     298 MB
   9      44286  firma2.dok_tasudok_idx                   245 MB
  10   19103791  firma2.rid_toode_pattern_idx             202 MB
  11      44283  firma2.dok_klient_idx                    160 MB
  12      44298  firma2.rid_inpdokumnr_idx                148 MB
  13      44297  firma2.rid_dokumnr_idx                   132 MB
  14      43573  firma2.rid_pkey                          130 MB
  17      40556  pg_toast.pg_toast_40552                  112 MB
  18      44288  firma2.dok_tasumata_idx                  103 MB
  19      44289  firma2.dok_tellimus_idx                  101 MB
  20      44284  firma2.dok_krdokumnr_idx                 101 MB
  21      44285  firma2.dok_kuupaev_idx                    94 MB
  22   19076304  firma2.rid_rtellimus_idx                  90 MB
  24      44282  firma2.dok_dokumnr_idx                    74 MB
  25      43479  firma2.dok_pkey                           74 MB
  26   18663923  firma2.dok_yksus_pattern_idx              65 MB
  27   18801591  firma2.dok_sihtyksus_pattern_idx          64 MB
  32   18774881  firma2.dok_doktyyp                        47 MB
Output from VACUUM FULL:

INFO:  free space map contains 14353 pages in 314 relations
DETAIL:  A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE:  number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages"
to a value over 89664.

Query returned successfully with no result in 10513335 ms.