On Jul 14, 2005, at 5:12 PM, John A Meinel wrote:

Well, postgres is estimating around 500 rows each, is that way off? Try
just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;


Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly, and we
don't need to do anything else.


Ok, sorry I missed these the first time through:

explain analyze select incidentid from k_b where id = 107;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using k_b_idx on k_b  (cost=0.00..1926.03 rows=675 width=14) (actual time=0.042..298.394 rows=2493 loops=1)
   Index Cond: (id = 107)
 Total runtime: 299.103 ms
(3 rows)

select count(*) from k_b;
count
--------
698350

(sorry! I think I said this one only had tens of thousands in it)


explain analyze select incidentid from k_r where id = 94;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using k_r_idx on k_r  (cost=0.00..2137.61 rows=757 width=14) (actual time=0.092..212.187 rows=10893 loops=1)
   Index Cond: (id = 94)
 Total runtime: 216.498 ms
(3 rows)


select count(*) from k_r;
count
--------
671670


That one is quite a bit slower, yet it has the same table structure and the same index as k_b, and it has fewer records.

I did run VACUUM ANALYZE immediately before running these queries. It seems a lot better with join_collapse_limit set.
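
Since the estimates are off by several times here (675 estimated vs. 2493 actual on k_b, 757 vs. 10893 on k_r), one thing worth trying is raising the per-column statistics target and re-analyzing. A sketch, assuming id is the filtered column as in the queries above (the target of 100 is just an example value; the default is much lower):

-- give the planner a bigger sample for the id columns
ALTER TABLE k_b ALTER COLUMN id SET STATISTICS 100;
ALTER TABLE k_r ALTER COLUMN id SET STATISTICS 100;
ANALYZE k_b;
ANALYZE k_r;

A larger sample usually tightens the row estimates for skewed columns like these.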


Well, the planner is powerful enough to flatten nested selects. To make
it less "intelligent" you can do:
SET join_collapse_limit = 1;
(1 is the minimum; it makes the planner join tables in the order your
query writes them.) That should tell postgres not to try and get tricky
with your query. Again, *usually* the planner knows better than you do,
so just do it to see what you get.


Ok, when join_collapse_limit = 1 I get this now:

explain analyze select recordtext from eventactivity
  join ( select incidentid from k_r join k_b using (incidentid)
         where k_r.id = 94 and k_b.id = 107 ) a using (incidentid);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..156509.08 rows=2948 width=35) (actual time=1.555..340.625 rows=24825 loops=1)
   ->  Nested Loop  (cost=0.00..5361.89 rows=6 width=28) (actual time=1.234..142.078 rows=366 loops=1)
         ->  Index Scan using k_b_idx on k_b  (cost=0.00..1943.09 rows=681 width=14) (actual time=0.423..56.974 rows=2521 loops=1)
               Index Cond: (id = 107)
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..5.01 rows=1 width=14) (actual time=0.031..0.031 rows=0 loops=2521)
               Index Cond: ((k_r.id = 94) AND ((k_r.incidentid)::text = ("outer".incidentid)::text))
   ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..25079.55 rows=8932 width=49) (actual time=0.107..0.481 rows=68 loops=366)
         Index Cond: ((eventactivity.incidentid)::text = ("outer".incidentid)::text)
 Total runtime: 347.975 ms
(9 rows)

MUCH better! Maybe you can help me understand what I did, and whether I need to make something permanent to get this behavior from now on?
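
For making it permanent, a sketch of the usual options (the role and database names below are placeholders):

SET join_collapse_limit = 1;                         -- current session only
ALTER USER some_user SET join_collapse_limit = 1;    -- per-role default
ALTER DATABASE some_db SET join_collapse_limit = 1;  -- per-database default

or put join_collapse_limit = 1 in postgresql.conf to apply it to the whole server.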

If you have analyzed recently can you do:
SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';

It is cheaper than "SELECT count(*) FROM eventactivity" and gives an
approximate count of the rows. But if it isn't too expensive, please
also give the value from SELECT count(*) FROM eventactivity.

Again, that helps us know if your tables are up-to-date.


Sure:

select relname, reltuples from pg_class where relname='eventactivity';
    relname    |  reltuples
---------------+-------------
eventactivity | 3.16882e+07

select count(*) from eventactivity;
  count
----------
31871142

I don't know how selective your keys are, but one of these queries
should structure it better for the planner; a lot depends on how
selective your query actually is.

eventactivity currently has around 32 million rows in it. There should
only be maybe 200-300 incidentids at most that will be matched with the
combination of k_b and k_r. That's why I was thinking I could somehow
get a list of just the incidentids that matched id = 94 and id = 107 in
k_b and k_r first. Then I would only need to grab a few hundred out of
32 million rows from eventactivity.
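
A sketch of that filter-first shape, using the ids from this thread (untested; the planner is free to evaluate the subquery's few hundred incidentids first and then probe eventactivity's index on incidentid for just those):

SELECT recordtext
FROM eventactivity
WHERE incidentid IN (
    SELECT incidentid
    FROM k_b JOIN k_r USING (incidentid)
    WHERE k_b.id = 107 AND k_r.id = 94
);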

Well, you can also try:
SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id=?? AND k_r.id=??
;

That will tell you how many rows they have in common.

select count(*) from k_b join k_r using (incidentid) where k_b.id=107 and k_r.id=94;
count
-------
   373

Well, if you look at the latest plans, the estimated cost has gone up
from 44M to 156M. I don't know why it is worse, but it is getting there.

I assume this is because k_r and k_b are growing fairly rapidly right now; the time in between queries included a lot of inserts. I was careful to vacuum analyze before sending statistics, as I did this time. I'm sorry if this has confused the issue.
