On Jul 14, 2005, at 5:12 PM, John A Meinel wrote:

Well, postgres is estimating around 500 rows each, is that way off? Try
just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;


Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly, and we
don't need to do anything else.


Ok, sorry I missed these the first time through:

explain analyze select incidentid from k_b where id = 107;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using k_b_idx on k_b  (cost=0.00..1926.03 rows=675 width=14) (actual time=0.042..298.394 rows=2493 loops=1)
   Index Cond: (id = 107)
 Total runtime: 299.103 ms
(3 rows)

select count(*) from k_b;
count
--------
698350

(sorry! I think I said this one only had tens of thousands in it)


explain analyze select incidentid from k_r where id = 94;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using k_r_idx on k_r  (cost=0.00..2137.61 rows=757 width=14) (actual time=0.092..212.187 rows=10893 loops=1)
   Index Cond: (id = 94)
 Total runtime: 216.498 ms
(3 rows)


select count(*) from k_r;
count
--------
671670


That one is quite a bit slower, yet it has the same table structure and the same index as k_b, and it has fewer records.

I did run VACUUM ANALYZE immediately before running these queries. It seems a lot better with join_collapse_limit set.
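
Since the estimates are off by several times here (675 estimated vs. 2493 actual on k_b, 757 vs. 10893 on k_r), one thing worth trying is raising the per-column statistics target and re-analyzing. A sketch, assuming id is the filtered column as in the queries above (the target of 100 is just an example value; the default is much lower):

-- give the planner a bigger sample for the id columns
ALTER TABLE k_b ALTER COLUMN id SET STATISTICS 100;
ALTER TABLE k_r ALTER COLUMN id SET STATISTICS 100;
ANALYZE k_b;
ANALYZE k_r;

A larger sample usually tightens the row estimates for skewed columns like these.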


Well, the planner is powerful enough to flatten nested selects. To make
it less "intelligent" you can do:
SET join_collapse_limit = 1;
(1 is the minimum; it makes the planner join tables in the order your
query writes them.) That should tell postgres not to try and get tricky
with your query. Again, *usually* the planner knows better than you do,
so just do it to see what you get.


Ok, when join_collapse_limit = 1 I get this now:

explain analyze select recordtext from eventactivity
  join ( select incidentid from k_r join k_b using (incidentid)
         where k_r.id = 94 and k_b.id = 107 ) a using (incidentid);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..156509.08 rows=2948 width=35) (actual time=1.555..340.625 rows=24825 loops=1)
   ->  Nested Loop  (cost=0.00..5361.89 rows=6 width=28) (actual time=1.234..142.078 rows=366 loops=1)
         ->  Index Scan using k_b_idx on k_b  (cost=0.00..1943.09 rows=681 width=14) (actual time=0.423..56.974 rows=2521 loops=1)
               Index Cond: (id = 107)
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..5.01 rows=1 width=14) (actual time=0.031..0.031 rows=0 loops=2521)
               Index Cond: ((k_r.id = 94) AND ((k_r.incidentid)::text = ("outer".incidentid)::text))
   ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..25079.55 rows=8932 width=49) (actual time=0.107..0.481 rows=68 loops=366)
         Index Cond: ((eventactivity.incidentid)::text = ("outer".incidentid)::text)
 Total runtime: 347.975 ms
(9 rows)

MUCH better! Maybe you can help me understand what I did, and whether I need to make something permanent to get this behavior from now on?
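
For making it permanent, a sketch of the usual options (the role and database names below are placeholders):

SET join_collapse_limit = 1;                         -- current session only
ALTER USER some_user SET join_collapse_limit = 1;    -- per-role default
ALTER DATABASE some_db SET join_collapse_limit = 1;  -- per-database default

or put join_collapse_limit = 1 in postgresql.conf to apply it to the whole server.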

If you have analyzed recently can you do:
SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';

It is cheaper than "SELECT count(*) FROM eventactivity" and gives an
approximate count of the rows. But if it isn't too expensive, please
also give the value from SELECT count(*) FROM eventactivity.

Again, that helps us know if your tables are up-to-date.


Sure:

select relname, reltuples from pg_class where relname='eventactivity';
    relname    |  reltuples
---------------+-------------
eventactivity | 3.16882e+07

select count(*) from eventactivity;
  count
----------
31871142

I don't know how selective your keys are, but one of these queries
should structure it better for the planner; a lot depends on how
selective your query actually is.

eventactivity currently has around 32 million rows in it. There should
only be maybe 200-300 incidentids at most that will be matched with the
combination of k_b and k_r. That's why I was thinking I could somehow
get a list of just the incidentids that matched id = 94 and id = 107 in
k_b and k_r first. Then I would only need to grab a few hundred out of
32 million rows from eventactivity.
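
A sketch of that filter-first shape, using the ids from this thread (untested; the planner is free to evaluate the subquery's few hundred incidentids first and then probe eventactivity's index on incidentid for just those):

SELECT recordtext
FROM eventactivity
WHERE incidentid IN (
    SELECT incidentid
    FROM k_b JOIN k_r USING (incidentid)
    WHERE k_b.id = 107 AND k_r.id = 94
);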

Well, you can also try:
SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id=?? AND k_r.id=??
;

That will tell you how many rows they have in common.

select count(*) from k_b join k_r using (incidentid) where k_b.id=107 and k_r.id=94;
count
-------
   373

Well, if you look at the latest plans, the estimated cost has gone up
from 44M to 156M. I don't know why it is worse, but it is getting there.

I assume this is because k_r and k_b are growing fairly rapidly right now; the time in between queries included a lot of inserts. I was careful to vacuum analyze before sending statistics, as I did this time. I'm sorry if this has confused the issue.
