[HACKERS] Cost estimation problem on seq scan in a loop

Jeff Janes Mon, 16 Dec 2013 15:43:59 -0800

While doing ad hoc queries I've seen several different problems that all
seem to be variations on a theme.


The plan comes out looking like this:

 Nested Loop  (cost=826867.95..877038.04 rows=1 width=125)
   Join Filter: (foo.bar = smallish_table.bar)
   ->  Something Complicated  (cost=826867.95..876800.28 rows=1 width=81)
        .....
   ->  Seq Scan on smallish_table  (cost=0.00..142.89 rows=7389 width=44)


The estimate of rows=1 for Something Complicated is wrong and you really
get 1000 or 100,000 rows.  Meaning the seq scan on smallish_table gets
iterated a lot, and the time really adds up.

It would be great if Something Complicated had the correct row estimate,
but since I've seen this situation arise with a lot of different Something
Complicated that don't have much to do with each other (although usually an
antijoin of some kind is involved) , there is little reason to think we can
squash every one of them.

Is there some principled way to go about teaching the planner that hashing
smallish_table on the join filter key is a cheap insurance policy against
underestimating the row count of the outer loop?

Cheers,

Jeff

[HACKERS] Cost estimation problem on seq scan in a loop

Reply via email to