=?UTF-8?Q?David_Kube=C4=8Dka?= <kubecka....@gmail.com> writes: > There is division by loop_count because of predicted effect of caching and > it is exactly this division which makes the run_cost for single index > lookup so low compared with the query version with random_fk_uniq. So the > main problem is that the planner calls cost_index with loop_count equal to > number of rows of inner table, *but it ignores semi-join semantics*, i.e. > it doesn't account for the number of *unique* rows in the inner table which > will actually be the number of loops.
Good point. > Since this is my first time looking into the optimizer (and in fact any > postgres) code I am not yet able to locate the exact place where this > should be repaired, but I hope that in few days I will have a patch :-) Hm, this might not be the best problem to tackle for your first Postgres patch :-(. The information about the estimated number of unique rows isn't readily available at the point where we're creating parameterized paths. When we do compute such an estimate, it's done like this: pathnode->path.rows = estimate_num_groups(root, uniq_exprs, rel->rows); where uniq_exprs is a list of right-hand-side expressions we've determined belong to the semijoin, and rel->rows represents the raw size of the semijoin's RHS relation (which might itself be a join). Neither of those things are available to create_index_path. I chewed on this for awhile and decided that there'd be no real harm in taking identification of the unique expressions out of create_unique_path() and doing it earlier, in initsplan.c; we'd need a couple more fields in SpecialJoinInfo but that doesn't seem like a problem. However, rel->rows is a *big* problem; we simply have not made any join size estimates yet, and can't, because these things are done bottom up. However ... estimate_num_groups's dependency on its rowcount input is not large (it's basically using it as a clamp). So conceivably we could have get_loop_count just multiply together the sizes of the base relations included in the semijoin's RHS to get a preliminary estimate of that number. This would be the right thing anyway for a single relation in the RHS, which is the most common case. It would usually be an overestimate for join RHS, but we could hope that the output of estimate_num_groups wouldn't be affected too badly. regards, tom lane -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers