Etsuro Fujita <etsuro.fuj...@gmail.com> writes: > On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <t...@sss.pgh.pa.us> wrote: >> Short of sending a whole second query to the remote server, it's >> not clear to me how we could get the full table size (or equivalently >> the target query's selectivity for that table). The best we realistically >> can do is to adopt pg_class.reltuples if there's been an ANALYZE of >> the foreign table. That case already works (and this proposal doesn't >> break it). The problem is what to do when pg_class.reltuples is zero >> or otherwise badly out-of-date.
> In estimate_path_cost_size(), if use_remote_estimate is true, we > adjust the rows estimate returned from the remote server, by factoring > in the selectivity of the locally-checked quals. I thought what I > proposed above would be more consistent with that. No, I don't think that would be very helpful. There are really three different numbers of interest here: 1. The actual total rowcount of the remote table. 2. The number of rows returned by the remote query (which is #1 times the selectivity of the shippable quals). 3. The number of rows returned by the foreign scan (which is #2 times the selectivity of the non-shippable quals)). Clearly, rel->rows should be set to #3. However, what we really want for rel->tuples is #1. That's because, to the extent that the planner inspects rel->tuples at all, it's to adjust whole-table stats such as we might have from ANALYZE. What you're suggesting is that we use #2, but I doubt that that's a big improvement. In a decently tuned query it's going to be a lot closer to #3 than to #1. We could perhaps try to make our own estimate of the selectivity of the shippable quals and then back into #1 from the value we got for #2 from the remote server. But that sounds mighty error-prone, so I doubt it'd make for much of an improvement. It also doesn't sound like something I'd want to back-patch. Another point here is that, to the extent we are relying on whole-table stats from the last ANALYZE, pg_class.reltuples is actually the right value to go along with that. We could spend a lot of cycles doing what I just suggested and end up with net-worse estimates. In any case, the proposal I'm making is just to add a sanity-check clamp to prevent the worst effects of not setting rel->tuples sanely. It doesn't foreclose future improvements inside the FDW. regards, tom lane