Hi Antonin,

(2019/02/08 2:04), Antonin Houska wrote:
Etsuro Fujita<fujita.ets...@lab.ntt.co.jp>  wrote:
Here is an updated version of the patch set.  Changes are:

I've spent some time reviewing the patches.

Thanks for the review!

First, I wonder if the information on LIMIT needs to be passed to the
FDW. Cannot the FDW routines use root->tuple_fraction?

I think it's OK for the FDW to use the tuple_fraction, but I'm not sure the tuple_fraction should be enough. For example, I don't think we can re-compute from the tuple_fraction the LIMIT/OFFSET values.

I think (although have
not verified experimentally) that the problem reported in [1] is not limited
to the LIMIT clause. I think the same problem can happen if client uses cursor
and sets cursor_tuple_fraction very low. Since the remote node is not aware of
this setting when running EXPLAIN, the low fraction can also make postgres_fdw
think that retrieval of the whole set is cheaper than it will appear to be
during the actual execution.

I think it would be better to get a fast-start plan on the remote side in such a situation, but I'm not sure we can do that as well with this patch, because in the cursor case, the FDW couldn't know in advance exactly how may rows will be fetched from the remote side using that cursor, so the FDW couldn't push down LIMIT. So I'd like to leave that for another patch.

Some comments on coding:

0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
-----------------------------------------------------------------

* I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that
grouping_planner() does not call create_limit_path() until it's done with
create_ordered_paths(), and create_ordered_paths() is where the FDW is
requested to add its paths to UPPERREL_ORDERED relation.

Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though.

* add_foreign_ordered_paths()

Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the target.

I think create_ordered_paths() should actually set the target so that
postgresGetForeignUpperPaths() can simply find it in
output_rel->reltarget. This is how postgres_fdw already retrieves the target
for grouped paths. Small patch is attached that makes this possible.

Seems like a good idea.  Thanks for the patch!  Will review.

* regression tests: I think test(s) should be added for queries that have
   ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET)
   clause. I haven't noticed such tests.

Will do.

0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch
---------------------------------------------------------------

* add_foreign_final_paths()

Similar problem I reported for add_foreign_ordered_paths() above.

* adjust_limit_rows_costs()

Doesn't this function address more generic problem than the use of
postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the
function is only used in pathnode.c, so it should be declared static.

Actually, this is simple refactoring for create_limit_path() so that that function can be shared with postgres_fdw. See estimate_path_cost_size(). I'll separate that refactoring in the next version of the patch set.

Both patches:
------------

One thing that makes me confused when I try to understand details is that the
new functions add_foreign_ordered_paths() and add_foreign_final_paths() both
pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size():

estimate_path_cost_size(root, input_rel, ...)

whereas the existing add_foreign_grouping_paths() passes "grouped_rel":

estimate_path_cost_size(root, grouped_rel, ...)

Can you please make this consistent? If you do, you'll probably need to
reconsider your changes to estimate_path_cost_size().

This is intended and I think I should add more comments on that. Let me explain. These patches modified estimate_path_cost_size() so that we can estimate the costs of a foreign path for the UPPERREL_ORDERED or UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of implementing the *underlying* relation for the upper relation (ie, scan, join or grouping rel, specified by the input_rel). So those functions call estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, along with PgFdwPathExtraData.

And maybe related problem: why should FDW care about the effect of
apply_scanjoin_target_to_paths() like you claim to do here?

        /*
         * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the
         * final scan/join relation, the costs obtained from the cache
         * wouldn't yet contain the eval cost for the final scan/join target,
         * which would have been updated by apply_scanjoin_target_to_paths();
         * add the eval cost now.
         */
        if (fpextra&&  !IS_UPPER_REL(foreignrel))
        {
                /* The costs should have been obtained from the cache. */
                Assert(fpinfo->rel_startup_cost>= 0&&
                        fpinfo->rel_total_cost>= 0);

                startup_cost += foreignrel->reltarget->cost.startup;
                run_cost += foreignrel->reltarget->cost.per_tuple * rows;
        }

I think it should not, whether "foreignrel" means "input_rel" or "output_rel"
from the perspective of postgresGetForeignUpperPaths().

I added this before costing the sort operation below, because 1) the cost of evaluating the scan/join target might not be zero (consider the case where sort columns are not simple Vars, for example) and 2) the cost of sorting takes into account the underlying path's startup/total costs. Maybe I'm missing something, though.

Thanks again!

Best regards,
Etsuro Fujita


Reply via email to