(2019/02/12 18:03), Antonin Houska wrote:
Etsuro Fujita<fujita.ets...@lab.ntt.co.jp>  wrote:
(2019/02/08 2:04), Antonin Houska wrote:
Some comments on coding:

0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
-----------------------------------------------------------------

* I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that
grouping_planner() does not call create_limit_path() until it's done with
create_ordered_paths(), and create_ordered_paths() is where the FDW is
requested to add its paths to UPPERREL_ORDERED relation.

Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at
all.  I added the parameter limit_tuples to PgFdwPathExtraData so that we can
pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though.

Yes, the limit_tuples field made me confused. But if cost_sort() is the only
reason, why not simply pass -1 in the 0001 patch, and use the limit_tuples
argument only in 0002? I see several other calls of cost_sort() in the core
where -1 is hard-coded.

Yeah, that would make it easy to review the patch; will do.

Both patches:
------------

One thing that makes me confused when I try to understand details is that the
new functions add_foreign_ordered_paths() and add_foreign_final_paths() both
pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size():

estimate_path_cost_size(root, input_rel, ...)

whereas the existing add_foreign_grouping_paths() passes "grouped_rel":

estimate_path_cost_size(root, grouped_rel, ...)

Can you please make this consistent? If you do, you'll probably need to
reconsider your changes to estimate_path_cost_size().

This is intended and I think I should add more comments on that.  Let me
explain.  These patches modified estimate_path_cost_size() so that we can
estimate the costs of a foreign path for the UPPERREL_ORDERED or
UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie,
ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of
implementing the *underlying* relation for the upper relation (ie, scan, join
or grouping rel, specified by the input_rel). So those functions call
estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel,
along with PgFdwPathExtraData.

I think the same already happens for the UPPERREL_GROUP_AGG relation:
estimate_path_cost_size()

        ...
        else if (IS_UPPER_REL(foreignrel))
        {
                ...
                ofpinfo = (PgFdwRelationInfo *) fpinfo->outerrel->fdw_private;

Here "outerrel" actually means input relation from the grouping perspective.
The input relation (i.e. result of scan / join) estimates are retrieved from
"ofpinfo" and the costs of aggregation are added. Why should this be different
for other kinds of upper relations?

IIUC, I think there is an oversight in that case: the cost estimation doesn't account for evaluating the final scan/join target updated by apply_scanjoin_target_to_paths(), but I think that is another issue, so I'll start a new thread.

And maybe related problem: why should FDW care about the effect of
apply_scanjoin_target_to_paths() like you claim to do here?

        /*
         * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the
         * final scan/join relation, the costs obtained from the cache
         * wouldn't yet contain the eval cost for the final scan/join target,
         * which would have been updated by apply_scanjoin_target_to_paths();
         * add the eval cost now.
         */
        if (fpextra&&   !IS_UPPER_REL(foreignrel))
        {
                /* The costs should have been obtained from the cache. */
                Assert(fpinfo->rel_startup_cost>= 0&&
                        fpinfo->rel_total_cost>= 0);

                startup_cost += foreignrel->reltarget->cost.startup;
                run_cost += foreignrel->reltarget->cost.per_tuple * rows;
        }

I think it should not, whether "foreignrel" means "input_rel" or "output_rel"
from the perspective of postgresGetForeignUpperPaths().

I added this before costing the sort operation below, because 1) the cost of
evaluating the scan/join target might not be zero (consider the case where
sort columns are not simple Vars, for example) and 2) the cost of sorting
takes into account the underlying path's startup/total costs.  Maybe I'm
missing something, though.

My understanding is that the "input_rel" argument of
postgresGetForeignUpperPaths() should already have been processed by
postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by
postgresGetForeignUpperPaths() called with different "stage" too). Therefore
it should already have the "fdw_private" field initialized. The estimates in
the corresponding PgFdwRelationInfo should already include the reltarget
costs, as set previously by estimate_path_cost_size().

Right, but what I'm saying here is: the costs of evaluating the final scan/join target updated by apply_scanjoin_target_to_paths() wouldn't be yet included in the costs we calculated using local statistics when called from postgresGetForeignPaths() or postgresGetForeignJoinPaths(). Let me explain using an example:

    SELECT a+b FROM foreign_table LIMIT 10;

For this query, the reltarget for the foreign_table would be {a, b} when called from postgresGetForeignPaths() (see build_base_rel_tlists()). The costs of evaluating simple Vars are zeroes, so we wouldn't charge any costs for tlist evaluation when estimating the costs of a basic foreign table scan in that callback routine, but the final scan target, with which the reltarget will be replaced later by apply_scanjoin_target_to_paths(), would be {a+b}, so we need to adjust the costs of the basic foreign table scan, cached in the PgFdwRelationInfo for the input_rel (ie, the foreign_table), so that eval costs for the replaced tlist are included when called from postgresGetForeignUpperPaths() with the UPPERREL_FINAL stage, as the costs of evaluating the expression 'a+b' wouldn't be zeroes.

Best regards,
Etsuro Fujita


Reply via email to