PFA the patch with all the comments addressed. On Tue, Oct 13, 2015 at 10:07 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Oct 13, 2015 at 3:29 AM, Ashutosh Bapat > <ashutosh.ba...@enterprisedb.com> wrote: > >> - You consider pushing down ORDER BY if any prefix of the query > >> pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me. > >> If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a), > >> ordering the result set by a doesn't help us much. We've talked a few > >> times about an incremental sort capability that would take a stream of > >> tuples already ordered by one or more columns and sort each group by > >> additional columns, but I don't think we have that currently. Without > >> that capability, I don't think there's much benefit in sorting by a > >> prefix of the pathkeys. I suspect that if we can't get them all, it's > >> not worth doing. > > > > I somehow thought, we are using output, which is ordered by prefix of > > pathkeys in Sort nodes. But as you rightly pointed out that's not the > case. > > Only complete pathkeys are useful. > > A truncated list of pathkeys is useful for merge joins, but not for > toplevel ordering. > Ok. Taken care in the attached patch. > > >> - Right now, you have this code below the point where we bail out if > >> use_remote_estimate is not set. If we keep it like that, the comment > >> needs updating. But I suggest that we consider an ordered path even > >> if we are not using remote estimates. Estimate the sorted path to > >> cost somewhat more than the unsorted path, so that we only choose that > >> path if the sort actually benefits us. I don't know exactly how to > >> come up with a principled estimate given that we don't actually know > >> whether the remote side will need an extra sort or not, but maybe a > >> dumb estimate is still better than not trying a sorted path. > > > > I like that idea, although there are two questions > > 1. How can we estimate cost of getting the data sorted? If there is an > > appropriate index on foreign server we can get the data sorted at no > extra > > cost. If there isn't the cost of sorting is proportionate to NlogN where > N > > is the size of data. It seems unreasonable to arrive at the cost of > sorting > > by multiplying with some constant multiplier. Also, the constant > multiplier > > to the NlogN estimate depends heavily upon the properties of foreign > server > > e.g. size of memory available for sorting, disk and CPU speed etc. The > last > > two might have got factored into fdw_tuple_cost and fdw_startup_cost, so > > that's probably taken care of. If the estimate we come up turns out to be > > too pessimistic, we will not get sorted data even if that's the right > thing > > to do. If too optimistic, we will incur heavy cost at the time of > execution. > > Setting the cost estimate to some constant factor of NlogN would be too > > pessimistic if there is an appropriate index on foreign server. Otherway > > round if there isn't an appropriate index on foreign server. > > > > Even if we leave these arguments aside for a while, the question remains > as > > to what should be the constant factor 10% or 20% or 50% or 100% or > something > > else on top of the estimate for simple foreign table scan estimates (or > > NlogN of that)? I am unable to justify any of these factors myself. What > do > > you say? > > I think we want to estimate the cost in such a way that we'll tend to > pick the ordered path if it's useful, but skip it if it's not. So, > say we pick 10%. That's definitely enough that we won't pick a remote > sort when it's useless, but it's small enough that if a remote sort is > useful, we will probably choose to do it. I think that's what we > want. I believe we should err on the side of a small estimate because > it's generally better to do as much work as possible on the remote > side. In some cases the sort may turn out to be free at execution > time because the remote server was going to generate the results in > that order anyway, and it may know that because of its own pathkeys, > and thus be able to skip the explicit ordering step. > The patch uses a factor of 1.1 (10% increase) to multiple the startup and total costs in fpinfo for unsorted data. This change has caused the plans for few queries in the test postgres_fdw to change. There is ORDER BY and LIMIT clause in the queries in postgres_fdw testcase to keep test outputs sane and consistent. These ORDER BY clauses are not being pushed down the foreign servers. I tried using values upto 2 for this but still the foreign paths with pathkeys won over those without pathkeys. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 697de60..25d8650 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -186,23 +186,26 @@ is_foreign_expr(PlannerInfo *root, * Check that the expression consists of nodes that are safe to execute * remotely. */ glob_cxt.root = root; glob_cxt.foreignrel = baserel; loc_cxt.collation = InvalidOid; loc_cxt.state = FDW_COLLATE_NONE; if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt)) return false; - /* Expressions examined here should be boolean, ie noncollatable */ - Assert(loc_cxt.collation == InvalidOid); - Assert(loc_cxt.state == FDW_COLLATE_NONE); + /* + * The collation of the expression should be none or originate from a + * foreign var. + */ + Assert(loc_cxt.state == FDW_COLLATE_NONE || + loc_cxt.state == FDW_COLLATE_SAFE); /* * An expression which includes any mutable functions can't be sent over * because its result is not stable. For example, sending now() remote * side could cause confusion from clock offsets. Future versions might * be able to make this choice with more granularity. (We check this last * because it requires a lot of expensive catalog lookups.) */ if (contain_mutable_functions((Node *) expr)) return false; @@ -1870,10 +1873,50 @@ printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod, */ static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod, deparse_expr_cxt *context) { StringInfo buf = context->buf; char *ptypename = format_type_with_typemod(paramtype, paramtypmod); appendStringInfo(buf, "((SELECT null::%s)::%s)", ptypename, ptypename); } + +void +appendOrderByClause(StringInfo buf, PlannerInfo *root, RelOptInfo *baserel, List *pathkeys) +{ + ListCell *lcell; + deparse_expr_cxt context; + int nestlevel; + char *delim = " "; + + /* Set up context struct for recursion */ + context.root = root; + context.foreignrel = baserel; + context.buf = buf; + context.params_list = NULL; + + /* Make sure any constants in the exprs are printed portably */ + nestlevel = set_transmission_modes(); + + appendStringInfo(buf, " ORDER BY"); + foreach(lcell, pathkeys) + { + PathKey *pathkey = lfirst(lcell); + Expr *em_expr; + + em_expr = find_em_expr_for_rel(pathkey->pk_eclass, baserel); + Assert(em_expr); + appendStringInfo(buf, "%s", delim); + deparseExpr(em_expr, &context); + if (pathkey->pk_strategy == BTLessStrategyNumber) + appendStringInfo(buf, " ASC"); + else + appendStringInfo(buf, " DESC"); + + if (pathkey->pk_nulls_first) + appendStringInfo(buf, " NULLS FIRST"); + + delim = ", "; + } + reset_transmission_modes(nestlevel); +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 65ea6e8..e8d9e21 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -129,84 +129,76 @@ ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1'); -- Now we should be able to run ANALYZE. -- To exercise multiple code paths, we use local stats on ft1 -- and remote-estimate mode on ft2. ANALYZE ft1; ALTER FOREIGN TABLE ft2 OPTIONS (use_remote_estimate 'true'); -- =================================================================== -- simple queries -- =================================================================== -- single table, with/without alias EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10; - QUERY PLAN ---------------------------------- + QUERY PLAN +--------------------------- Limit - -> Sort - Sort Key: c3, c1 - -> Foreign Scan on ft1 -(4 rows) + -> Foreign Scan on ft1 +(2 rows) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 -----+----+-------+------------------------------+--------------------------+----+------------+----- 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo (10 rows) EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10; - QUERY PLAN -------------------------------------------------------------------------------------- + QUERY PLAN +---------------------------------------------------------------------------------------------------------- Limit Output: c1, c2, c3, c4, c5, c6, c7, c8 - -> Sort + -> Foreign Scan on public.ft1 t1 Output: c1, c2, c3, c4, c5, c6, c7, c8 - Sort Key: t1.c3, t1.c1 - -> Foreign Scan on public.ft1 t1 - Output: c1, c2, c3, c4, c5, c6, c7, c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" -(8 rows) + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c3 ASC, "C 1" ASC +(5 rows) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 -----+----+-------+------------------------------+--------------------------+----+------------+----- 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo (10 rows) -- whole-row reference EXPLAIN (VERBOSE, COSTS false) SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10; - QUERY PLAN -------------------------------------------------------------------------------------- + QUERY PLAN +---------------------------------------------------------------------------------------------------------- Limit Output: t1.*, c3, c1 - -> Sort + -> Foreign Scan on public.ft1 t1 Output: t1.*, c3, c1 - Sort Key: t1.c3, t1.c1 - -> Foreign Scan on public.ft1 t1 - Output: t1.*, c3, c1 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" -(8 rows) + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c3 ASC, "C 1" ASC +(5 rows) SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10; t1 -------------------------------------------------------------------------------------------- (101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1 ",foo) (102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2 ",foo) (103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3 ",foo) (104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4 ",foo) (105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5 ",foo) (106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6 ",foo) @@ -643,20 +635,46 @@ SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5)); SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5)); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ----+----+-------+------------------------------+--------------------------+----+------------+----- 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo (4 rows) +-- pushing down pathkeys to the foreign server +EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10; + QUERY PLAN +-------------------------------------------------------------------------------------------------- + Limit + Output: c1, c2, c3, c4, c5, c6, c7, c8 + -> Foreign Scan on public.ft2 t2 + Output: c1, c2, c3, c4, c5, c6, c7, c8 + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY "C 1" ASC +(5 rows) + +SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10; + c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 +-----+----+-------+------------------------------+--------------------------+----+------------+----- + 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo + 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo + 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo + 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo + 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo + 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo + 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo + 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo + 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo + 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo +(10 rows) + -- =================================================================== -- parameterized queries -- =================================================================== -- simple join PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2; EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2); QUERY PLAN -------------------------------------------------------------------- Nested Loop Output: t1.c3, t2.c3 @@ -676,71 +694,62 @@ EXECUTE st1(1, 1); EXECUTE st1(101, 101); c3 | c3 -------+------- 00101 | 00101 (1 row) -- subquery using stable function (can't be sent to remote) PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1; EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20); - QUERY PLAN ----------------------------------------------------------------------------------------------------------- - Sort + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------- + Nested Loop Semi Join Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Sort Key: t1.c1 - -> Nested Loop Semi Join + -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Join Filter: (t1.c3 = t2.c3) - -> Foreign Scan on public.ft1 t1 - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) - -> Materialize - Output: t2.c3 - -> Foreign Scan on public.ft2 t2 - Output: t2.c3 - Filter: (date(t2.c4) = '01-17-1970'::date) - Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10)) -(15 rows) + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC + -> Foreign Scan on public.ft2 t2 + Output: t2.c3 + Filter: (date(t2.c4) = '01-17-1970'::date) + Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND (($1::text = c3)) +(9 rows) EXECUTE st2(10, 20); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ----+----+-------+------------------------------+--------------------------+----+------------+----- 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo (1 row) EXECUTE st2(101, 121); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 -----+----+-------+------------------------------+--------------------------+----+------------+----- 116 | 6 | 00116 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo (1 row) -- subquery using immutable function (can be sent to remote) PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1; EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- - Sort + Nested Loop Semi Join Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Sort Key: t1.c1 - -> Nested Loop Semi Join + Join Filter: (t1.c3 = t2.c3) + -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Join Filter: (t1.c3 = t2.c3) - -> Foreign Scan on public.ft1 t1 - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) - -> Materialize + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC + -> Materialize + Output: t2.c3 + -> Foreign Scan on public.ft2 t2 Output: t2.c3 - -> Foreign Scan on public.ft2 t2 - Output: t2.c3 - Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date)) -(14 rows) + Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date)) +(11 rows) EXECUTE st3(10, 20); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ----+----+-------+------------------------------+--------------------------+----+------------+----- 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo (1 row) EXECUTE st3(20, 30); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ----+----+----+----+----+----+----+---- diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index e4d799c..d2ead22 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -40,20 +40,21 @@ #include "utils/sampling.h" PG_MODULE_MAGIC; /* Default CPU cost to start up a foreign query. */ #define DEFAULT_FDW_STARTUP_COST 100.0 /* Default CPU cost to process 1 row (above and beyond cpu_tuple_cost). */ #define DEFAULT_FDW_TUPLE_COST 0.01 +#define DEFAULT_FDW_SORT_MULTIPLIER 1.1 /* * FDW-specific planner information kept in RelOptInfo.fdw_private for a * foreign table. This information is collected by postgresGetForeignRelSize. */ typedef struct PgFdwRelationInfo { /* baserestrictinfo clauses, broken down into safe and unsafe subsets. */ List *remote_conds; List *local_conds; @@ -289,20 +290,21 @@ static bool postgresAnalyzeForeignTable(Relation relation, BlockNumber *totalpages); static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid); /* * Helper functions */ static void estimate_path_cost_size(PlannerInfo *root, RelOptInfo *baserel, List *join_conds, + List *pathkeys, double *p_rows, int *p_width, Cost *p_startup_cost, Cost *p_total_cost); static void get_remote_estimate(const char *sql, PGconn *conn, double *rows, int *width, Cost *startup_cost, Cost *total_cost); static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel, EquivalenceClass *ec, EquivalenceMember *em, @@ -490,21 +492,21 @@ postgresGetForeignRelSize(PlannerInfo *root, * average row width. Otherwise, estimate using whatever statistics we * have locally, in a way similar to ordinary tables. */ if (fpinfo->use_remote_estimate) { /* * Get cost/size estimates with help of remote server. Save the * values in fpinfo so we don't need to do it again to generate the * basic foreign path. */ - estimate_path_cost_size(root, baserel, NIL, + estimate_path_cost_size(root, baserel, NIL, NIL, &fpinfo->rows, &fpinfo->width, &fpinfo->startup_cost, &fpinfo->total_cost); /* Report estimated baserel size to planner. */ baserel->rows = fpinfo->rows; baserel->width = fpinfo->width; } else { /* @@ -520,57 +522,126 @@ postgresGetForeignRelSize(PlannerInfo *root, { baserel->pages = 10; baserel->tuples = (10 * BLCKSZ) / (baserel->width + MAXALIGN(SizeofHeapTupleHeader)); } /* Estimate baserel size as best we can with local statistics. */ set_baserel_size_estimates(root, baserel); /* Fill in basically-bogus cost estimates for use later. */ - estimate_path_cost_size(root, baserel, NIL, + estimate_path_cost_size(root, baserel, NIL, NIL, &fpinfo->rows, &fpinfo->width, &fpinfo->startup_cost, &fpinfo->total_cost); } } /* * postgresGetForeignPaths * Create possible scan paths for a scan on the foreign table */ static void postgresGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) { PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private; ForeignPath *path; List *ppi_list; ListCell *lc; + List *usable_pathkeys = NIL; /* * Create simplest ForeignScan path node and add it to baserel. This path * corresponds to SeqScan path of regular tables (though depending on what * baserestrict conditions we were able to send to remote, there might * actually be an indexscan happening there). We already did all the work * to estimate cost and size of this path. */ path = create_foreignscan_path(root, baserel, fpinfo->rows, fpinfo->startup_cost, fpinfo->total_cost, NIL, /* no pathkeys */ NULL, /* no outer rel either */ NIL); /* no fdw_private list */ add_path(baserel, (Path *) path); /* + * If root->query_pathkeys belongs entirely to this baserel and + * the expressions involved can be evaluated on the foreign server, getting + * the rows sorted according to those pathkeys might be better than sorting + * the data locally. There can be indexes on the foreign server which can + * be used to sort the data faster at the foreign server. + */ + foreach(lc, root->query_pathkeys) + { + PathKey *pathkey = (PathKey *)lfirst(lc); + EquivalenceClass *pathkey_ec = pathkey->pk_eclass; + Expr *em_expr = find_em_expr_for_rel(pathkey_ec, baserel); + + if (em_expr && is_foreign_expr(root, baserel, em_expr)) + usable_pathkeys = lappend(usable_pathkeys, pathkey); + else + { + /* We found a pathkey which doesn't belong to given relation. */ + list_free(usable_pathkeys); + usable_pathkeys = NIL; + break; + } + } + + /* Create a path with useful pathkeys, if we found one. */ + if (usable_pathkeys) + { + double rows; + int width; + Cost startup_cost; + Cost total_cost; + + /* + * Use estimates from the foreign server if allowed, otherwise, cook + * them locally. + */ + if (fpinfo->use_remote_estimate) + estimate_path_cost_size(root, baserel, NIL, usable_pathkeys, + &rows, &width, &startup_cost, &total_cost); + else + { + /* + * We are not allowed to consult the foreign server for estimating + * cost of sorting. If there are indexes on the columns on which we + * expect the data to be sorted and they cover the columns we fetch, + * we will get the data sorted at no extra cost. But if that's not + * the case, foreign server will incur cost for sorting it. Assume + * it to be 10% higher than the cost for getting the data unsorted. + * That is large enough not to make remote sort attractive when it's + * useless, but it's small enough that to make a remote sort + * attracive when useful. Erring on the smaller estimate side is + * generally better to do as much work as possible on the + * remote side. + */ + fpinfo->startup_cost = fpinfo->startup_cost * DEFAULT_FDW_SORT_MULTIPLIER; + fpinfo->total_cost = fpinfo->total_cost * DEFAULT_FDW_SORT_MULTIPLIER; + } + + path = create_foreignscan_path(root, baserel, + rows, + startup_cost, + total_cost, + usable_pathkeys, + NULL, + NIL); + add_path(baserel, (Path *)path); + } + + /* * If we're not using remote estimates, stop here. We have no way to * estimate whether any join clauses would be worth sending across, so * don't bother building parameterized paths. */ if (!fpinfo->use_remote_estimate) return; /* * Thumb through all join clauses for the rel to identify which outer * relations could supply one or more safe-to-send-to-remote join clauses. @@ -703,21 +774,21 @@ postgresGetForeignPaths(PlannerInfo *root, foreach(lc, ppi_list) { ParamPathInfo *param_info = (ParamPathInfo *) lfirst(lc); double rows; int width; Cost startup_cost; Cost total_cost; /* Get a cost estimate from the remote */ estimate_path_cost_size(root, baserel, - param_info->ppi_clauses, + param_info->ppi_clauses, NIL, &rows, &width, &startup_cost, &total_cost); /* * ppi_rows currently won't get looked at by anything, but still we * may as well ensure that it matches our idea of the rowcount. */ param_info->ppi_rows = rows; /* Make the path */ @@ -797,20 +868,24 @@ postgresGetForeignPlan(PlannerInfo *root, * Build the query string to be sent for execution, and identify * expressions to be sent as parameters. */ initStringInfo(&sql); deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, &retrieved_attrs); if (remote_conds) appendWhereClause(&sql, root, baserel, remote_conds, true, ¶ms_list); + /* Add ORDER BY clause if we found any useful pathkeys */ + if (best_path->path.pathkeys) + appendOrderByClause(&sql, root, baserel, best_path->path.pathkeys); + /* * Add FOR UPDATE/SHARE if appropriate. We apply locking during the * initial row fetch, rather than later on as is done for local tables. * The extra roundtrips involved in trying to duplicate the local * semantics exactly don't seem worthwhile (see also comments for * RowMarkType). * * Note: because we actually run the query as a cursor, this assumes that * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3. */ @@ -1713,20 +1788,21 @@ postgresExplainForeignModify(ModifyTableState *mtstate, * estimate_path_cost_size * Get cost and size estimates for a foreign scan * * We assume that all the baserestrictinfo clauses will be applied, plus * any join clauses listed in join_conds. */ static void estimate_path_cost_size(PlannerInfo *root, RelOptInfo *baserel, List *join_conds, + List *pathkeys, double *p_rows, int *p_width, Cost *p_startup_cost, Cost *p_total_cost) { PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private; double rows; double retrieved_rows; int width; Cost startup_cost; Cost total_cost; Cost run_cost; @@ -1765,20 +1841,23 @@ estimate_path_cost_size(PlannerInfo *root, appendStringInfoString(&sql, "EXPLAIN "); deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, &retrieved_attrs); if (fpinfo->remote_conds) appendWhereClause(&sql, root, baserel, fpinfo->remote_conds, true, NULL); if (remote_join_conds) appendWhereClause(&sql, root, baserel, remote_join_conds, (fpinfo->remote_conds == NIL), NULL); + if (pathkeys) + appendOrderByClause(&sql, root, baserel, pathkeys); + /* Get the remote estimate */ conn = GetConnection(fpinfo->server, fpinfo->user, false); get_remote_estimate(sql.data, conn, &rows, &width, &startup_cost, &total_cost); ReleaseConnection(conn); retrieved_rows = rows; /* Factor in the selectivity of the locally-checked quals */ local_sel = clauselist_selectivity(root, @@ -2987,10 +3066,39 @@ static void conversion_error_callback(void *arg) { ConversionLocation *errpos = (ConversionLocation *) arg; TupleDesc tupdesc = RelationGetDescr(errpos->rel); if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts) errcontext("column \"%s\" of foreign table \"%s\"", NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname), RelationGetRelationName(errpos->rel)); } + +/* + * find_em_expr_for_rel + * Find an equivalence class member expression, all Vars in which, belong to the + * given relation. + */ +extern Expr * +find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel) +{ + ListCell *lc_em; + foreach(lc_em, ec->ec_members) + { + EquivalenceMember *em = lfirst(lc_em); + + if (bms_equal(em->em_relids, rel->relids)) + { + /* + * Found at least one expression whose all Vars come from given + * relation. If there are more than one of those, all of them + * will be equivalent. It's sufficient to find any one of such + * expressions. + */ + return em->em_expr; + } + } + + /* We didn't find any suitable equivalence class expression */ + return NULL; +} diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index 3835ddb..8956cd2 100644 --- a/contrib/postgres_fdw/postgres_fdw.h +++ b/contrib/postgres_fdw/postgres_fdw.h @@ -67,12 +67,15 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root, List *targetAttrs, List *returningList, List **retrieved_attrs); extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root, Index rtindex, Relation rel, List *returningList, List **retrieved_attrs); extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel); extern void deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs); extern void deparseStringLiteral(StringInfo buf, const char *val); +extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel); +extern void appendOrderByClause(StringInfo buf, PlannerInfo *root, + RelOptInfo *baserel, List *pathkeys); #endif /* POSTGRES_FDW_H */ diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 11160f8..8f45238 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -207,20 +207,23 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; -- check both safe and unsafe join conditions EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 6 AND b.c1 = a.c1 AND a.c8 = 'foo' AND b.c7 = upper(a.c7); SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 6 AND b.c1 = a.c1 AND a.c8 = 'foo' AND b.c7 = upper(a.c7); -- bug before 9.3.5 due to sloppy handling of remote-estimate parameters SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5)); SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5)); +-- pushing down pathkeys to the foreign server +EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10; +SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10; -- =================================================================== -- parameterized queries -- =================================================================== -- simple join PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2; EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2); EXECUTE st1(1, 1); EXECUTE st1(101, 101); -- subquery using stable function (can't be sent to remote)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers