Hanada-san, > In addition to your comments, I removed useless code that retrieves > ForeignPath > from outer/inner RelOptInfo and store them into ForeignPath#fdw_private. Now > postgres_fdw’s join pushd-down is free from existence of ForeignPath under the > join relation. This means that we can support the case Robert mentioned > before, > that whole "(huge JOIN large) JOIN small” can be pushed down even if “(huge > JOIN > large)” is dominated by another join path. > Yes, it's definitely reasonable design, and fits intention of the interface. I should point out it from the beginning. :-)
> > "l" of the first SELECT represents a whole-row reference. > > However, underlying SELECT contains system columns in its target- > > list. > > > > Is it available to construct such a query? > > SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ... > > ^^^^^^^^^^ > > Simple relation reference such as "l" is not sufficient for the purpose, yes. > But putting columns in parentheses would not work when a user column is > referenced > in original query. > > I implemented deparseProjectionSql to use ROW(...) expression for a whole-row > reference in the target list, in addition to ordinary column references for > actually used columns and ctid. > > Please see the test case for mixed use of ctid and whole-row reference to > postgres_fdw’s regression tests. Now a whole-row reference in the remote > query > looks like this: > It seems to me that deparseProjectionSql() works properly. > -- ctid with whole-row reference > EXPLAIN (COSTS false, VERBOSE) > SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY > t1.c3, > t1.c1 OFFSET 100 LIMIT 10; > > ---------------------------------------------------------------------------- > ---------------------------------------------------------------------------- > ------------------------------------------------------------------------- > Limit > Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1 > -> Sort > Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1 > Sort Key: t1.c3, t1.c1 > -> Foreign Scan > Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1 > Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT > l.a7, > ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM > (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1 > (8 rows) > > In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred > data, > but IMO this would simplify the code for deparsing. > I agree. Even if we can reduce networking cost a little, tuple construction takes CPU cycles. Your decision is fair enough for me. > > * merge_fpinfo() > > It seems to me fpinfo->rows should be joinrel->rows, and > > fpinfo->width also should be joinrel->width. > > No need to have special intelligence here, isn't it? > > > Oops. They are vestige of my struggle which disabled SELECT clause > optimization > (omit unused columns). Now width and rows are inherited from joinrel. > Besides > that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use > simple > summary, not average. > Does fpinfo->fdw_startup_cost represent a cost to open connection to remote PostgreSQL, doesn't it? postgres_fdw.c:1757 says as follows: /* * Add some additional cost factors to account for connection overhead * (fdw_startup_cost), transferring data across the network * (fdw_tuple_cost per retrieved row), and local manipulation of the data * (cpu_tuple_cost per retrieved row). */ If so, does a ForeignScan that involves 100 underlying relation takes 100 times heavy network operations on startup? Probably, no. I think, average is better than sum, and max of them will reflect the cost more correctly. Also, fdw_tuple_cost introduce the cost of data transfer over the network. I thinks, weighted average is the best strategy, like: fpinfo->fdw_tuple_cost = (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_o->fdw_tuple_cost + (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_i->fdw_tuple_cost; That's just my suggestion. Please apply the best way you thought. > > * explain output > > > > EXPLAIN output may be a bit insufficient to know what does it > > actually try to do. > > > > postgres=# explain select * from ft1,ft2 where a = b; > > QUERY PLAN > > -------------------------------------------------------- > > Foreign Scan (cost=200.00..212.80 rows=1280 width=80) > > (1 row) > > > > Even though it is not an immediate request, it seems to me > > better to show users joined relations and remote ON/WHERE > > clause here. > > > > Like this? > > Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80 > rows=1280 width=80) > … > No. This line is produced by ExplainScanTarget(), so it requires the backend knowledge to individual FDW. Rather than the backend, postgresExplainForeignScan() shall produce the output. > It might produce a very long line in a case of joining many tables because it > contains most of remote query other than SELECT clause, but I prefer detailed. > Another idea is to print “Join Cond” and “Remote Filter” as separated EXPLAIN > items. > It is good, if postgres_fdw can generate relations name involved in the join for each level, and join cond/remote filter individually. > Note that v8 patch doesn’t contain this change yet! > It is a "nice to have" feature. So, I don't think the first commit needs to support this. Just a suggestion in the next step. * implementation suggestion At the deparseJoinSql(), + /* print SELECT clause of the join scan */ + initStringInfo(&selbuf); + i = 0; + foreach(lc, baserel->reltargetlist) + { + Var *var = (Var *) lfirst(lc); + TargetEntry *tle; + + if (i > 0) + appendStringInfoString(&selbuf, ", "); + deparseJoinVar(var, &context); + + tle = makeTargetEntry((Expr *) copyObject(var), + i + 1, pstrdup(""), false); + if (fdw_ps_tlist) + *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle)); + + *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1); + + i++; + } The tle is a copy of the original target-entry, and var-node is also copied one. Why is the tle copied on lappend() again? Also, NULL as acceptable as 3rd argument of makeTargetEntry. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers