Re: [HACKERS] Join push-down for foreign tables

2011-08-30 Thread Shigeru Hanada
Thanks for the comments.

(2011/08/30 1:42), Tom Lane wrote:
 Costs of ForeignJoinPath are estimated by FDW via new routine
 PlanForeignJoin, and SQL based FDW would need to generate remote SQL
 here.  If a FDW can't push down that join, then it can set disable_cost
 (1.0e10) to tell planner to not choose that path.
 
 disable_cost is not a positive guarantee that a path won't be chosen.
 Particularly not for foreign table accesses, where the estimated costs
 could be pretty darn large in themselves.  You need to pick an API
 wherein refusal is unmistakable.  Probably, returning NULL instead of a
 Path structure is the appropriate way to signal can't do this join.

Agreed.  Returning NULL seems fine.

 In this design, cost of ForeignJoinPath is compared to other join nodes
 such as NestPath and MergePath.  If ForeignJoinPath is the cheapest one
 among the join candidates, planner will generates ForeignJoin plan node
 and put it into plan tree as a leaf node.  In other words, joined
 foreign tables are merged into upper ForeignJoin node.
 
 Hmmm ... are you trying to describe what happens when three or more
 foreign tables are all to be joined at the remote end?

Yes, that's what I wanted to say :)

 I agree that's
 an important use-case, and that we probably want just one Plan node to
 result from it, but I'm less sure about what the Path representation
 ought to be.  It might be better to retain the Path tree showing what
 we'd concluded about what the join order ought to be, with the idea that
 the transmitted query could be constructed to reflect that, saving the
 remote-end planner from having to repeat that work.

It seems a fine solution.  Somehow I thought that one path node should
be mapped to one plan node. In fact, merge join path node might be
expanded to multiple plan nodes, through it's reversed case of foreign
join.  I'm going to implement this idea, and hopefully post proof patch
for next CF.

BTW, Is adding foreign server oid to RelOptInfo acceptable?  This field
is set in build_simple_rel() or build_join_rel() if the RelOptInfo
itself is a foreign scan, or it is a foreign join and both inner and
outer RelOptInfo have same and valid foreign server oid.  I think that
this field could avoid recursive search into foreign join subtree.

Regards,
-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down for foreign tables

2011-08-30 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 BTW, Is adding foreign server oid to RelOptInfo acceptable?

No objection here.

 This field
 is set in build_simple_rel() or build_join_rel() if the RelOptInfo
 itself is a foreign scan, or it is a foreign join and both inner and
 outer RelOptInfo have same and valid foreign server oid.

I think you mean if all the base rels in the join rel come from the
same foreign server.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Join push-down for foreign tables

2011-08-29 Thread Shigeru Hanada
Hi all,

I'd like to develop pushing down JOIN between foreign tables which are
on one foreign server, to enhance performance of joining foreign tables
by reducing data transfer.  This would need many changes in several part
of PG such as planner, executor and FDW API, so please let me describe
my idea first.  Some of below are taken from discussions in late 2010 to
early 2011 about FDW API.

Basics of planning foreign join
===
In current implementation, planner generates paths for every join
combination candidate, and choose cheapest one for actual execution, but
they can't be pushed down to foreign server even if both inner and outer
of a join were on a foreign server.  To follow mechanism of cost-based
planning, pushing down join between foreign tables would need new path
node, say ForeignJoinPath, is necessary to represent candidate of
pushed-down join.  Without this kind of node, every join node types
would need to care relkind of children recursively, and switch what to
do along it.

A ForeignJoinPath can be used to join any of ForeignScanPath and/or
ForeignJoinPath.  This rule can be applied recursively.

New ForeignJoinPath node would not have any sort key, at least in first
version, because collation would make the issue too complex.

reuse ForeignScan vs new ForeignJoin

For symmetry, ForeignJoin plan node should be added, and used to
represent a foreign scan which includes join between foreign tables.
But I'm not sure that adding new planner node is better.  Should we
enhance ForeignScan to represent this kind of plan?

Cost estimation
===
Costs of ForeignJoinPath are estimated by FDW via new routine
PlanForeignJoin, and SQL based FDW would need to generate remote SQL
here.  If a FDW can't push down that join, then it can set disable_cost
(1.0e10) to tell planner to not choose that path.

Typically, planner would generate NestPath, MergePath, HashPath and
ForeignJoinPath for a pair of joined foreign tables if they are on same
foreign server.  If they were on different servers, ForeignJoinPath
would not be generated.

In this design, cost of ForeignJoinPath is compared to other join nodes
such as NestPath and MergePath.  If ForeignJoinPath is the cheapest one
among the join candidates, planner will generates ForeignJoin plan node
and put it into plan tree as a leaf node.  In other words, joined
foreign tables are merged into upper ForeignJoin node.

Any comments are welcome.

Regards,
-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down for foreign tables

2011-08-29 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 I'd like to develop pushing down JOIN between foreign tables which are
 on one foreign server, to enhance performance of joining foreign tables
 by reducing data transfer.

This sketch sounds pretty reasonable, with one minor point that's not
going to work:

 Costs of ForeignJoinPath are estimated by FDW via new routine
 PlanForeignJoin, and SQL based FDW would need to generate remote SQL
 here.  If a FDW can't push down that join, then it can set disable_cost
 (1.0e10) to tell planner to not choose that path.

disable_cost is not a positive guarantee that a path won't be chosen.
Particularly not for foreign table accesses, where the estimated costs
could be pretty darn large in themselves.  You need to pick an API
wherein refusal is unmistakable.  Probably, returning NULL instead of a
Path structure is the appropriate way to signal can't do this join.

 In this design, cost of ForeignJoinPath is compared to other join nodes
 such as NestPath and MergePath.  If ForeignJoinPath is the cheapest one
 among the join candidates, planner will generates ForeignJoin plan node
 and put it into plan tree as a leaf node.  In other words, joined
 foreign tables are merged into upper ForeignJoin node.

Hmmm ... are you trying to describe what happens when three or more
foreign tables are all to be joined at the remote end?  I agree that's
an important use-case, and that we probably want just one Plan node to
result from it, but I'm less sure about what the Path representation
ought to be.  It might be better to retain the Path tree showing what
we'd concluded about what the join order ought to be, with the idea that
the transmitted query could be constructed to reflect that, saving the
remote-end planner from having to repeat that work.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers