> On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <[email protected]> wrote: > >> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <[email protected]> > >> wrote: > >> > So, overall consensus for the FDW hook location is just before the > >> > set_chepest() > >> > at standard_join_search() and merge_clump(), isn't it? > >> > >> Yes, I think so. > >> > >> > Let me make a design of FDW hook to reduce code duplications for each > >> > FDW driver, > >> > especially, to identify baserel/joinrel to be involved in this join. > >> > >> Great, thanks! > >> > >> One issue, which I think Ashutosh alluded to upthread, is that we need > >> to make sure it's not unreasonably difficult for foreign data wrappers > >> to construct the FROM clause of an SQL query to be pushed down to the > >> remote side. It should be simple when there are only inner joins > >> involved, but when there are all outer joins it might be a bit > >> complex. It would be very good if someone could try to write that > >> code, based on the new hook locations, and see how it turns out, so > >> that we can figure out how to address any issues that may crop up > >> there. > >> > > Here is an idea that provides a common utility function that break down > > the supplied RelOptInfo of joinrel into a pair of join-type and a list of > > baserel/joinrel being involved in the relations join. It intends to be > > called by FDW driver to list up underlying relations. > > IIUC, root->join_info_list will provide information of how relations are > > combined to the upper joined relations, thus, I expect it is not > > unreasonably complicated way to solve. > > Once a RelOptInfo of the target joinrel is broken down into multiple sub- > > relations (N>=2 if all inner join, elsewhere N=2), FDW driver can > > reference the RestrictInfo to be used in relations join. > > > > Anyway, I'll try to investigate the existing code for more detail today, > > to clarify whether the above approach is feasible. > > Sounds good. Keep in mind that, while the parse tree will obviously > reflect the way that the user actually specified the join > syntactically, it's not the job of the join_info_list to make it > simple to reconstruct that information. To the contrary, > join_info_list is supposed to be structured in a way that makes it > easy to determine whether *a particular join order is one of the legal > join orders* not *whether it is the specific join order selected by > the user*. See join_is_legal(). > > For FDW pushdown, I think it's sufficient to be able to identify *any > one* legal join order, not necessarily the same order the user > originally entered. For exampe, if the user entered A LEFT JOIN B ON > A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that > instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I > suspect that's just fine. Particular FDWs might wish to try to be > smart about what they emit based on knowledge of what the remote > side's optimizer is likely to do, and that's fine. If the remote side > is PostgreSQL, it shouldn't matter much. > Sorry for my response late. It was not easy to code during business trip.
The attached patch adds a hook for FDW/CSP to replace entire join-subtree
by a foreign/custom-scan, according to the discussion upthread.
GetForeignJoinPaths handler of FDW is simplified as follows:
typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
RelOptInfo *joinrel);
It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
able to know the relations to be involved and construct a remote join query.
However, it is not obvious with RelOptInfo to know how relations are joined.
The function below will help implement FDW driver that support remote join.
List *
get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
SpecialJoinInfo **p_sjinfo)
It returns a list of RelOptInfo to be involved in the relations join that
is represented with 'joinrel', and also set a SpecialJoinInfo on the third
argument if not inner join.
In case of inner join, it returns multiple (more than or equal to 2)
relations to be inner-joined. Elsewhere, it returns two relations and
a valid SpecialJoinInfo.
The #if 0 ... #endif block is just for confirmation purpose to show
how hook is invoked and the joinrel is broken down with above service
routine.
postgres=# select * from t0 left join t1 on t1.aid=bid
left join t2 on t1.aid=cid
left join t3 on t1.aid=did
left join t4 on t1.aid=eid;
INFO: LEFT JOIN: t0, t1
INFO: LEFT JOIN: (t0, t1), t2
INFO: LEFT JOIN: (t0, t1), t3
INFO: LEFT JOIN: (t0, t1), t4
INFO: LEFT JOIN: (t0, t1, t3), t2
INFO: LEFT JOIN: (t0, t1, t4), t2
INFO: LEFT JOIN: (t0, t1, t4), t3
INFO: LEFT JOIN: (t0, t1, t3, t4), t2
In this case, joinrel is broken down into (t0, t1, t3, t4) and t2.
The earlier one is also joinrel, so it expects FDW driver will make
the get_joinrel_broken_down() call recurdively.
postgres=# explain select * from t0 natural join t1
natural join t2
natural join t3
natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t2
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: INNER JOIN: t0, t1, t2
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t2, t3
INFO: INNER JOIN: t0, t2, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3
INFO: INNER JOIN: t0, t1, t2, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: t0, t2, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3, t4
In this case, joinrel is consist of inner join, so get_joinrel_broken_down()
returns a list that contains RelOptInfo of 6 base relations.
postgres=# explain select * from t0 natural join t1
left join t2 on t1.aid=t2.bid
natural join t3
natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: LEFT JOIN: t1, t2
INFO: INNER JOIN: (t1, t2), t0
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3
INFO: INNER JOIN: (t1, t2), t0, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3, t4
In mixture case, it keeps restriction of join legality (t1 and t2 must
be left joined) during its broken down.
At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <[email protected]>
pgsql-v9.5-custom-join.v10.patch
Description: pgsql-v9.5-custom-join.v10.patch
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
