Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Fri, Jun 5, 2015 at 5:51 AM, Shigeru HANADA wrote: > 2015/06/05 6:43、Robert Haas のメール: >> On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai wrote: >> Neat idea. This ties into something I've thought about and mentioned >> before: what if the innerrel is local, but there's a replicated copy >> on the remote server? Perhaps both cases are worth thinking about at >> some point. > > Interesting, but I’m not sure that I understood the situation. > > Here which kind of replication method do you mean? I guess you assume some > kind of per-table replication such as Slony-I or materialized views with > postgres_fdw or dblink, in postgres_fdw case. If this assumption is correct, > we need a mapping between a local ordinary table and a foreign table which > points remote replicated table. Right. I was thinking of BDR, in particular, or some future future in-core feature which might be similar, but Slony could do the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
2015/06/05 6:43、Robert Haas のメール: > On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai wrote: > Neat idea. This ties into something I've thought about and mentioned > before: what if the innerrel is local, but there's a replicated copy > on the remote server? Perhaps both cases are worth thinking about at > some point. Interesting, but I’m not sure that I understood the situation. Here which kind of replication method do you mean? I guess you assume some kind of per-table replication such as Slony-I or materialized views with postgres_fdw or dblink, in postgres_fdw case. If this assumption is correct, we need a mapping between a local ordinary table and a foreign table which points remote replicated table. -- Shigeru HANADA shigeru.han...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
> On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai wrote: > >> Neat idea. This ties into something I've thought about and mentioned > >> before: what if the innerrel is local, but there's a replicated copy > >> on the remote server? Perhaps both cases are worth thinking about at > >> some point. > >> > > I think, here is both merit and de-merit for each. It implies either of > > them never always-better-strategy. > > > > * Push out local table as VALUES(...) clause > > Good: No restriction to functions/operators in the local scan or > > underlying plan node. > > Bad: High cost for data format modification (HeapTupleSlot => > > VALUES(...) clause in text), and 2-way data transfer. > > > > * Remote join between foreign table and replicated table > > Good: Data already exists on remote side, no need to kick out > > contents of local relation (and no need to consume CPU > > cycle to make VALUES() clause). > > Bad: Functions/operators are restricted as existing postgres_fdw > > is doing. Only immutable and built-in ones are available to > > run on the remote side. > > Sure. > > > BTW, do we need either of tables being foreign table, if entire database > > is (synchronously) replicated? > > Also, loopback server may be a candidate even if not replicated (although > > it may be an entrance of deadlock heaven). > > I suppose it's possible that this sort of thing could work out to a > win, but I think it's much less likely to work out than pushing down a > foreign/local join using either the VALUES trick or a replicated copy. > Hmm, it might be too aggressive approach. If we would try to implement, postgres_fdw will need to add so many junk paths (expensive than usual local ones) to consider remote join between replicated local tables. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai wrote: >> Neat idea. This ties into something I've thought about and mentioned >> before: what if the innerrel is local, but there's a replicated copy >> on the remote server? Perhaps both cases are worth thinking about at >> some point. >> > I think, here is both merit and de-merit for each. It implies either of > them never always-better-strategy. > > * Push out local table as VALUES(...) clause > Good: No restriction to functions/operators in the local scan or > underlying plan node. > Bad: High cost for data format modification (HeapTupleSlot => > VALUES(...) clause in text), and 2-way data transfer. > > * Remote join between foreign table and replicated table > Good: Data already exists on remote side, no need to kick out > contents of local relation (and no need to consume CPU > cycle to make VALUES() clause). > Bad: Functions/operators are restricted as existing postgres_fdw > is doing. Only immutable and built-in ones are available to > run on the remote side. Sure. > BTW, do we need either of tables being foreign table, if entire database > is (synchronously) replicated? > Also, loopback server may be a candidate even if not replicated (although > it may be an entrance of deadlock heaven). I suppose it's possible that this sort of thing could work out to a win, but I think it's much less likely to work out than pushing down a foreign/local join using either the VALUES trick or a replicated copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
> On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai wrote: > > Yesterday, JPUG held an unconference event at Tokyo, and > > Hanada-san had a talk about join-pushdown feature of > > postgres_fdw. > > At this talk, someone proposed an interesting idea to > > make join pushdown more aggressive/effective. > > Let me share it with pgsql-hackers. > > > > He said, we may have a workload to join a large foreign- > > scan and a small local-scan regardless of the plan type. > > > > For example: > > joinrel (expected nrows = 5) > > + outerrel ForeignScan (expected nrows = 100) > > + innerrel LocalScan (expected nrows = 5) > > > > In this case, we may be able to run the entire joinrel > > on the remote side then fetch just 5 rows, if fdw-driver > > construct VALUES() clause according to the contents of > > LocalScan then makes an entire join query with another > > one kept in ForeignScan. > > > > If above ForeignScan have the following remote query, > > SELECT a, b, c FROM t0 WHERE d < 100 > > we may be able to construct the query below to run remote > > join with local (small) relation. > > > > SELECT a, b, c, x, y FROM > > (SELECT a, b, c FROM t0 WHERE d < 100) AS ft > > JOIN > > (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), > > (4,'ddd'), (5,'eee')) AS lt (x, y) > > ON ft.a = lt.x > > > > The VALUES clauses can be mechanically constructed according > > to the result set of LocalScan, and it is not difficult to > > make such a remote query on top of the existing ForeignScan. > > In the result, it will reduce amount of network traffic and > > CPU cycles to form/deform tuples dramatically. > > > > I don't intend to implement this idea urgently (of course, > > join pushdown for both ForeignScan case has higher priority), > > however, it makes sense to keep the future direction in mind. > > > > Also, as an aside, even though Hanada-san mentioned ForeignScan > > does not need an infrastructure to initialize child path nodes, > > this idea may require ForeignScan to have local child path. > > Neat idea. This ties into something I've thought about and mentioned > before: what if the innerrel is local, but there's a replicated copy > on the remote server? Perhaps both cases are worth thinking about at > some point. > I think, here is both merit and de-merit for each. It implies either of them never always-better-strategy. * Push out local table as VALUES(...) clause Good: No restriction to functions/operators in the local scan or underlying plan node. Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer. * Remote join between foreign table and replicated table Good: Data already exists on remote side, no need to kick out contents of local relation (and no need to consume CPU cycle to make VALUES() clause). Bad: Functions/operators are restricted as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side. BTW, do we need either of tables being foreign table, if entire database is (synchronously) replicated? Also, loopback server may be a candidate even if not replicated (although it may be an entrance of deadlock heaven). Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai wrote: > Yesterday, JPUG held an unconference event at Tokyo, and > Hanada-san had a talk about join-pushdown feature of > postgres_fdw. > At this talk, someone proposed an interesting idea to > make join pushdown more aggressive/effective. > Let me share it with pgsql-hackers. > > He said, we may have a workload to join a large foreign- > scan and a small local-scan regardless of the plan type. > > For example: > joinrel (expected nrows = 5) > + outerrel ForeignScan (expected nrows = 100) > + innerrel LocalScan (expected nrows = 5) > > In this case, we may be able to run the entire joinrel > on the remote side then fetch just 5 rows, if fdw-driver > construct VALUES() clause according to the contents of > LocalScan then makes an entire join query with another > one kept in ForeignScan. > > If above ForeignScan have the following remote query, > SELECT a, b, c FROM t0 WHERE d < 100 > we may be able to construct the query below to run remote > join with local (small) relation. > > SELECT a, b, c, x, y FROM > (SELECT a, b, c FROM t0 WHERE d < 100) AS ft > JOIN > (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), > (4,'ddd'), (5,'eee')) AS lt (x, y) > ON ft.a = lt.x > > The VALUES clauses can be mechanically constructed according > to the result set of LocalScan, and it is not difficult to > make such a remote query on top of the existing ForeignScan. > In the result, it will reduce amount of network traffic and > CPU cycles to form/deform tuples dramatically. > > I don't intend to implement this idea urgently (of course, > join pushdown for both ForeignScan case has higher priority), > however, it makes sense to keep the future direction in mind. > > Also, as an aside, even though Hanada-san mentioned ForeignScan > does not need an infrastructure to initialize child path nodes, > this idea may require ForeignScan to have local child path. Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? Perhaps both cases are worth thinking about at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [idea] more aggressive join pushdown on postgres_fdw
Hi, Yesterday, JPUG held an unconference event at Tokyo, and Hanada-san had a talk about join-pushdown feature of postgres_fdw. At this talk, someone proposed an interesting idea to make join pushdown more aggressive/effective. Let me share it with pgsql-hackers. He said, we may have a workload to join a large foreign- scan and a small local-scan regardless of the plan type. For example: joinrel (expected nrows = 5) + outerrel ForeignScan (expected nrows = 100) + innerrel LocalScan (expected nrows = 5) In this case, we may be able to run the entire joinrel on the remote side then fetch just 5 rows, if fdw-driver construct VALUES() clause according to the contents of LocalScan then makes an entire join query with another one kept in ForeignScan. If above ForeignScan have the following remote query, SELECT a, b, c FROM t0 WHERE d < 100 we may be able to construct the query below to run remote join with local (small) relation. SELECT a, b, c, x, y FROM (SELECT a, b, c FROM t0 WHERE d < 100) AS ft JOIN (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), (4,'ddd'), (5,'eee')) AS lt (x, y) ON ft.a = lt.x The VALUES clauses can be mechanically constructed according to the result set of LocalScan, and it is not difficult to make such a remote query on top of the existing ForeignScan. In the result, it will reduce amount of network traffic and CPU cycles to form/deform tuples dramatically. I don't intend to implement this idea urgently (of course, join pushdown for both ForeignScan case has higher priority), however, it makes sense to keep the future direction in mind. Also, as an aside, even though Hanada-san mentioned ForeignScan does not need an infrastructure to initialize child path nodes, this idea may require ForeignScan to have local child path. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers