Re: [HACKERS] postgres_fdw bug in 9.6

2017-10-05 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane  wrote:
>> We'd definitely need to do things that way in 9.6.  I'm not quite sure
>> whether it's too late to adopt the clean solution in v10.

> It probably is now.  Are you still planning to do something about this patch?

It's still on my list, but I didn't get to it during the CF.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-10-05 Thread Robert Haas
On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane  wrote:
> Etsuro Fujita  writes:
>> [ epqpath-for-foreignjoin-11.patch ]
>
> I started looking at this.  I've not yet wrapped my head around what
> CreateLocalJoinPath() is doing, but I noted that Robert's concerns
> about ABI breakage in the back branches would not be very difficult
> to satisfy.  What we'd need to do is
>
> (1) In the back-branch patch, add the new fields of JoinPathExtraData
> at the end, rather than in their more natural locations.  This should
> avoid any ABI break for uses of that struct, since there's no reason
> why an FDW would be creating its own variables of that type rather
> than just using what the core code passes it.
>
> (2) In the back branches, just leave GetExistingLocalJoinPath in place
> rather than deleting it.  Existing FDWs would still be subject to the
> bug until they were updated, but given how hard it is to trigger, that
> doesn't seem like a huge problem.
>
> A variant of (2) is to install a hack fix in GetExistingLocalJoinPath
> rather than leaving it unchanged.  I'm not very excited about that though;
> it doesn't seem unlikely that we might introduce new bugs that way, and
> it would certainly be a distraction from getting the real fix finished.
>
> We'd definitely need to do things that way in 9.6.  I'm not quite sure
> whether it's too late to adopt the clean solution in v10.

It probably is now.  Are you still planning to do something about this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-09-19 Thread Ashutosh Bapat
On Tue, Sep 5, 2017 at 1:10 PM, Etsuro Fujita
 wrote:
>
>
> I think Tom is reviewing this patch [1].
>

I am marking this as ready for committer as I don't have any new
comments and possibly other reviewers have also done reviewing it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-09-05 Thread Ryan Murphy
> I tested that with HEAD, but couldn't reproduce this problem.  Didn't you
test that with HEAD?

Yeah, I know it's not an issue other people are having - I think it's
something specific about how my environment / postgres build is set up.  In
any case I knew that it wasn't caused by your patch.

Thanks,
Ryan


Re: [HACKERS] postgres_fdw bug in 9.6

2017-09-05 Thread Etsuro Fujita

On 2017/09/05 13:43, Ryan Murphy wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This feature is obviously a pretty deep cut, and I don't understand the JOIN 
system enough to comment on whether the code is correct.  I did not see any 
issues when glancing through the patch.


Thank you for the review!


I ran "make check", "make installcheck" and "make installcheck-world" and had ALMOST no 
problems:  I marked it Tested and Passed because the only issue I had was a recurring issue I've had with "make 
installcheck-world" which I think is unrelated:

*** path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr 
2017-02-14 09:22:25.0 -0600
--- path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr  
2017-09-04 23:31:50.0 -0500
***
*** 36,42 
   [NO_PID]: sqlca: code: 0, state: 0
   [NO_PID]: ECPGconnect: opening database  on  port 
  for user regress_ecpg_user2
   [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"regress_ecpg_user2" does not exist
...


I tested that with HEAD, but couldn't reproduce this problem.  Didn't 
you test that with HEAD?



But this still Needs Review by someone who knows the JOIN system better.


I think Tom is reviewing this patch [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/22168.1504041271%40sss.pgh.pa.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-09-04 Thread Ryan Murphy
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This feature is obviously a pretty deep cut, and I don't understand the JOIN 
system enough to comment on whether the code is correct.  I did not see any 
issues when glancing through the patch.

I ran "make check", "make installcheck" and "make installcheck-world" and had 
ALMOST no problems:  I marked it Tested and Passed because the only issue I had 
was a recurring issue I've had with "make installcheck-world" which I think is 
unrelated:

*** path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr 
2017-02-14 09:22:25.0 -0600
--- path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr  
2017-09-04 23:31:50.0 -0500
***
*** 36,42 
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGconnect: opening database  on  port  
 for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"regress_ecpg_user2" does not exist
...

But this still Needs Review by someone who knows the JOIN system better.

Best,
Ryan
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-29 Thread Tom Lane
Etsuro Fujita  writes:
> [ epqpath-for-foreignjoin-11.patch ]

I started looking at this.  I've not yet wrapped my head around what
CreateLocalJoinPath() is doing, but I noted that Robert's concerns
about ABI breakage in the back branches would not be very difficult
to satisfy.  What we'd need to do is

(1) In the back-branch patch, add the new fields of JoinPathExtraData
at the end, rather than in their more natural locations.  This should
avoid any ABI break for uses of that struct, since there's no reason
why an FDW would be creating its own variables of that type rather
than just using what the core code passes it.

(2) In the back branches, just leave GetExistingLocalJoinPath in place
rather than deleting it.  Existing FDWs would still be subject to the
bug until they were updated, but given how hard it is to trigger, that
doesn't seem like a huge problem.

A variant of (2) is to install a hack fix in GetExistingLocalJoinPath
rather than leaving it unchanged.  I'm not very excited about that though;
it doesn't seem unlikely that we might introduce new bugs that way, and
it would certainly be a distraction from getting the real fix finished.

We'd definitely need to do things that way in 9.6.  I'm not quite sure
whether it's too late to adopt the clean solution in v10.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-29 Thread Etsuro Fujita

On 2017/08/21 20:37, Etsuro Fujita wrote:

Attached is an updated version of the patch.


I corrected some comments.  New version attached.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, 
c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, 
c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1745,1766  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-21 Thread Etsuro Fujita

On 2017/04/08 4:24, Robert Haas wrote:

Looking at the code itself, I find the changes to joinpath.c rather alarming.


I missed this mail.  Sorry about that, Robert.


+/* Save hashclauses for possible use by the FDW */
+if (extra->consider_foreignjoin && hashclauses)
+extra->hashclauses = hashclauses;

A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown.  But I also think there's no
real need for this.  The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.


I refactored that into a new function so that we can call that function 
at the top of add_paths_to_joinrel and store the result in 
JoinPathExtraData.



+/* Save first mergejoin data for possible use by the FDW */
+if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->outersortkeys = outerkeys;
+extra->innersortkeys = innerkeys;
+}

Similarly here.  select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me.  I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.


Done that way.


There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses.  Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.


Done.


-/*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
-if (IsA(joinpath->outerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-}
-
-if (IsA(joinpath->innerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-}

This logic is removed and not replaced with anything, but I don't see
what keeps this code...

+Path   *outer_path = outerrel->cheapest_total_path;
+Path   *inner_path = innerrel->cheapest_total_path;

...from picking a ForeignPath?


CreateLocalJoinPath creates an alternative local join path for a foreign 
join from the cheapest total paths for the outer/inner relations.  The 
reason for the above is to pass these paths to that function.  On second 
thought, however, I think it would be convenient for the caller to just 
pass outerrel/innerrel to that function.  So, I modified that function's 
API as such.  Another change is: the previous version of that function 
allowed the caller to create a parameterized local-join path 
corresponding to a parameterized foreign join, but that is a feature, 
not a bug fix, so I dropped that.  (I'll propose that as part of the 
patch in [1].)



There's probably more to think about here, but those are my question
on an initial read-through.


Thanks for the review!

Attached is an updated version of the patch.

Best regards,
Etsuro Fujita

[1] https://commitfest.postgresql.org/14/1042/
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!   

Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-17 Thread Etsuro Fujita

On 2017/08/18 8:16, Michael Paquier wrote:

On Fri, Aug 18, 2017 at 3:59 AM, Tom Lane  wrote:

Robert Haas  writes:

On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
 wrote:

I rebased the patch to HEAD.  PFA a new version of the patch.



Tom, you were instrumental in identifying what was going wrong here
initially.  Any chance you'd be willing to have a look at the patch?


I will, but probably not for a week or so.  Going eclipse-chasing.


Good luck:
https://xkcd.com/1876/
And enjoy:
https://xkcd.com/1877/


Enjoy!

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-17 Thread Michael Paquier
On Fri, Aug 18, 2017 at 3:59 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
>>  wrote:
>>> I rebased the patch to HEAD.  PFA a new version of the patch.
>
>> Tom, you were instrumental in identifying what was going wrong here
>> initially.  Any chance you'd be willing to have a look at the patch?
>
> I will, but probably not for a week or so.  Going eclipse-chasing.

Good luck:
https://xkcd.com/1876/
And enjoy:
https://xkcd.com/1877/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
>  wrote:
>> I rebased the patch to HEAD.  PFA a new version of the patch.

> Tom, you were instrumental in identifying what was going wrong here
> initially.  Any chance you'd be willing to have a look at the patch?

I will, but probably not for a week or so.  Going eclipse-chasing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-17 Thread Robert Haas
On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
 wrote:
> On 2017/04/04 18:01, Etsuro Fujita wrote:
>> I rebased the patch also.  Please find attached an updated version of the
>> patch.
>
> I rebased the patch to HEAD.  PFA a new version of the patch.

Tom, you were instrumental in identifying what was going wrong here
initially.  Any chance you'd be willing to have a look at the patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-08-17 Thread Etsuro Fujita

On 2017/04/04 18:01, Etsuro Fujita wrote:
I rebased the patch also.  Please find attached an updated version of 
the patch.


I rebased the patch to HEAD.  PFA a new version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, 
c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, 
c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1745,1766  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = 
t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS 
NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) 
END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN 
"S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, 
r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 
c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-11 Thread Etsuro Fujita

On 2017/04/08 3:33, Robert Haas wrote:

On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
 wrote:

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
> wrote:
Done.  Attached is a new version of the patch.
Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?


+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative local
join path (ie, CreateLocalJoinPath), that would make maintenance of the code
easy.


Well, the problem here is that this breaks ABI compatibility.  If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.


Yeah, but I was thinking that it'd be a good idea to add a note about 
that to the release notes, to avoid such troubles.



Also, the CommitFest entry for this seems to be a bit sketchy.  It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case.  It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision.  And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.


Thanks for correcting that!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-09 Thread Ashutosh Bapat
On Sat, Apr 8, 2017 at 12:03 AM, Robert Haas  wrote:
> On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
>  wrote:
>> On 2017/04/01 1:32, Jeff Janes wrote:
>>> On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
>>> > wrote:
>>> Done.  Attached is a new version of the patch.
>>> Is the fix for 9.6.3 going to be just a back port of this, or will it
>>> look different?
>>
>> +1 for backporting; although that requires that GetForeignJoinPaths be
>> updated so that the FDW uses a new function to create an alternative local
>> join path (ie, CreateLocalJoinPath), that would make maintenance of the code
>> easy.
>
> Well, the problem here is that this breaks ABI compatibility.  If we
> applied this to 9.6, and somebody tried to use a previously-compiled
> FDW .so against a new server version, it would fail after the upgrade,
> because the new server wouldn't have GetExistingLocalJoinPath and also
> possibly because of the change to the structure of JoinPathExtraData.
> Maybe there's no better alternative, and maybe nothing outside of
> postgres_fdw is using this stuff anyway, but it seems like a concern.

I had submitted a patch in [1]. We thought that that patch is good to
fix the issue on the backbranches. But it got berried in the thread.
If you think that's a feasible solution for backbranches, I will work
on the comments.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-08 Thread David Steele
On 4/7/17 3:24 PM, Robert Haas wrote:
> 
> There's probably more to think about here, but those are my question
> on an initial read-through.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 2:33 PM, Robert Haas  wrote:
> On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
>  wrote:
>> On 2017/04/01 1:32, Jeff Janes wrote:
>>> On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
>>> > wrote:
>>> Done.  Attached is a new version of the patch.
>>> Is the fix for 9.6.3 going to be just a back port of this, or will it
>>> look different?
>>
>> +1 for backporting; although that requires that GetForeignJoinPaths be
>> updated so that the FDW uses a new function to create an alternative local
>> join path (ie, CreateLocalJoinPath), that would make maintenance of the code
>> easy.
>
> Well, the problem here is that this breaks ABI compatibility.  If we
> applied this to 9.6, and somebody tried to use a previously-compiled
> FDW .so against a new server version, it would fail after the upgrade,
> because the new server wouldn't have GetExistingLocalJoinPath and also
> possibly because of the change to the structure of JoinPathExtraData.
> Maybe there's no better alternative, and maybe nothing outside of
> postgres_fdw is using this stuff anyway, but it seems like a concern.
>
> Also, the CommitFest entry for this seems to be a bit sketchy.  It
> claims that Tom Lane is a co-author of this patch which, AFAICS, is
> not the case.  It is listed under Miscellaneous rather than "Bug
> Fixes", which seems like a surprising decision.  And it uses a subject
> line which is neither very clear nor the same as the (also not
> particularly helpful) subject line of the email thread.

Looking at the code itself, I find the changes to joinpath.c rather alarming.

+/* Save hashclauses for possible use by the FDW */
+if (extra->consider_foreignjoin && hashclauses)
+extra->hashclauses = hashclauses;

A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown.  But I also think there's no
real need for this.  The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.

+/* Save first mergejoin data for possible use by the FDW */
+if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->outersortkeys = outerkeys;
+extra->innersortkeys = innerkeys;
+}

Similarly here.  select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me.  I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.

There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses.  Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.

-/*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
-if (IsA(joinpath->outerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-}
-
-if (IsA(joinpath->innerjoinpath, ForeignPath))
-{
-ForeignPath *foreign_path;
-
-foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-if (IS_JOIN_REL(foreign_path->path.parent))
-joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-}

This logic is removed and not replaced with anything, but I don't see
what keeps this code...

+Path   *outer_path = outerrel->cheapest_total_path;
+Path   *inner_path = innerrel->cheapest_total_path;

...from picking a ForeignPath?

There's probably more to think about here, but those are my question
on an initial read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-07 Thread Robert Haas
On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
 wrote:
> On 2017/04/01 1:32, Jeff Janes wrote:
>> On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
>> > wrote:
>> Done.  Attached is a new version of the patch.
>> Is the fix for 9.6.3 going to be just a back port of this, or will it
>> look different?
>
> +1 for backporting; although that requires that GetForeignJoinPaths be
> updated so that the FDW uses a new function to create an alternative local
> join path (ie, CreateLocalJoinPath), that would make maintenance of the code
> easy.

Well, the problem here is that this breaks ABI compatibility.  If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

Also, the CommitFest entry for this seems to be a bit sketchy.  It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case.  It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision.  And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-05 Thread Etsuro Fujita

On 2017/04/04 21:28, Ashutosh Bapat wrote:

On Tue, Apr 4, 2017 at 2:31 PM, Etsuro Fujita
> wrote:
I rebased the patch also.  Please find attached an updated version
of the patch.



Thanks, marking this as "ready for committer".


Thanks!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-04 Thread Ashutosh Bapat
On Tue, Apr 4, 2017 at 2:31 PM, Etsuro Fujita 
wrote:

>
> I rebased the patch also.  Please find attached an updated version of the
> patch.
>

Thanks, marking this as "ready for committer".

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-04 Thread Etsuro Fujita

On 2017/04/04 14:38, Ashutosh Bapat wrote:

Probably we should use "could not be created" instead of "was not
created" in "... a local path suitable for EPQ checks was not created".


Done.


"outer_path should not require relations from inner_path" may be
reworded as "outer paths should not be parameterized by the inner
relations".

"neither path should require relations from the other path" may be
reworded as "neither path should be parameterized by the the other
joining relation".


Done.  I used "input" instead of "joining" in the latter, though.


Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We
should be able
to create a nested loop join for JOIN_RIGHT?
+case JOIN_RIGHT:
+case JOIN_FULL:


I don't think so, because nestloop joins aren't supported for
JOIN_RIGHT.  See ExecInitNestLoop().


Hmm, I see in match_unsorted_outer()
1254 case JOIN_RIGHT:
1255 case JOIN_FULL:
1256 nestjoinOK = false;
1257 useallclauses = true;
1258 break;


Yeah, I should have pointed that out as well.

I rebased the patch also.  Please find attached an updated version of 
the patch.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1629,1650  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1673,1694  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-03 Thread Ashutosh Bapat
Probably we should use "could not be created" instead of "was not created"
in "... a local path suitable for EPQ checks was not created".

"outer_path should not require relations from inner_path" may be reworded
as "outer paths should not be parameterized by the inner relations".

"neither path should require relations from the other path" may be reworded
as "neither path should be parameterized by the the other joining relation".


> While the comment below mentions ON true, the testcase you have added is
>> for ON
>> false. Either the testcase should change or this comment. That raises
>> another
>> question, what happens when somebody does FULL JOIN ON false?
>> + * If special case: for "x FULL JOIN y ON true",
>> there
>>
>
> FULL JOIN ON FALSE would be handled the same way as FULL JOIN ON TRUE, so
> I think we should rewrite that comment into something like this: If special
> case: for "x FULL JOIN y ON true" or "x FULL JOIN y ON false"...
>

Ok.


>
> Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be
>> able
>> to create a nested loop join for JOIN_RIGHT?
>> +case JOIN_RIGHT:
>> +case JOIN_FULL:
>>
>
> I don't think so, because nestloop joins aren't supported for JOIN_RIGHT.
> See ExecInitNestLoop().
>

Hmm, I see in match_unsorted_outer()
1254 case JOIN_RIGHT:
1255 case JOIN_FULL:
1256 nestjoinOK = false;
1257 useallclauses = true;
1258 break;

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-03 Thread Etsuro Fujita

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
> wrote:
Done.  Attached is a new version of the patch.



Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?


+1 for backporting; although that requires that GetForeignJoinPaths be 
updated so that the FDW uses a new function to create an alternative 
local join path (ie, CreateLocalJoinPath), that would make maintenance 
of the code easy.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-31 Thread Jeff Janes
On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita 
wrote:

> On 2017/03/21 18:40, Etsuro Fujita wrote:
>
>> Ok, I'll update the patch.  One thing I'd like to revise in addition to
>> that is (1) add to JoinPathExtraData a flag member to indicate whether
>> to give the FDW a chance to consider a remote join, which will be set to
>> true if the joinrel's fdwroutine is not NULL and the fdwroutine's
>> GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
>> to create an alternative local join path, such as hashclauses and
>> mergeclauses proposed in the patch, into JoinPathExtraData in
>> add_paths_to_joinrel.  This would avoid useless overhead in saving such
>> info into JoinPathExtraData when we don't give the FDW that chance.
>>
>
> Done.  Attached is a new version of the patch.
>

Is the fix for 9.6.3 going to be just a back port of this, or will it look
different?

Cheers,

Jeff


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-31 Thread Etsuro Fujita

On 2017/03/30 20:16, Ashutosh Bapat wrote:

The patch applies cleanly, compiles. make check in regress as well as
postgres_fdw works fine. Here are few comments


Thanks for the review!


local-join should be local join.


OK, done.


The comments should explain why.
+/* Should be unparameterized */
+Assert(outer_path->param_info == NULL);
+Assert(inner_path->param_info == NULL);


Done.


+ a suitable local join path, which can be used as the alternative local
May be we should reword this as ... which can be used to create an alternative
local ... This rewording is required even in the existing docs.


Done.


+/* outer_path should not require rels from inner_path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
Probably this should throw an error or return NULL in such case rather than
Asserting. This function is callable from any FDW, and that FDW may provide
such paths, may be because of an internal bug. Same case with
+/* Neither path should require rels from the other path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
+   !PATH_PARAM_BY_REL(inner_path, outer_path->parent));


Good idea!  I think it's better to throw an error because that is a bug 
in the FDW; done that way.



While the comment below mentions ON true, the testcase you have added is for ON
false. Either the testcase should change or this comment. That raises another
question, what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true", there


FULL JOIN ON FALSE would be handled the same way as FULL JOIN ON TRUE, 
so I think we should rewrite that comment into something like this: If 
special case: for "x FULL JOIN y ON true" or "x FULL JOIN y ON false"...



Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be able
to create a nested loop join for JOIN_RIGHT?
+case JOIN_RIGHT:
+case JOIN_FULL:


I don't think so, because nestloop joins aren't supported for 
JOIN_RIGHT.  See ExecInitNestLoop().


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1629,1650  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2

Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-30 Thread Ashutosh Bapat
The patch applies cleanly, compiles. make check in regress as well as
postgres_fdw works fine. Here are few comments

local-join should be local join.

The comments should explain why.
+/* Should be unparameterized */
+Assert(outer_path->param_info == NULL);
+Assert(inner_path->param_info == NULL);

+ a suitable local join path, which can be used as the alternative local
May be we should reword this as ... which can be used to create an alternative
local ... This rewording is required even in the existing docs.

+/* outer_path should not require rels from inner_path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
Probably this should throw an error or return NULL in such case rather than
Asserting. This function is callable from any FDW, and that FDW may provide
such paths, may be because of an internal bug. Same case with
+/* Neither path should require rels from the other path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
+   !PATH_PARAM_BY_REL(inner_path, outer_path->parent));

While the comment below mentions ON true, the testcase you have added is for ON
false. Either the testcase should change or this comment. That raises another
question, what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true", there

Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be able
to create a nested loop join for JOIN_RIGHT?
+case JOIN_RIGHT:
+case JOIN_FULL:

On Thu, Mar 23, 2017 at 5:50 PM, Etsuro Fujita
 wrote:
> On 2017/03/21 18:40, Etsuro Fujita wrote:
>>
>> Ok, I'll update the patch.  One thing I'd like to revise in addition to
>> that is (1) add to JoinPathExtraData a flag member to indicate whether
>> to give the FDW a chance to consider a remote join, which will be set to
>> true if the joinrel's fdwroutine is not NULL and the fdwroutine's
>> GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
>> to create an alternative local join path, such as hashclauses and
>> mergeclauses proposed in the patch, into JoinPathExtraData in
>> add_paths_to_joinrel.  This would avoid useless overhead in saving such
>> info into JoinPathExtraData when we don't give the FDW that chance.
>
>
> Done.  Attached is a new version of the patch.
>
> Best regards,
> Etsuro Fujita



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-23 Thread Etsuro Fujita

On 2017/03/21 18:40, Etsuro Fujita wrote:

Ok, I'll update the patch.  One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel.  This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.


Done.  Attached is a new version of the patch.

Best regards,
Etsuro Fujita


epqpath-for-foreignjoin-6.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-21 Thread Etsuro Fujita

On 2017/03/17 0:37, David Steele wrote:

This patch does not apply cleanly at cccbdde:



Marked "Waiting on Author".


Ok, I'll update the patch.  One thing I'd like to revise in addition to 
that is (1) add to JoinPathExtraData a flag member to indicate whether 
to give the FDW a chance to consider a remote join, which will be set to 
true if the joinrel's fdwroutine is not NULL and the fdwroutine's 
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info 
to create an alternative local join path, such as hashclauses and 
mergeclauses proposed in the patch, into JoinPathExtraData in 
add_paths_to_joinrel.  This would avoid useless overhead in saving such 
info into JoinPathExtraData when we don't give the FDW that chance.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-16 Thread David Steele
On 1/23/17 4:56 AM, Etsuro Fujita wrote:
> On 2017/01/20 14:24, Ashutosh Bapat wrote:
>> On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas 
>> wrote:
>>> On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
>>>  wrote:
> Yes, I think that's broadly the approach Tom was recommending.
> 
 I don't have problem with that approach, but the latest patch has
 following problems.
> 
 2. There are many cases where the new function would return no local
 path and hence postgres_fdw doesn't push down the join [1]. This will
 be performance regression compared to 9.6.
> 
>>> Some, or many?  How many?
> 
>> AFAIU, the problem esp. with parameterized paths is this: If rel1 is
>> required to be parameterized by rel2 (because of lateral references?),
>> a nested loop join with rel2 as outer relation and rel1 as inner
>> relation is possible. postgres_fdw and hence this function, however
>> doesn't consider all the possible join combinations and thus when this
>> function is presented with rel1 as outer and rel2 as inner would
>> refuse to create a path. More generally, while creating local paths,
>> we try many combinations of participating relations, some of which do
>> not produce local paths and some of them do (AFAIK, it happens in case
>> of lateral references, but there might be other cases). postgres_fdw,
>> however considers only a single combination, which may or may not have
>> produced local path. In such a case, postgres_fdw would create a
>> foreign join path but won't get a local path and thus bail out.
> 
> I had second thoughts; one idea how to build parameterized paths to
> avoid this issue is: as in postgresGetForeignPaths, to (1) identify
> which outer relations could supply additional safe-to-send-to-remote
> join clauses, and (2) build a parameterized path and its alternative
> local-join path for each such outer relation.  In #1, we could look at
> the join relation's ppilist to identify interesting outer relations.  In
> #2, the local-join path corresponding to each such outer relation could
> be built from the cheapest-total paths for the outer and inner
> relations, using CreateLocalJoinPath, so that the result path has the
> outer relation as its required_outer.
> 
>>> I'm a bit sketchy about this kind of thing, though:
>>>
>>> ! if (required_outer)
>>>   {
>>> ! bms_free(required_outer);
>>> ! return NULL;
>>>   }
>>>
>>> I don't understand what would stop that from making this fail to
>>> generate parameterized paths in some cases in which it would be legal
>>> to do so, and the comment is very brief.
> 
>> I am not so much worried about this though :).
>> GetExistingLocalJoinPath() also does not handle that case. The new
>> function is not making it worse in this case.
>> 731 /* Skip parameterised paths. */
>> 732 if (path->param_info != NULL)
>> 733 continue;
> 
> One idea to remove such extra checks is to pass the required_outer to
> CreateLocalJoinPath like the attached.  As described above, the caller
> would have that set before calling that function, so we wouldn't need to
> calculate that set in that function.
> 
> Other changes:
> 
> * Also modified CreateLocalJoinPath so that we pass outer/inner paths,
> not outer/inner rels, because it would be more flexible for the FDW to
> build the local-join path from paths it chose.
> * Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

This patch does not apply cleanly at cccbdde:

$ patch -p1 < ../other/epqpath-for-foreignjoin-5.patch
patching file contrib/postgres_fdw/expected/postgres_fdw.out
Hunk #6 succeeded at 4134 (offset 81 lines).
Hunk #7 succeeded at 4275 (offset 81 lines).
patching file contrib/postgres_fdw/postgres_fdw.c
Hunk #1 succeeded at 4356 (offset 3 lines).
Hunk #2 succeeded at 4386 (offset 3 lines).
patching file contrib/postgres_fdw/sql/postgres_fdw.sql
patching file doc/src/sgml/fdwhandler.sgml
patching file src/backend/foreign/foreign.c
Hunk #2 FAILED at 696.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/foreign/foreign.c.rej
patching file src/backend/optimizer/path/joinpath.c
Hunk #1 FAILED at 25.
Hunk #2 succeeded at 109 (offset 27 lines).
Hunk #3 succeeded at 134 (offset 27 lines).
Hunk #4 succeeded at 197 (offset 27 lines).
Hunk #5 succeeded at 208 (offset 27 lines).
Hunk #6 succeeded at 225 (offset 27 lines).
Hunk #7 succeeded at 745 (offset 113 lines).
Hunk #8 FAILED at 894.
Hunk #9 succeeded at 1558 (offset 267 lines).
Hunk #10 succeeded at 1609 (offset 268 lines).
2 out of 10 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/joinpath.c.rej
patching file src/include/foreign/fdwapi.h
Hunk #1 succeeded at 237 (offset 2 lines).
patching file src/include/nodes/relation.h
Hunk #1 succeeded at 914 (offset 10 lines).
Hunk #2 succeeded at 2057 (offset 47 lines).

Marked "Waiting on 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:56 PM, Etsuro Fujita
 wrote:
> Other changes:
>
> * Also modified CreateLocalJoinPath so that we pass outer/inner paths, not
> outer/inner rels, because it would be more flexible for the FDW to build the
> local-join path from paths it chose.
> * Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

The patch is moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-23 Thread Etsuro Fujita

On 2017/01/20 14:24, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas  wrote:

On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
 wrote:

Yes, I think that's broadly the approach Tom was recommending.



I don't have problem with that approach, but the latest patch has
following problems.



2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.



Some, or many?  How many?



AFAIU, the problem esp. with parameterized paths is this: If rel1 is
required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as outer relation and rel1 as inner
relation is possible. postgres_fdw and hence this function, however
doesn't consider all the possible join combinations and thus when this
function is presented with rel1 as outer and rel2 as inner would
refuse to create a path. More generally, while creating local paths,
we try many combinations of participating relations, some of which do
not produce local paths and some of them do (AFAIK, it happens in case
of lateral references, but there might be other cases). postgres_fdw,
however considers only a single combination, which may or may not have
produced local path. In such a case, postgres_fdw would create a
foreign join path but won't get a local path and thus bail out.


I had second thoughts; one idea how to build parameterized paths to 
avoid this issue is: as in postgresGetForeignPaths, to (1) identify 
which outer relations could supply additional safe-to-send-to-remote 
join clauses, and (2) build a parameterized path and its alternative 
local-join path for each such outer relation.  In #1, we could look at 
the join relation's ppilist to identify interesting outer relations.  In 
#2, the local-join path corresponding to each such outer relation could 
be built from the cheapest-total paths for the outer and inner 
relations, using CreateLocalJoinPath, so that the result path has the 
outer relation as its required_outer.



I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
  {
! bms_free(required_outer);
! return NULL;
  }

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.



I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;


One idea to remove such extra checks is to pass the required_outer to 
CreateLocalJoinPath like the attached.  As described above, the caller 
would have that set before calling that function, so we wouldn't need to 
calculate that set in that function.


Other changes:

* Also modified CreateLocalJoinPath so that we pass outer/inner paths, 
not outer/inner rels, because it would be more flexible for the FDW to 
build the local-join path from paths it chose.

* Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Ashutosh Bapat
On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas  wrote:
> On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
>  wrote:
>>> Yes, I think that's broadly the approach Tom was recommending.
>>
>> I don't have problem with that approach, but the latest patch has
>> following problems.
>> 1. We are copying chunks of path creation logic, instead of reusing
>> corresponding functions.
>
> Exactly which functions do you think we ought to be reusing that the
> patch currently doesn't reuse?
>
>> 2. There are many cases where the new function would return no local
>> path and hence postgres_fdw doesn't push down the join [1]. This will
>> be performance regression compared to 9.6.
>
> Some, or many?  How many?  Which ones?  At least some of the problems
> you were complaining about look like basic validity checks that were
> copied from joinpath.c, so they're probably necessary for correctness.
> It's worth remembering that we're trying to fix a bug here; if that
> makes some cases perform less well, that's sad, but not as sad as
> throwing a bogus error, which is what's happening right now.

AFAIU, the problem esp. with parameterized paths is this: If rel1 is
required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as outer relation and rel1 as inner
relation is possible. postgres_fdw and hence this function, however
doesn't consider all the possible join combinations and thus when this
function is presented with rel1 as outer and rel2 as inner would
refuse to create a path. More generally, while creating local paths,
we try many combinations of participating relations, some of which do
not produce local paths and some of them do (AFAIK, it happens in case
of lateral references, but there might be other cases). postgres_fdw,
however considers only a single combination, which may or may not have
produced local path. In such a case, postgres_fdw would create a
foreign join path but won't get a local path and thus bail out.
Obviously, we don't want CreateLocalJoinPath() to try out all the
combinations. That is the simplicity of copying an existing path; all
combinations have been tried already. If we really want a nested loop
join, we may try pulling inner and outer paths from an existing path
and build a nested loop join (that's just a suggestion).

Take example of
/*
 * If the cheapest-total outer path is parameterized by the
 * inner rel, we can't generate a nestloop path.  (There's no
 * use looking for alternative outer paths, since it should
 * already be the least-parameterized available path.)
 */
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;

We may be able to create a nest loop path if we switch inner and outer
relations.

I have provided a patch in [1] to fix the bogus error. But if we want
to replace one approach (when there's a way to fix bug in that
approach) with another (to fix that bug and improve), the other
approach should be equally efficient.

>
> I'm a bit sketchy about this kind of thing, though:
>
> ! if (required_outer)
>   {
> ! bms_free(required_outer);
> ! return NULL;
>   }
>
> I don't understand what would stop that from making this fail to
> generate parameterized paths in some cases in which it would be legal
> to do so, and the comment is very brief.

I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;

[1] 
https://www.postgresql.org/message-id/cafjfprd9ozekur9aormh2toxq4nyuusw2kb9s%2bo%3duvz4xcr%3...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Etsuro Fujita

On 2017/01/19 12:26, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 2:14 AM, Robert Haas  wrote:

On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
 wrote:

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in which
case, fdw_outerpath for a given foreign-join path would need to have the
same parameterization as the foreign-join path, but there might not be any
existing paths with the same parameterization in the path list.



I agree that this is a problem.



Effectively, it means that foreign join path creation will have a
parameterization different (per add_path()) from any local join
produced. But why would it be so?


I think it's better to give the FDW a chance to do that because the FDW 
might have more knowledge about the parameterization for joinrels than core.



The parameterization bubbles up from
the base relation. The process for creating parameterized local and
foreign paths for a base relation is same. Thus we will have same
parameterizations considered for foreign as well as local joins. Those
different parameterizations will be retained add_path(), so we should
find one there


Is that right?  I think there would be cases where we can't find one 
because add_path removes paths dominated by others.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Etsuro Fujita

On 2017/01/18 20:34, Ashutosh Bapat wrote:

On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujita
 wrote:

Done.  Attached is the new version.  I also adjusted the code a bit further.



With this patch there are multiple cases, where CreateLocalJoinPath()
would return NULL and hence postgres_fdw would not push a join down
for example
/*
 * (1) if either cheapest-total path is parameterized by the
 * other rel, we can't generate a hashjoin/mergejoin path, and
 * (2) proposed hashjoin/mergejoin path is still parameterized
 * (ie, the required_outer set calculated by
 * calc_non_nestloop_required_outer isn't NULL), it's not what
 * we want; which means that both the cheapest-total paths
 * should be unparameterized.
 */
if (outer_path->param_info || inner_path->param_info)
return NULL;
or
/*
 * If the cheapest-total outer path is parameterized by the
 * inner rel, we can't generate a nestloop path.  (There's no
 * use looking for alternative outer paths, since it should
 * already be the least-parameterized available path.)
 */
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;
/* If proposed path is still parameterized, don't return it. */
required_outer = calc_nestloop_required_outer(outer_path,
  inner_path);
if (required_outer)
{
bms_free(required_outer);
return NULL;
}

Am I right?

The earlier version of this function GetExistingLocalJoinPath() used
to return a local path in those cases and hence postgres_fdw was able
to push down corresponding queries. So, we are introducing a
performance regression.


Really?  My understanding is: postgres_fdw will never have those cases 
because it can always get the cheapest-total paths that are unparameterized.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Robert Haas
On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
 wrote:
>> Yes, I think that's broadly the approach Tom was recommending.
>
> I don't have problem with that approach, but the latest patch has
> following problems.
> 1. We are copying chunks of path creation logic, instead of reusing
> corresponding functions.

Exactly which functions do you think we ought to be reusing that the
patch currently doesn't reuse?

> 2. There are many cases where the new function would return no local
> path and hence postgres_fdw doesn't push down the join [1]. This will
> be performance regression compared to 9.6.

Some, or many?  How many?  Which ones?  At least some of the problems
you were complaining about look like basic validity checks that were
copied from joinpath.c, so they're probably necessary for correctness.
It's worth remembering that we're trying to fix a bug here; if that
makes some cases perform less well, that's sad, but not as sad as
throwing a bogus error, which is what's happening right now.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
  {
! bms_free(required_outer);
! return NULL;
  }

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-18 Thread Ashutosh Bapat
On Thu, Jan 19, 2017 at 2:14 AM, Robert Haas  wrote:
> On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
>  wrote:
>> My biggest concern about GetExistingLocalJoinPath is that might not be
>> extendable to the case of foreign-join paths with parameterization; in which
>> case, fdw_outerpath for a given foreign-join path would need to have the
>> same parameterization as the foreign-join path, but there might not be any
>> existing paths with the same parameterization in the path list.
>
> I agree that this is a problem.

Effectively, it means that foreign join path creation will have a
parameterization different (per add_path()) from any local join
produced. But why would it be so? The parameterization bubbles up from
the base relation. The process for creating parameterized local and
foreign paths for a base relation is same. Thus we will have same
parameterizations considered for foreign as well as local joins. Those
different parameterizations will be retained add_path(), so we should
find one there or should be able to expand an existing
parameterization by reparameterization.

Even if such a case exists, the same problem exists while searching
paths from the joining relations. If the join doesn't have a local
path with required parameterization, so would be the joining
relations, parameterized paths from which are used to build
parameterized paths for join.

>
>> You might
>> think we could get the fdw_outerpath by getting an existing path with no
>> parameterization as in GetExistingLocalJoinPath and then modifying the
>> path's param_info to match the parameterization of the foreign-join path.  I
>> don't know that really works, but that might be inefficient.
>
> I am not sure about this.
>
>> What I have in mind to support foreign-join paths with parameterization for
>> postgres_fdw like this: (1) generate parameterized paths from any joinable
>> combination of the outer/inner cheapest-parameterized paths that have pushed
>> down the outer/inner relation to the remote server, in a similar way as
>> postgresGetForeignJoinPaths creates unparameterized paths, and (2) create
>> fdw_outerpath for each parameterized path from the outer/inner paths used to
>> generate the parameterized path, by create_nestloop_path (or,
>> create_hashjoin_path or create_mergejoin_path if full join), so that the
>> resulting fdw_outerpath has the same parameterization as the paramterized
>> path.  This would probably work and might be more efficient.  And the patch
>> I proposed would be easily extended to this, by replacing the outer/inner
>> cheapest-total paths with the outer/inner cheapest-parameterized paths.
>> Attached is the latest version of the patch.
>
> Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.
1. We are copying chunks of path creation logic, instead of reusing
corresponding functions.
2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.

[1] https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg302463.html
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-18 Thread Robert Haas
On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
 wrote:
> My biggest concern about GetExistingLocalJoinPath is that might not be
> extendable to the case of foreign-join paths with parameterization; in which
> case, fdw_outerpath for a given foreign-join path would need to have the
> same parameterization as the foreign-join path, but there might not be any
> existing paths with the same parameterization in the path list.

I agree that this is a problem.

> You might
> think we could get the fdw_outerpath by getting an existing path with no
> parameterization as in GetExistingLocalJoinPath and then modifying the
> path's param_info to match the parameterization of the foreign-join path.  I
> don't know that really works, but that might be inefficient.

I am not sure about this.

> What I have in mind to support foreign-join paths with parameterization for
> postgres_fdw like this: (1) generate parameterized paths from any joinable
> combination of the outer/inner cheapest-parameterized paths that have pushed
> down the outer/inner relation to the remote server, in a similar way as
> postgresGetForeignJoinPaths creates unparameterized paths, and (2) create
> fdw_outerpath for each parameterized path from the outer/inner paths used to
> generate the parameterized path, by create_nestloop_path (or,
> create_hashjoin_path or create_mergejoin_path if full join), so that the
> resulting fdw_outerpath has the same parameterization as the paramterized
> path.  This would probably work and might be more efficient.  And the patch
> I proposed would be easily extended to this, by replacing the outer/inner
> cheapest-total paths with the outer/inner cheapest-parameterized paths.
> Attached is the latest version of the patch.

Yes, I think that's broadly the approach Tom was recommending.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-18 Thread Ashutosh Bapat
On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujita
 wrote:
> On 2017/01/16 11:38, Etsuro Fujita wrote:
>>
>> On 2017/01/14 6:39, Jeff Janes wrote:
>>>
>>> I do get a compiler warning:
>>>
>>> foreign.c: In function 'CreateLocalJoinPath':
>>> foreign.c:832: warning: implicit declaration of function
>>> 'pathkeys_contained_in'
>
>
>> Will fix.
>
>
> Done.  Attached is the new version.  I also adjusted the code a bit further.

With this patch there are multiple cases, where CreateLocalJoinPath()
would return NULL and hence postgres_fdw would not push a join down
for example
/*
 * (1) if either cheapest-total path is parameterized by the
 * other rel, we can't generate a hashjoin/mergejoin path, and
 * (2) proposed hashjoin/mergejoin path is still parameterized
 * (ie, the required_outer set calculated by
 * calc_non_nestloop_required_outer isn't NULL), it's not what
 * we want; which means that both the cheapest-total paths
 * should be unparameterized.
 */
if (outer_path->param_info || inner_path->param_info)
return NULL;
or
/*
 * If the cheapest-total outer path is parameterized by the
 * inner rel, we can't generate a nestloop path.  (There's no
 * use looking for alternative outer paths, since it should
 * already be the least-parameterized available path.)
 */
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;
/* If proposed path is still parameterized, don't return it. */
required_outer = calc_nestloop_required_outer(outer_path,
  inner_path);
if (required_outer)
{
bms_free(required_outer);
return NULL;
}

Am I right?

The earlier version of this function GetExistingLocalJoinPath() used
to return a local path in those cases and hence postgres_fdw was able
to push down corresponding queries. So, we are introducing a
performance regression. We need to plug those holes.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-17 Thread Etsuro Fujita

On 2017/01/16 11:38, Etsuro Fujita wrote:

On 2017/01/14 6:39, Jeff Janes wrote:

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'



Will fix.


Done.  Attached is the new version.  I also adjusted the code a bit further.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1563,1584  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-15 Thread Etsuro Fujita

On 2017/01/14 6:39, Jeff Janes wrote:

I've tested this patch against
9.6.stable-2d443ae1b0121e15265864d2b21 and 10.dev-0563a3a8b59150bf3cc8,
and in both cases it fixes the original problem.


Thanks for testing!


I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'


Will fix.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-13 Thread Jeff Janes
On Fri, Jan 13, 2017 at 3:22 AM, Etsuro Fujita 
wrote:

> On 2017/01/13 0:43, Robert Haas wrote:
>
>> On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
>>  wrote:
>>
>>> As I said before, that might be fine for 9.6, but I don't think it's a
>>> good
>>> idea to search the pathlist because once we support parameterized foreign
>>> join paths, which is on my TODOs, we would have to traverse through the
>>> possibly-lengthy pathlist to find a local-join path, as mentioned in [3].
>>>
>>
> I'm not sure that's really going to be a problem.  The number of
>> possible parameterizations that need to be considered isn't usually
>> very big.  I bet the path list will have ten or a few tens of entries
>> in it, not a hundred or a thousand.  Traversing it isn't that
>> expensive.
>>
>> That having been said, I haven't read the patches, so I'm not really
>> up to speed on the bigger issues here.  But surely it's more important
>> to get the overall design right than to worry about the cost of
>> walking the pathlist or worrying about the cost of an extra function
>> call (?!).
>>
>
> My biggest concern about GetExistingLocalJoinPath is that might not be
> extendable to the case of foreign-join paths with parameterization; in
> which case, fdw_outerpath for a given foreign-join path would need to have
> the same parameterization as the foreign-join path, but there might not be
> any existing paths with the same parameterization in the path list.  You
> might think we could get the fdw_outerpath by getting an existing path with
> no parameterization as in GetExistingLocalJoinPath and then modifying the
> path's param_info to match the parameterization of the foreign-join path.
> I don't know that really works, but that might be inefficient.
>
> What I have in mind to support foreign-join paths with parameterization
> for postgres_fdw like this: (1) generate parameterized paths from any
> joinable combination of the outer/inner cheapest-parameterized paths that
> have pushed down the outer/inner relation to the remote server, in a
> similar way as postgresGetForeignJoinPaths creates unparameterized paths,
> and (2) create fdw_outerpath for each parameterized path from the
> outer/inner paths used to generate the parameterized path, by
> create_nestloop_path (or, create_hashjoin_path or create_mergejoin_path if
> full join), so that the resulting fdw_outerpath has the same
> parameterization as the paramterized path.  This would probably work and
> might be more efficient.  And the patch I proposed would be easily extended
> to this, by replacing the outer/inner cheapest-total paths with the
> outer/inner cheapest-parameterized paths.  Attached is the latest version
> of the patch.
>


I'm afraid this discussion and the C code here are over my head.  But I've
tested this patch against 9.6.stable-2d443ae1b0121e15265864d2b21 and
10.dev-0563a3a8b59150bf3cc8, and in both cases it fixes the original
problem.

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'


Cheers,

Jeff


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-13 Thread Etsuro Fujita

On 2017/01/13 0:43, Robert Haas wrote:

On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
 wrote:

As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].



I'm not sure that's really going to be a problem.  The number of
possible parameterizations that need to be considered isn't usually
very big.  I bet the path list will have ten or a few tens of entries
in it, not a hundred or a thousand.  Traversing it isn't that
expensive.

That having been said, I haven't read the patches, so I'm not really
up to speed on the bigger issues here.  But surely it's more important
to get the overall design right than to worry about the cost of
walking the pathlist or worrying about the cost of an extra function
call (?!).


My biggest concern about GetExistingLocalJoinPath is that might not be 
extendable to the case of foreign-join paths with parameterization; in 
which case, fdw_outerpath for a given foreign-join path would need to 
have the same parameterization as the foreign-join path, but there might 
not be any existing paths with the same parameterization in the path 
list.  You might think we could get the fdw_outerpath by getting an 
existing path with no parameterization as in GetExistingLocalJoinPath 
and then modifying the path's param_info to match the parameterization 
of the foreign-join path.  I don't know that really works, but that 
might be inefficient.


What I have in mind to support foreign-join paths with parameterization 
for postgres_fdw like this: (1) generate parameterized paths from any 
joinable combination of the outer/inner cheapest-parameterized paths 
that have pushed down the outer/inner relation to the remote server, in 
a similar way as postgresGetForeignJoinPaths creates unparameterized 
paths, and (2) create fdw_outerpath for each parameterized path from the 
outer/inner paths used to generate the parameterized path, by 
create_nestloop_path (or, create_hashjoin_path or create_mergejoin_path 
if full join), so that the resulting fdw_outerpath has the same 
parameterization as the paramterized path.  This would probably work and 
might be more efficient.  And the patch I proposed would be easily 
extended to this, by replacing the outer/inner cheapest-total paths with 
the outer/inner cheapest-parameterized paths.  Attached is the latest 
version of the patch.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
! 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-12 Thread Robert Haas
On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
 wrote:
> As I said before, that might be fine for 9.6, but I don't think it's a good
> idea to search the pathlist because once we support parameterized foreign
> join paths, which is on my TODOs, we would have to traverse through the
> possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I'm not sure that's really going to be a problem.  The number of
possible parameterizations that need to be considered isn't usually
very big.  I bet the path list will have ten or a few tens of entries
in it, not a hundred or a thousand.  Traversing it isn't that
expensive.

That having been said, I haven't read the patches, so I'm not really
up to speed on the bigger issues here.  But surely it's more important
to get the overall design right than to worry about the cost of
walking the pathlist or worrying about the cost of an extra function
call (?!).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-11 Thread Etsuro Fujita

On 2017/01/11 19:26, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 3:30 PM, Etsuro Fujita
 wrote:

On 2017/01/11 17:51, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
 wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

Or it can
have foreign paths in there, which will need redirection. That's not
very good.



Maybe I'm missing something, but redirection isn't a problem.



Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.



ISTM that's solving a problem that hasn't been proven to be a problem.



A hop will consume a function call worth CPU at least.


Is that a problem in practice?


2. Fix existing code by applying patch from [1]



As I said before, that might be fine for 9.6, but I don't think it's a
good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].



I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.



Sorry, I don't understand this part.



A parameterized join is built if the joining paths are parameterized
as well. Thus building a parameterized local path  would require one
to search suitably parameterized paths in joining relations in their
pathlists.


Yeah, I'm thinking to (1) create parameterized foreign-join paths from 
the cheapest_parameterized_paths lists of the outer and inner rels, and 
(2) create a local-join paths for any parameterized foreign-join path 
from the component paths chosen from the lists in a similar way as 
CreateLocalJoinPath creates an unparameterized local-join path from the 
cheapest-total-cost paths of the outer and inner rels.  That's the real 
reason why I'm proposing to replace GetExistingLocalJoinPath with 
CreateLocalJoinPath.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-11 Thread Ashutosh Bapat
On Wed, Jan 11, 2017 at 3:30 PM, Etsuro Fujita
 wrote:
> On 2017/01/11 17:51, Ashutosh Bapat wrote:
>>
>> On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
>>  wrote:
>>>
>>> On 2017/01/11 13:40, Ashutosh Bapat wrote:

 CreateLocalJoinPath tries to construct a nestloop path for the given
 join relation because it wants to avoid merge/hash join paths which
 require expensive setup not required for EPQ. But it chooses cheap
 paths for joining relations which may not be nestloop paths. So,
 effectively it could happen that the whole alternate local plan would
 be filled with hash/merge joins except the uppermost join.
>
>
>>> In many cases the cheapest-total-cost outer and inner paths for a higher
>>> foreign-join relation would be foreign join paths, which would have
>>> nestloop
>>> paths as their fdw_outerpaths if not full joins.  So by redirection, the
>>> plan tree for EPQ would be mostly constructed by nestloop joins.  No?
>
>
>> It's not guaranteed that we will always have foreign join paths there.
>> We have seen this in Jeff's example, which started this thread. We
>> don't know in what all cases we have a tree entirely consisting of
>> (cheapest) foreign join paths.
>
>
> Right, but local-join plans need not be efficient since no base table will
> return more than one row, as stated in the documentation.  (I think
> efficient plans without complicating the code would be better, though.)
>
 Or it can
 have foreign paths in there, which will need redirection. That's not
 very good.
>
>
>>> Maybe I'm missing something, but redirection isn't a problem.
>
>
>> Peformance wise it is, correctness-wise it is not. Why do we want to
>> incur a hop, when we can avoid it.
>
>
> ISTM that's solving a problem that hasn't been proven to be a problem.

A hop will consume a function call worth CPU at least.

>
 2. Fix existing code by applying patch from [1]
>
>
>>> As I said before, that might be fine for 9.6, but I don't think it's a
>>> good
>>> idea to search the pathlist because once we support parameterized foreign
>>> join paths, which is on my TODOs, we would have to traverse through the
>>> possibly-lengthy pathlist to find a local-join path, as mentioned in [3].
>
>
>> I don't agree that pathlists will be long enough to make this a
>> non-attractive solution. For parameterized foreign join paths, with
>> the approach that this patch takes, we will require to search in two
>> such pathlists, inner and outer.
>
>
> Sorry, I don't understand this part.

A parameterized join is built if the joining paths are parameterized
as well. Thus building a parameterized local path  would require one
to search suitably parameterized paths in joining relations in their
pathlists.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-11 Thread Etsuro Fujita

On 2017/01/11 17:51, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
 wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.



In many cases the cheapest-total-cost outer and inner paths for a higher
foreign-join relation would be foreign join paths, which would have nestloop
paths as their fdw_outerpaths if not full joins.  So by redirection, the
plan tree for EPQ would be mostly constructed by nestloop joins.  No?



It's not guaranteed that we will always have foreign join paths there.
We have seen this in Jeff's example, which started this thread. We
don't know in what all cases we have a tree entirely consisting of
(cheapest) foreign join paths.


Right, but local-join plans need not be efficient since no base table 
will return more than one row, as stated in the documentation.  (I think 
efficient plans without complicating the code would be better, though.)



Or it can
have foreign paths in there, which will need redirection. That's not
very good.



Maybe I'm missing something, but redirection isn't a problem.



Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.


ISTM that's solving a problem that hasn't been proven to be a problem.


2. Fix existing code by applying patch from [1]



As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].



I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.


Sorry, I don't understand this part.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-11 Thread Ashutosh Bapat
On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
 wrote:
> On 2017/01/11 13:40, Ashutosh Bapat wrote:
>>
>> CreateLocalJoinPath tries to construct a nestloop path for the given
>> join relation because it wants to avoid merge/hash join paths which
>> require expensive setup not required for EPQ. But it chooses cheap
>> paths for joining relations which may not be nestloop paths. So,
>> effectively it could happen that the whole alternate local plan would
>> be filled with hash/merge joins except the uppermost join.
>
>
> In many cases the cheapest-total-cost outer and inner paths for a higher
> foreign-join relation would be foreign join paths, which would have nestloop
> paths as their fdw_outerpaths if not full joins.  So by redirection, the
> plan tree for EPQ would be mostly constructed by nestloop joins.  No?

It's not guaranteed that we will always have foreign join paths there.
We have seen this in Jeff's example, which started this thread. We
don't know in what all cases we have a tree entirely consisting of
(cheapest) foreign join paths.

>
>> Or it can
>> have foreign paths in there, which will need redirection. That's not
>> very good.
>
>
> Maybe I'm missing something, but redirection isn't a problem.

Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.

>
>> We have to choose one of the following
>> alternatives
>>
>> 1. develop a mechanism to remember epq path for joining relations so
>> that it's available to CreateLocalJoinPath
>> 2.let the caller pass epq local paths for joining relations to
>> CreateLocalJoinPath. How it remembers those, is FDW's problem.
>
>
> I thought such a thing, but I think that would be overcomplicated, as
> pointed out by Tom [2].
>
>> 2. Fix existing code by applying patch from [1]
>
>
> As I said before, that might be fine for 9.6, but I don't think it's a good
> idea to search the pathlist because once we support parameterized foreign
> join paths, which is on my TODOs, we would have to traverse through the
> possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-10 Thread Etsuro Fujita

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.


In many cases the cheapest-total-cost outer and inner paths for a higher 
foreign-join relation would be foreign join paths, which would have 
nestloop paths as their fdw_outerpaths if not full joins.  So by 
redirection, the plan tree for EPQ would be mostly constructed by 
nestloop joins.  No?



Or it can
have foreign paths in there, which will need redirection. That's not
very good.


Maybe I'm missing something, but redirection isn't a problem.


We have to choose one of the following
alternatives

1. develop a mechanism to remember epq path for joining relations so
that it's available to CreateLocalJoinPath
2.let the caller pass epq local paths for joining relations to
CreateLocalJoinPath. How it remembers those, is FDW's problem.


I thought such a thing, but I think that would be overcomplicated, as 
pointed out by Tom [2].



2. Fix existing code by applying patch from [1]


As I said before, that might be fine for 9.6, but I don't think it's a 
good idea to search the pathlist because once we support parameterized 
foreign join paths, which is on my TODOs, we would have to traverse 
through the possibly-lengthy pathlist to find a local-join path, as 
mentioned in [3].


Best regards,
Etsuro Fujita

[2] https://www.postgresql.org/message-id/12565.1481904788%40sss.pgh.pa.us
[3] 
https://www.postgresql.org/message-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee%40lab.ntt.co.jp





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-10 Thread Ashutosh Bapat
On Fri, Jan 6, 2017 at 6:31 PM, Etsuro Fujita
 wrote:
> On 2017/01/05 12:10, Etsuro Fujita wrote:
>>
>> On 2016/12/28 17:34, Ashutosh Bapat wrote:
>>>
>>> Hmm. If I understand the patch correctly, it does not return any path
>>> when merge join is allowed and there are merge clauses but no hash
>>> clauses. In this case we will not create a foreign join path, loosing
>>> some optimization. If we remove GetExistingLocalJoinPath, which
>>> returns a path in those cases as well, we have a regression in
>>> performance.
>
>
>> Ok, will revise, but as I mentioned upthread, I'm not sure it's a good
>> idea to search the pathlist to get a merge join even in this case.  I'd
>> vote for creating a merge join path from the inner/outer paths in this
>> case as well.
>
>
> Done.  Attached is the new version of the patch.
>
> * I'm still not sure the search approach is the right way to go, so I
> modified CreateLocalJoinPath so that it creates a mergejoin path that
> explicitly sorts both the outer and inner relations as in
> sort_inner_and_outer, by using the information saved in that function. I
> think we could try to create a sort-free mergejoin as in
> match_unsorted_outer, but I'm not sure it's worth complicating the code.
> * I modified CreateLocalJoinPath so that it handles the cheapest-total paths
> for the outer/inner relations that are parameterized if possible.
> * I adjusted the code and revised some comments.
>
> As I said upthread, we could skip costing for a local join path in
> CreateLocalJoinPath, for efficiency, but I'm not sure we should do that.

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join. Or it can
have foreign paths in there, which will need redirection. That's not
very good. Neither it's good to start constructing a nestloop join
tree top to bottom every time. We have to choose one of the following
alternatives

1. develop a mechanism to remember epq path for joining relations so
that it's available to CreateLocalJoinPath
2.let the caller pass epq local paths for joining relations to
CreateLocalJoinPath. How it remembers those, is FDW's problem.
2. Fix existing code by applying patch from [1]

[1] 
https://www.postgresql.org/message-id/cafjfprd9ozekur9aormh2toxq4nyuusw2kb9s%2bo%3duvz4xcr%3...@mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-10 Thread Ashutosh Bapat
>
>> If the inner and/or outer paths are not ordered as required, we will need
>> to
>> order them. Code below doesn't seem to handle that case.
>> /*
>>  * If the paths are already well enough ordered, we
>> can
>>  * skip doing an explicit sort.
>>  */
>> if (outerkeys &&
>> pathkeys_contained_in(outerkeys,
>> outer_path->pathkeys))
>> outerkeys = NIL;
>> if (innerkeys &&
>> pathkeys_contained_in(innerkeys,
>> inner_path->pathkeys))
>> innerkeys = NIL;
>
>
> I might be missing something, but if the outer/inner paths are already well
> sorted, we wouldn't need an explicit sort, so we can set the
> outerkeys/innerkeys to NIL safely.  (You can find the same optimization in
> try_mergejoin_path.)

Actually I didn't notice that create_mergejoin_path saves those keys
in the MergePath and then adds sorting steps during planning. Sorry
for the trouble.

Another concern here is that we are copying pieces of logic in
add_paths_to_joinrel() without arranging it as neatly as in that
function.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-09 Thread Etsuro Fujita

On 2017/01/09 22:36, Ashutosh Bapat wrote:

I wrote:

Done.  Attached is the new version of the patch.



Why is this so?
 * If the outer cheapest-total path is parameterized by the inner
 * rel, we can't generate a nestloop path.


That parameterization means that for each row scanned from the inner 
rel, the nestloop has to pass down the param values of the row into the 
scan of the outer rel; but it's not possible for the nestloop to work 
like that.  You can find the same tests in match_unsorted_outer.



 * If either cheapest-total path is parameterized by the other
 * rel, we can't generate a hashjoin/mergejoin path.


The above applies to hashjoins and mergejoins.  In addition, those joins 
cannot pass down the param values of each outer-rel row into the 
inner-rel scan, like nestloop joins, so the inner path mustn't be 
parameterized by the outer rel in those joins.  See the same tests in 
sort_inner_and_outer and hash_inner_and_outer.


While re-reading the patch, I noticed this could be simplified:

!   case JOIN_FULL:
!   /*
!* If either cheapest-total path is parameterized by 
the other
!* rel, we can't generate a hashjoin/mergejoin path.  
(There's no
!* use looking for alternative input paths, since these 
should
!* already be the least-parameterized available paths.)
!*/
!   if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
!   PATH_PARAM_BY_REL(inner_path, outerrel))
!   return NULL;
!   /* If proposed path is still parameterized, we can't 
use it. */
!   required_outer = 
calc_non_nestloop_required_outer(outer_path,
!   
  inner_path);

This would mean that both the outer and inner paths must be 
unparameterized.  Will simplify if it's ok.



I don't think we need to provide details of what kind of path the function
builds.
+ join plan.  CreateLocalJoinPath builds a nested loop join
+ path for the specified join relation, except when the join type is
+ FULL, in which case a merge or hash join path is built.


Ok, will remove.


I am not able to understand the code or the comment below
+
+/* Save first mergejoin information for possible use by the FDW */
+if (outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->merge_pathkeys = merge_pathkeys;
+extra->merge_outerkeys = outerkeys;
+extra->merge_innerkeys = innerkeys;
+}


In the case when mergejoin is allowed and we have merge clauses but no 
hash clauses, CreateLocalJoinPath creates a mergejoin from the 
cheapest-total paths for the outer and inner rels, by explicitly sorting 
both the outer and inner rels.  The above is for creating the mergejoin 
in CreateLocalJoinPath; since sort_inner_and_outer also creates 
mergejoins from the cheapest-total paths, the above code saves data 
about one of mergejoins created there so that CreateLocalJoinPath uses 
that data to create the mergejoin.


On second thought, I noticed that this could be simplified a bit as 
well; since the output sort order (merge_pathkeys) of the mergejoin 
implementing a FULL join is NIL (see build_join_pathkeys), we wouldn't 
need to save that info in merge_pathkeys.  Will simplify if it's ok.



If the inner and/or outer paths are not ordered as required, we will need to
order them. Code below doesn't seem to handle that case.
/*
 * If the paths are already well enough ordered, we can
 * skip doing an explicit sort.
 */
if (outerkeys &&
pathkeys_contained_in(outerkeys, outer_path->pathkeys))
outerkeys = NIL;
if (innerkeys &&
pathkeys_contained_in(innerkeys, inner_path->pathkeys))
innerkeys = NIL;


I might be missing something, but if the outer/inner paths are already 
well sorted, we wouldn't need an explicit sort, so we can set the 
outerkeys/innerkeys to NIL safely.  (You can find the same optimization 
in try_mergejoin_path.)


Thanks for the review!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-09 Thread Ashutosh Bapat
>
> Done.  Attached is the new version of the patch.
>
> * I'm still not sure the search approach is the right way to go, so I
> modified CreateLocalJoinPath so that it creates a mergejoin path that
> explicitly sorts both the outer and inner relations as in
> sort_inner_and_outer, by using the information saved in that function. I
> think we could try to create a sort-free mergejoin as in
> match_unsorted_outer, but I'm not sure it's worth complicating the code.

Why is this so?
 * If the outer cheapest-total path is parameterized by the inner
 * rel, we can't generate a nestloop path.
and
 * If either cheapest-total path is parameterized by the other
 * rel, we can't generate a hashjoin/mergejoin path.  (There's no

If the inner and/or outer paths are not ordered as required, we will need to
order them. Code below doesn't seem to handle that case.
/*
 * If the paths are already well enough ordered, we can
 * skip doing an explicit sort.
 */
if (outerkeys &&
pathkeys_contained_in(outerkeys, outer_path->pathkeys))
outerkeys = NIL;
if (innerkeys &&
pathkeys_contained_in(innerkeys, inner_path->pathkeys))
innerkeys = NIL;

> * I modified CreateLocalJoinPath so that it handles the cheapest-total paths
> for the outer/inner relations that are parameterized if possible.

I don't think we need to provide details of what kind of path the function
builds.
+ join plan.  CreateLocalJoinPath builds a nested loop join
+ path for the specified join relation, except when the join type is
+ FULL, in which case a merge or hash join path is built.


I am not able to understand the code or the comment below
+
+/* Save first mergejoin information for possible use by the FDW */
+if (outerkeys == all_pathkeys)
+{
+extra->mergeclauses = cur_mergeclauses;
+extra->merge_pathkeys = merge_pathkeys;
+extra->merge_outerkeys = outerkeys;
+extra->merge_innerkeys = innerkeys;
+}

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-06 Thread Etsuro Fujita

On 2017/01/05 12:10, Etsuro Fujita wrote:

On 2016/12/28 17:34, Ashutosh Bapat wrote:

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.



Ok, will revise, but as I mentioned upthread, I'm not sure it's a good
idea to search the pathlist to get a merge join even in this case.  I'd
vote for creating a merge join path from the inner/outer paths in this
case as well.


Done.  Attached is the new version of the patch.

* I'm still not sure the search approach is the right way to go, so I 
modified CreateLocalJoinPath so that it creates a mergejoin path that 
explicitly sorts both the outer and inner relations as in 
sort_inner_and_outer, by using the information saved in that function. 
I think we could try to create a sort-free mergejoin as in 
match_unsorted_outer, but I'm not sure it's worth complicating the code.
* I modified CreateLocalJoinPath so that it handles the cheapest-total 
paths for the outer/inner relations that are parameterized if possible.

* I adjusted the code and revised some comments.

As I said upthread, we could skip costing for a local join path in 
CreateLocalJoinPath, for efficiency, but I'm not sure we should do that.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1563,1584  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Etsuro Fujita

On 2017/01/05 13:19, Ashutosh Bapat wrote:

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.



Ok, will revise, but as I mentioned upthread, I'm not sure it's a good idea
to search the pathlist to get a merge join even in this case.  I'd vote for
creating a merge join path from the inner/outer paths in this case as well.
I think that would simplify the code as well.



Creating a new path 1. requires memory


The search approach would require memory for saving the path, too.


2. spends CPU cycles in costing
and creating it


The search approach would also need extra cycles in the cases mentioned 
in [1], wouldn't it?  Since it would be useless to cost the 
fdw_outerpath of a foreign join, we could skip that for the 
fdw_outerpath if necessary.



3. requires a search in inner and outer relations'
pathlists (see my earlier reply).


What I'm thinking is basically to use the cheapest-total-cost paths of 
the inner/outer relations, which wouldn't require any search.


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee%40lab.ntt.co.jp





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Ashutosh Bapat
>
>> Hmm. If I understand the patch correctly, it does not return any path
>> when merge join is allowed and there are merge clauses but no hash
>> clauses. In this case we will not create a foreign join path, loosing
>> some optimization. If we remove GetExistingLocalJoinPath, which
>> returns a path in those cases as well, we have a regression in
>> performance.
>
>
> Ok, will revise, but as I mentioned upthread, I'm not sure it's a good idea
> to search the pathlist to get a merge join even in this case.  I'd vote for
> creating a merge join path from the inner/outer paths in this case as well.
> I think that would simplify the code as well.

Creating a new path 1. requires memory 2. spends CPU cycles in costing
and creating it 3. requires a search in inner and outer relations'
pathlists (see my earlier reply). Searching for an existing path just
requires a search in one relation's pathlist. The path should be
there. Why do we want to spend extra resources in creating a new path
when an old one exists and searching it is more efficient.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Ashutosh Bapat
On Thu, Jan 5, 2017 at 8:20 AM, Etsuro Fujita
 wrote:
> On 2016/12/27 16:41, Etsuro Fujita wrote:
>>
>> On 2016/12/22 1:04, Ashutosh Bapat wrote:
>>>
>>> 2. We should try to look for other not-so-cheap paths if the cheapest
>>> one is
>>> paramterized. You might want to use get_cheapest_path_for_pathkeys()
>>> to find a
>>> suitable unparameterized path by passing NULL for required_outer and
>>> NIL for
>>> pathkeys, that's a very strange usage, but I think it will serve the
>>> purpose.
>
>
>>> +/* Give up if the cheapest-total-cost paths are parameterized. */
>>> +if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
>>> +!bms_is_empty(PATH_REQ_OUTER(inner_path)))
>>> +return NULL;
>
>
>> I did that because I think that would work well for postgres_fdw, but I
>> agree with you.  Will revise.
>
>
> While working on this, I noticed that in that case
> get_cheapest_path_for_pathkeys() would return NULL because if the
> cheapest-total-cost path is parameterized, then there are no unparameterized
> paths in the rel's pathlist (see set_cheapest).

I guess, that happens when there are lateral references, and every
path for that relation in parameterized. Please correct me if I am
wrong. If that's true, we should be searching for minimum
parameterization for that relation instead of no parameterization.
IIUC the bit about outer relations in the following comment in
add_path()

 *We also remove from the rel's pathlist any old paths that are dominated
 *by new_path --- that is, new_path is cheaper, at least as well ordered,
 *generates no more rows, requires no outer rels not required by the old
 *path, and is no less parallel-safe.

we do not discard an unparameterized path in favour of a cheaper
parameterized path. So, if the cheapest path is unparameterized one,
it's parameterized by minimum required outer relations.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Etsuro Fujita

On 2016/12/28 17:34, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 1:29 PM, Etsuro Fujita
 wrote:

On 2016/12/28 15:54, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
 wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.



Right, maybe we can do that by doing similar things as in
match_unsort_outer
and/or sort_inner_and_outer.  But as you mentioned, the case is rare, so
the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).



All I am requesting is changing the order of conditions. That doesn't
complicate the code.



I might have misunderstood your words, but you are saying we should consider
mergejoin paths with some mergeclauses in the case where hashclauses is NIL,
right?  To do so, we would need to consider the sort orders of outer/inner
paths, which would make the code complicated.



Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.


Ok, will revise, but as I mentioned upthread, I'm not sure it's a good 
idea to search the pathlist to get a merge join even in this case.  I'd 
vote for creating a merge join path from the inner/outer paths in this 
case as well.  I think that would simplify the code as well.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Etsuro Fujita

On 2016/12/27 22:03, Ashutosh Bapat wrote:

You wrote:

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available
in
the list of paths for the given relation. Instead of building the same
path
anew, should we use the existing path like GetExistingLocalJoinPath() for
that
case?


I wrote:

Hm, that might be an idea, but my concern about that is: the existing path
wouldn't always be guaranteed to be unprameterized.



Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.


OK, but another concern is: in cases when we consider parameterized 
paths, it might be inefficient to search the pathlist because the 
unparameterized path might be at the rear of the lengthy pathlist.  Note 
that patameterized paths would produce fewer rows and have reduced 
transfer and hence total cost, so they would be in the more front of the 
pathlist, and the unparameterized path would be in the more rear.  (Note 
that the pathlist is kept sorted by total cost.)


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-04 Thread Etsuro Fujita

On 2016/12/27 16:41, Etsuro Fujita wrote:

On 2016/12/22 1:04, Ashutosh Bapat wrote:

2. We should try to look for other not-so-cheap paths if the cheapest
one is
paramterized. You might want to use get_cheapest_path_for_pathkeys()
to find a
suitable unparameterized path by passing NULL for required_outer and
NIL for
pathkeys, that's a very strange usage, but I think it will serve the
purpose.



+/* Give up if the cheapest-total-cost paths are parameterized. */
+if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+!bms_is_empty(PATH_REQ_OUTER(inner_path)))
+return NULL;



I did that because I think that would work well for postgres_fdw, but I
agree with you.  Will revise.


While working on this, I noticed that in that case 
get_cheapest_path_for_pathkeys() would return NULL because if the 
cheapest-total-cost path is parameterized, then there are no 
unparameterized paths in the rel's pathlist (see set_cheapest).


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-28 Thread Ashutosh Bapat
On Wed, Dec 28, 2016 at 1:29 PM, Etsuro Fujita
 wrote:
> On 2016/12/28 15:54, Ashutosh Bapat wrote:
>>
>> On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
>>  wrote:
>>>
>>> On 2016/12/27 22:03, Ashutosh Bapat wrote:

 If mergejoin_allowed is true and mergeclauselist is non-NIL but
 hashclauselist is NIL (that's rare but there can be types has merge
 operators but not hash operators), we will end up returning NULL. I
 think we want to create a merge join in that case. I think the order
 of conditions should be 1. check hashclause_list then create hash join
 2. else check if merge allowed, create merge join. It looks like that
 would cover all the cases, if there aren't any hash clauses, and also
 no merge clauses, we won't be able to implement a FULL join, so it
 will get rejected during path creation itself.
>
>
>>> Right, maybe we can do that by doing similar things as in
>>> match_unsort_outer
>>> and/or sort_inner_and_outer.  But as you mentioned, the case is rare, so
>>> the
>>> problem would be whether it's worth complicating the code (and if it's
>>> worth, whether we should do that at the first version of the function).
>
>
>> All I am requesting is changing the order of conditions. That doesn't
>> complicate the code.
>
>
> I might have misunderstood your words, but you are saying we should consider
> mergejoin paths with some mergeclauses in the case where hashclauses is NIL,
> right?  To do so, we would need to consider the sort orders of outer/inner
> paths, which would make the code complicated.

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

>
 The reason we chose to pick up an existing path was that the
 discussion in thread [1] concluded the efficiency of the local plan
 wasn't a concern for EPQ. Are we now saying something otherwise?
>
>
>>> No, I won't.  Usually, the overhead would be negligible, but in some
>>> cases
>>> where there are many concurrent updates, the overhead might not be
>>> negligible due to many EPQ rechecks.  So it would be better to have an
>>> efficient local plan.
>
>
>> All that the EPQ rechecks do is apply the join and other quals again
>> on the base relation rows. Will choice of plan affect the efficiency?
>
>
> Merge or hash joins would need extra steps to start that work (for example,
> building a hash table from the inner relation for a hash join.)

Hmm, I agree.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-28 Thread Etsuro Fujita

On 2016/12/28 15:54, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
 wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.



Right, maybe we can do that by doing similar things as in match_unsort_outer
and/or sort_inner_and_outer.  But as you mentioned, the case is rare, so the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).



All I am requesting is changing the order of conditions. That doesn't
complicate the code.


I might have misunderstood your words, but you are saying we should 
consider mergejoin paths with some mergeclauses in the case where 
hashclauses is NIL, right?  To do so, we would need to consider the sort 
orders of outer/inner paths, which would make the code complicated.



The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?



No, I won't.  Usually, the overhead would be negligible, but in some cases
where there are many concurrent updates, the overhead might not be
negligible due to many EPQ rechecks.  So it would be better to have an
efficient local plan.



All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will choice of plan affect the efficiency?


Merge or hash joins would need extra steps to start that work (for 
example, building a hash table from the inner relation for a hash join.)


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Ashutosh Bapat
On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
 wrote:
> On 2016/12/27 22:03, Ashutosh Bapat wrote:
>>
>> If mergejoin_allowed is true and mergeclauselist is non-NIL but
>> hashclauselist is NIL (that's rare but there can be types has merge
>> operators but not hash operators), we will end up returning NULL. I
>> think we want to create a merge join in that case. I think the order
>> of conditions should be 1. check hashclause_list then create hash join
>> 2. else check if merge allowed, create merge join. It looks like that
>> would cover all the cases, if there aren't any hash clauses, and also
>> no merge clauses, we won't be able to implement a FULL join, so it
>> will get rejected during path creation itself.
>
>
> Right, maybe we can do that by doing similar things as in match_unsort_outer
> and/or sort_inner_and_outer.  But as you mentioned, the case is rare, so the
> problem would be whether it's worth complicating the code (and if it's
> worth, whether we should do that at the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.

>
 I
 am
 wondering whether the easy and possibly correct solution here is to not
 replace
 a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
 don't do
 that, there won't be error building merge join plan and
 postgresRecheckForeignScan() would correctly route the EPQ checks to the
 local
 plan available as outer plan.
>
>
>>> That might be fine for PG9.6, but I'd like to propose replacing
>>> GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
>>> GetExistingLocalJoinPath might choose an overkill, merge or hash join
>>> path
>>> for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
>>> overhead at EPQ rechecks, and
>
>
>> The reason we chose to pick up an existing path was that the
>> discussion in thread [1] concluded the efficiency of the local plan
>> wasn't a concern for EPQ. Are we now saying something otherwise?
>
>
> No, I won't.  Usually, the overhead would be negligible, but in some cases
> where there are many concurrent updates, the overhead might not be
> negligible due to many EPQ rechecks.  So it would be better to have an
> efficient local plan.
>

All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will choice of plan affect the efficiency?

>>> (2) choosing a local join path randomly from
>>> the rel's pathlist wouldn't be easy to understand.
>
>
>> Easy to understand for whom? Can you please elaborate?
>
>
> Users.  I think the ease of understanding for users is important.

I doubt users care much about whether an existing path is returned or
a new one created as long as they get one to stuff in fdw_outerpath.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Etsuro Fujita

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.


Right, maybe we can do that by doing similar things as in 
match_unsort_outer and/or sort_inner_and_outer.  But as you mentioned, 
the case is rare, so the problem would be whether it's worth 
complicating the code (and if it's worth, whether we should do that at 
the first version of the function).



I
am
wondering whether the easy and possibly correct solution here is to not
replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the
local
plan available as outer plan.



That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join path
for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
overhead at EPQ rechecks, and



The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?


No, I won't.  Usually, the overhead would be negligible, but in some 
cases where there are many concurrent updates, the overhead might not be 
negligible due to many EPQ rechecks.  So it would be better to have an 
efficient local plan.



(2) choosing a local join path randomly from
the rel's pathlist wouldn't be easy to understand.



Easy to understand for whom? Can you please elaborate?


Users.  I think the ease of understanding for users is important.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-27 Thread Ashutosh Bapat
>
>> 3. Talking about saving some CPU cycles - if a clauseless full join can be
>> implemented only using merge join, probably that's the only path available
>> in
>> the list of paths for the given relation. Instead of building the same
>> path
>> anew, should we use the existing path like GetExistingLocalJoinPath() for
>> that
>> case?
>
>
> Hm, that might be an idea, but my concern about that is: the existing path
> wouldn't always be guaranteed to be unprameterized.

Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.

>
>> In fact, I am wondering whether we should look for an existing nestloop
>> path for all joins except full, in which case we should look for merge or
>> hash
>> join path. We go on building a new path, if only there isn't an existing
>> one.
>> That will certainly save some CPU cycles spent in costing the path.
>
>
> Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
> removed from the rel's pathlist by dominated merge or hash join paths, so
> searching the pathlist might cause a useless overhead.
>

Fare point.

>
>> 5. Per comment below a clauseless full join can be implemented using only
>> merge
>> join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true
>> here?
>> /*
>>  * Special corner case: for "x FULL JOIN y ON true", there will be
>> no
>>  * join clauses at all.  Note that mergejoin is our only join type
>>  * that supports FULL JOIN without any join clauses, and in that
>> case
>>  * it doesn't require the input paths to be well ordered, so
>> generate
>>  * a clauseless mergejoin path from the cheapest-total-cost paths.
>>  */
>> if (extra->mergejoin_allowed && !extra->mergeclause_list)
>
>
> See the comments for select_mergejoin_clauses:
>
>  * select_mergejoin_clauses
>  *Select mergejoin clauses that are usable for a particular join.
>  *Returns a list of RestrictInfo nodes for those clauses.
>  *
>  * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
>  * this is a right/full join and there are nonmergejoinable join clauses.
>  * The executor's mergejoin machinery cannot handle such cases, so we have
>  * to avoid generating a mergejoin plan.  (Note that this flag does NOT
>  * consider whether there are actually any mergejoinable clauses.  This is
>  * correct because in some cases we need to build a clauseless mergejoin.
>  * Simply returning NIL is therefore not enough to distinguish safe from
>  * unsafe cases.)

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

>
>> Rethinking about the problem, the error comes because the inner or outer
>> plan
>> of a merge join plan doesn't have pathkeys required by the merge join.
>> This
>> happens because the merge join path had foreign path with pathkeys as
>> inner or
>> outer path and corresponding fdw_outerpath didn't have those pathkeys. I
>> am
>> wondering whether the easy and possibly correct solution here is to not
>> replace
>> a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
>> don't do
>> that, there won't be error building merge join plan and
>> postgresRecheckForeignScan() would correctly route the EPQ checks to the
>> local
>> plan available as outer plan. Attached patch with that code removed.
>
>
> That might be fine for PG9.6, but I'd like to propose replacing
> GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
> GetExistingLocalJoinPath might choose an overkill, merge or hash join path
> for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
> overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

> (2) choosing a local join path randomly from
> the rel's pathlist wouldn't be easy to understand.
>

Easy to understand for whom? Can you please elaborate?

[1] 
https://www.postgresql.org/message-id/flat/565E638E.8020703%40lab.ntt.co.jp#565e638e.8020...@lab.ntt.co.jp
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-26 Thread Etsuro Fujita

On 2016/12/22 1:04, Ashutosh Bapat wrote:

Some review comments


Thanks for the review!


2. We should try to look for other not-so-cheap paths if the cheapest one is
paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
suitable unparameterized path by passing NULL for required_outer and NIL for
pathkeys, that's a very strange usage, but I think it will serve the purpose.
On the thread we discussed that we should save the epq_path create for lower
join and use it here. That may be better than searching for a path.
+/* Give up if the cheapest-total-cost paths are parameterized. */
+if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+!bms_is_empty(PATH_REQ_OUTER(inner_path)))
+return NULL;


I did that because I think that would work well for postgres_fdw, but I 
agree with you.  Will revise.



3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available in
the list of paths for the given relation. Instead of building the same path
anew, should we use the existing path like GetExistingLocalJoinPath() for that
case?


Hm, that might be an idea, but my concern about that is: the existing 
path wouldn't always be guaranteed to be unprameterized.



In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or hash
join path. We go on building a new path, if only there isn't an existing one.
That will certainly save some CPU cycles spent in costing the path.


Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be 
removed from the rel's pathlist by dominated merge or hash join paths, 
so searching the pathlist might cause a useless overhead.



4. Following comment mentions only hash join, but the code creates a merge join
or hash join path.
 * If the jointype is JOIN_FULL, try to create a hashjoin join path from


Will revise.


5. Per comment below a clauseless full join can be implemented using only merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
/*
 * Special corner case: for "x FULL JOIN y ON true", there will be no
 * join clauses at all.  Note that mergejoin is our only join type
 * that supports FULL JOIN without any join clauses, and in that case
 * it doesn't require the input paths to be well ordered, so generate
 * a clauseless mergejoin path from the cheapest-total-cost paths.
 */
if (extra->mergejoin_allowed && !extra->mergeclause_list)


See the comments for select_mergejoin_clauses:

 * select_mergejoin_clauses
 *Select mergejoin clauses that are usable for a particular join.
 *Returns a list of RestrictInfo nodes for those clauses.
 *
 * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
 * this is a right/full join and there are nonmergejoinable join clauses.
 * The executor's mergejoin machinery cannot handle such cases, so we have
 * to avoid generating a mergejoin plan.  (Note that this flag does NOT
 * consider whether there are actually any mergejoinable clauses.  This is
 * correct because in some cases we need to build a clauseless mergejoin.
 * Simply returning NIL is therefore not enough to distinguish safe from
 * unsafe cases.)


Rethinking about the problem, the error comes because the inner or outer plan
of a merge join plan doesn't have pathkeys required by the merge join. This
happens because the merge join path had foreign path with pathkeys as inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
wondering whether the easy and possibly correct solution here is to not replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the local
plan available as outer plan. Attached patch with that code removed.


That might be fine for PG9.6, but I'd like to propose replacing 
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1) 
GetExistingLocalJoinPath might choose an overkill, merge or hash join 
path for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might 
cause an overhead at EPQ rechecks, and (2) choosing a local join path 
randomly from the rel's pathlist wouldn't be easy to understand.


I'll add this to the next CF.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-26 Thread Etsuro Fujita

On 2016/12/23 1:04, Robert Haas wrote:

On Wed, Dec 21, 2016 at 11:04 AM, Ashutosh Bapat
 wrote:

Some review comments

1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+   jointype == JOIN_SEMI ||
+   jointype == JOIN_ANTI);



But in the future, it might.


I plan to work on adding those cases to postgres_fdw.


We shouldn't randomly leave foot-guns
lying around if there's an easy alternative.


Some FDWs might have already supported pushing down semi/anti joins.  So 
I think it's better to handle those joins as well.



3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?



I think that's exactly backwards.  The few bytes of storage don't
matter, but extra CPU cycles might.


I agree with Robert.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-26 Thread Etsuro Fujita

On 2016/12/21 21:44, Etsuro Fujita wrote:

On 2016/12/20 0:37, Tom Lane wrote:

Etsuro Fujita  writes:

On 2016/12/17 1:13, Tom Lane wrote:

So I think the rule could be



"When first asked to produce a path for a given foreign joinrel,
collect
the cheapest paths for its left and right inputs, and make a
nestloop path
(or hashjoin path, if full join) from those, using the join quals
needed
for the current input relation pair.



Seems reasonable.



Use this as the fdw_outerpath for
all foreign paths made for the joinrel."



I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan.  If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).



How so?  We only recheck one row at a time, therefore it can be
claimed to
have any sort order you care about.



I'll have second thoughts about that.


I noticed I was wrong and you are right.  Sorry for the noise.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-22 Thread Robert Haas
On Wed, Dec 21, 2016 at 11:04 AM, Ashutosh Bapat
 wrote:
> Some review comments
>
> 1. postgres_fdw doesn't push down semi and anti joins so you may want to
> discount these two too.
> +   jointype == JOIN_SEMI ||
> +   jointype == JOIN_ANTI);

But in the future, it might.  We shouldn't randomly leave foot-guns
lying around if there's an easy alternative.

> 3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
> and other FDWs which would use CreateLocalJoinPath(), but it will add few 
> bytes
> to the structure even when there is no "FULL foreign join which requires EPQ"
> involved in the query. That may not be so much of memory overhead since the
> structure is used locally to add_paths_to_joinrel(), but it might be something
> to think about. Instead, what if we call select_mergejoin_clauses() within
> CreateLocalJoinPath() to get that information?

I think that's exactly backwards.  The few bytes of storage don't
matter, but extra CPU cycles might.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-21 Thread Ashutosh Bapat
Some review comments

1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+   jointype == JOIN_SEMI ||
+   jointype == JOIN_ANTI);

2. We should try to look for other not-so-cheap paths if the cheapest one is
paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
suitable unparameterized path by passing NULL for required_outer and NIL for
pathkeys, that's a very strange usage, but I think it will serve the purpose.
On the thread we discussed that we should save the epq_path create for lower
join and use it here. That may be better than searching for a path.
+/* Give up if the cheapest-total-cost paths are parameterized. */
+if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+!bms_is_empty(PATH_REQ_OUTER(inner_path)))
+return NULL;

3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available in
the list of paths for the given relation. Instead of building the same path
anew, should we use the existing path like GetExistingLocalJoinPath() for that
case? In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or hash
join path. We go on building a new path, if only there isn't an existing one.
That will certainly save some CPU cycles spent in costing the path.

4. Following comment mentions only hash join, but the code creates a merge join
or hash join path.
 * If the jointype is JOIN_FULL, try to create a hashjoin join path from

5. Per comment below a clauseless full join can be implemented using only merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
/*
 * Special corner case: for "x FULL JOIN y ON true", there will be no
 * join clauses at all.  Note that mergejoin is our only join type
 * that supports FULL JOIN without any join clauses, and in that case
 * it doesn't require the input paths to be well ordered, so generate
 * a clauseless mergejoin path from the cheapest-total-cost paths.
 */
if (extra->mergejoin_allowed && !extra->mergeclause_list)

Rethinking about the problem, the error comes because the inner or outer plan
of a merge join plan doesn't have pathkeys required by the merge join. This
happens because the merge join path had foreign path with pathkeys as inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
wondering whether the easy and possibly correct solution here is to not replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the local
plan available as outer plan. Attached patch with that code removed.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 242d6d2..872d2a1 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -775,30 +775,6 @@ GetExistingLocalJoinPath(RelOptInfo *joinrel)
 		if (!joinpath)
 			continue;
 
-		/*
-		 * If either inner or outer path is a ForeignPath corresponding to a
-		 * pushed down join, replace it with the fdw_outerpath, so that we
-		 * maintain path for EPQ checks built entirely of local join
-		 * strategies.
-		 */
-		if (IsA(joinpath->outerjoinpath, ForeignPath))
-		{
-			ForeignPath *foreign_path;
-
-			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
-joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-		}
-
-		if (IsA(joinpath->innerjoinpath, ForeignPath))
-		{
-			ForeignPath *foreign_path;
-
-			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
-joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-		}
-
 		return (Path *) joinpath;
 	}
 	return NULL;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-21 Thread Etsuro Fujita

On 2016/12/20 0:37, Tom Lane wrote:

Etsuro Fujita  writes:

On 2016/12/17 1:13, Tom Lane wrote:

So I think the rule could be



"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.



Seems reasonable.



Use this as the fdw_outerpath for
all foreign paths made for the joinrel."



I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan.  If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).



How so?  We only recheck one row at a time, therefore it can be claimed to
have any sort order you care about.


I'll have second thoughts about that.  I agree with you except for that, 
so I've created a patch; I removed GetExistingLocalJoinPath and added a 
helper function, CreateLocalJoinPath, that generates a local join path 
for a given foreign join, as described above.  Please find attached a patch.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1519,1540  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1563,1584  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL 

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-19 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/12/17 1:13, Tom Lane wrote:
>> So I think the rule could be

>> "When first asked to produce a path for a given foreign joinrel, collect
>> the cheapest paths for its left and right inputs, and make a nestloop path
>> (or hashjoin path, if full join) from those, using the join quals needed
>> for the current input relation pair.

> Seems reasonable.

>> Use this as the fdw_outerpath for
>> all foreign paths made for the joinrel."

> I'm not sure that would work well for foreign joins with sort orders. 
> Consider a merge join, whose left input is a 2-way foreign join with a 
> sort order that implements a full join and whose right input is a sorted 
> local table scan.  If the EPQ subplan for the foreign join wouldn't 
> produce the right sort order, the merge join might break during EPQ 
> rechecks (note that in this case the EPQ subplan for the foreign join 
> might produce more than a single row during an EPQ recheck).

How so?  We only recheck one row at a time, therefore it can be claimed to
have any sort order you care about.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-19 Thread Etsuro Fujita

On 2016/12/19 13:59, Ashutosh Bapat wrote:

On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane  wrote:

Etsuro Fujita  writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is.  What do you think about
that?



Let me explain about #1 and #2 a bit more.  What I have in mind is:



* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
 (a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
 (b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
 (c) Likewise for higher level foreign joins.
 (d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.



Hm, isn't this overcomplicated?  As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway.  All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places.  So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.  Use this as the fdw_outerpath for
all foreign paths made for the joinrel."



We could use fdw_outerpath of the left and right inputs instead of
looking for the cheapest paths. For base relations as left or right
relations, use the unparameterized cheapest foreign path. This will
ensure that all joins in fdw_outerpath are local joins, making EPQ
checks slightly efficient by avoiding redirection at every foreign
path.


That seems close to what I had in mind described above.  One additional 
work required for that would to store the fdw_outerpath into the 
RelOptInfo's fdw_private such as PgFdwRelationInfo before add_path for 
the foreign-join path containing the fdw_outerpath, because add_path 
might reject the foreign-join path.  I think the additional work would 
make that rather complicated.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-18 Thread Ashutosh Bapat
On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane  wrote:
> Etsuro Fujita  writes:
>> On 2016/12/16 11:25, Etsuro Fujita wrote:
>>> As I said upthread, an alternative I am thinking is (1) to create an
>>> equivalent nestloop join path using inner/outer paths of a foreign join
>>> path, except when that join path implements a full join, in which case a
>>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>>> subplan created from the fdw_outerpath as-is.  What do you think about
>>> that?
>
>> Let me explain about #1 and #2 a bit more.  What I have in mind is:
>
>> * modify postgresGetForeignPaths so that a simple foreign table scan
>> path is stored into the base relation's PgFdwRelationInfo.
>> * modify postgresGetForeignJoinPaths so that
>>  (a) a local join path for a 2-way foreign join is created using
>> simple foreign table scan paths stored in the base relations'
>> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>>  (b) a local join path for a 3-way foreign join, whose left input is
>> a 2-way foreign join, is created using a local join path stored in the
>> left input join relation's PgFdwRelationInfo and a simple foreign table
>> scan path stored into the right input base relation's PgFdwRelationInfo.
>>  (c) Likewise for higher level foreign joins.
>>  (d) local join paths created are passed to create_foreignscan_path
>> and stored into the fdw_outerpaths of the resulting ForeignPahts.
>
> Hm, isn't this overcomplicated?  As you said earlier, we don't really
> care all that much whether the fdw_outerpath is free of lower foreign
> joins, because EPQ setup will select those lower join's substitute EPQ
> plans anyway.  All that we need is that the EPQ tree be a legal
> implementation of the join order with join quals applied at the right
> places.  So I think the rule could be
>
> "When first asked to produce a path for a given foreign joinrel, collect
> the cheapest paths for its left and right inputs, and make a nestloop path
> (or hashjoin path, if full join) from those, using the join quals needed
> for the current input relation pair.  Use this as the fdw_outerpath for
> all foreign paths made for the joinrel."

We could use fdw_outerpath of the left and right inputs instead of
looking for the cheapest paths. For base relations as left or right
relations, use the unparameterized cheapest foreign path. This will
ensure that all joins in fdw_outerpath are local joins, making EPQ
checks slightly efficient by avoiding redirection at every foreign
path.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-18 Thread Etsuro Fujita

On 2016/12/17 1:13, Tom Lane wrote:

Etsuro Fujita  writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is.  What do you think about
that?



Let me explain about #1 and #2 a bit more.  What I have in mind is:



* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
 (a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
 (b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
 (c) Likewise for higher level foreign joins.
 (d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.



Hm, isn't this overcomplicated?  As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway.  All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places.


Exactly.  I thought the EPQ trees without lower foreign joins would be 
better because that would be easier to understand.



So I think the rule could be



"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.


Seems reasonable.


Use this as the fdw_outerpath for
all foreign paths made for the joinrel."


I'm not sure that would work well for foreign joins with sort orders. 
Consider a merge join, whose left input is a 2-way foreign join with a 
sort order that implements a full join and whose right input is a sorted 
local table scan.  If the EPQ subplan for the foreign join wouldn't 
produce the right sort order, the merge join might break during EPQ 
rechecks (note that in this case the EPQ subplan for the foreign join 
might produce more than a single row during an EPQ recheck).  So, I 
think we would need to add an explicit sort to the fdw_outerpath for the 
foreign join.



The important point here is that we avoid using a merge join because that
has assumptions about input ordering that likely won't be satisfied by
the child paths chosen through this method.  (I guess you could fall back
to it for the case of no quals in a fulljoin, because then the ordering
assumptions are vacuous anyway.)


I agree on that point.  I'll create a patch.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-16 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/12/16 11:25, Etsuro Fujita wrote:
>> As I said upthread, an alternative I am thinking is (1) to create an
>> equivalent nestloop join path using inner/outer paths of a foreign join
>> path, except when that join path implements a full join, in which case a
>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>> subplan created from the fdw_outerpath as-is.  What do you think about
>> that?

> Let me explain about #1 and #2 a bit more.  What I have in mind is:

> * modify postgresGetForeignPaths so that a simple foreign table scan 
> path is stored into the base relation's PgFdwRelationInfo.
> * modify postgresGetForeignJoinPaths so that
>  (a) a local join path for a 2-way foreign join is created using 
> simple foreign table scan paths stored in the base relations' 
> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>  (b) a local join path for a 3-way foreign join, whose left input is 
> a 2-way foreign join, is created using a local join path stored in the 
> left input join relation's PgFdwRelationInfo and a simple foreign table 
> scan path stored into the right input base relation's PgFdwRelationInfo.
>  (c) Likewise for higher level foreign joins.
>  (d) local join paths created are passed to create_foreignscan_path 
> and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated?  As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway.  All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places.  So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.  Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

The important point here is that we avoid using a merge join because that
has assumptions about input ordering that likely won't be satisfied by
the child paths chosen through this method.  (I guess you could fall back
to it for the case of no quals in a fulljoin, because then the ordering
assumptions are vacuous anyway.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-16 Thread Etsuro Fujita

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is.  What do you think about
that?


Let me explain about #1 and #2 a bit more.  What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan 
path is stored into the base relation's PgFdwRelationInfo.

* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using 
simple foreign table scan paths stored in the base relations' 
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is 
a 2-way foreign join, is created using a local join path stored in the 
left input join relation's PgFdwRelationInfo and a simple foreign table 
scan path stored into the right input base relation's PgFdwRelationInfo.

(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path 
and stored into the fdw_outerpaths of the resulting ForeignPahts.


I think that that would need special handling for foreign joins with 
sort orders; add an explicit sort to the local join paths, if needed.


I am thinking to provide a helper function that creates a local join 
path for (a), (b), and (c).


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Etsuro Fujita

On 2016/12/16 1:39, Tom Lane wrote:

Etsuro Fujita  writes:

On 2016/12/13 23:13, Ashutosh Bapat wrote:

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath.



Seems reasonable.



No, because GetExistingLocalJoinPath is called once per relation not once
per path.  Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.


Hmm, I agree on that point that giving it up might result in a bad plan.

As I said upthread, an alternative I am thinking is (1) to create an 
equivalent nestloop join path using inner/outer paths of a foreign join 
path, except when that join path implements a full join, in which case a 
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and 
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer 
subplan created from the fdw_outerpath as-is.  What do you think about that?


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/12/13 23:13, Ashutosh Bapat wrote:
>> On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane  wrote:
>>> One way that we could make things better is to rely on the knowledge
>>> that EPQ isn't asked to evaluate joins for more than one row per input
>>> relation, and therefore using merge or hash join technology is really
>>> overkill.  We could make a tree of simple nestloop joins, which aren't
>>> going to care about sort order, if we could identify the correct join
>>> clauses to apply.

>> I am not able to understand how this strategy of applying all join
>> clauses on top of cross product would work if there are OUTER joins
>> involved. Wouldn't nullable sides change the result?

> I'm not able to understand that, either.

Yeah, you'd have to reproduce the outer join structure if any.

>> A possible short-term fix may be this: instead of picking any random
>> path to stick into fdw_outerpath, we choose a path which covers the
>> pathkeys of ForeignPath.

> Seems reasonable.

No, because GetExistingLocalJoinPath is called once per relation not once
per path.  Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Etsuro Fujita

On 2016/12/13 23:13, Ashutosh Bapat wrote:

On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane  wrote:

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?



Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths.


I think Ashutosh is right.


Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.


+1


Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

 * Just skip anything else. We don't know if corresponding
 * plan would build the output row from whole-row references
 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.


I wrote the first version of GetExistingLocalJoinPath (and 
postgresRecheckForeignScan), to verify that the changes to core for 
pushing down foreign/custom joins in 9.5 would work well for EPQ 
rechecks.  And I rejected custom paths in that version because I 
intended to use that function not only for foreign joins but custom 
joins.  (IIRC, for 9.6, I proposed to put GetExistingLocalJoinPath in a 
more common area such as src/backend/optimizer/path/joinpath.c, but that 
was ignored...)



I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.



I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?


I'm not able to understand that, either.


(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)


An alternative I was thinking was (1) to create an equivalent nestloop 
join path for each foreign/custom join path and store it in 
fdw_outerpath as-is, except when that join path implements a full join, 
in which case a merge or hash join path is created, and (2) to apply 
postgresRecheckForeignScan to the outer subplan created from the 
fdw_outerpath when executing EPQ rechecks.  So, I was thinking to 
provide a helper function that creates the equivalent local join path 
for a given foreign/custom join path in #1.



A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from 

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-13 Thread Robert Haas
On Mon, Dec 12, 2016 at 5:45 PM, Tom Lane  wrote:
> One way that we could make things better is to rely on the knowledge
> that EPQ isn't asked to evaluate joins for more than one row per input
> relation, and therefore using merge or hash join technology is really
> overkill.  We could make a tree of simple nestloop joins, which aren't
> going to care about sort order, if we could identify the correct join
> clauses to apply.  At least some of that could be laid off on the FDW,
> which if it's gotten this far at all, ought to know what join clauses
> need to be enforced by the foreign join.  So I'm thinking a little bit
> in terms of "just collect the foreign scans for all the base rels
> included in the join and then build a cross-product nestloop join tree,
> applying all the join clauses at the top".  This would have the signal
> value that it's guaranteed to succeed and so can be left for later,
> rather than having to expensively redo it at each level of join planning.
>
> (Hm, that does sound a lot like "throw it away and start again", doesn't
> it.  But what we've got here is busted enough that I'm not sure there
> are good alternatives.  Maybe for 9.6 we could just rewrite
> GetExistingLocalJoinPath, and limp along doing a lot of redundant
> computation during planning.)

I think there was an earlier version of the patch that actually built
up NestLoop joins as it went, but at some point (possibly at my
suggestion) it got rewritten to try to fish out existing paths
instead.  Apparently, the other idea was better.

> BTW, what's "existing" about the result of GetExistingLocalJoinPath?

It's an existing path - i.e. already added to the RelOptInfo - to
perform the join locally.

> And for that matter, what's "outer" about the content of fdw_outerpath?

It's got the same meaning here that "outer path" does in general.
Right now, a Foreign Scan only has an outer subpath if it needs one
for EPQ rechecks, but in theory I suppose it could be used for
something else; every Plan has a lefttree, whether or not we choose to
populate it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-13 Thread Ashutosh Bapat
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I have a test case where I made the fdw connect back to itself, and
>> stripped out all the objects that I could and still reproduce the case.  It
>> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
>> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc
>
> Thanks for the test case.  I believe I understand the fundamental problem,
> or one of the fundamental problems --- I'm not exactly convinced there
> aren't several here.
>
> The key issue is that GetExistingLocalJoinPath believes it can return
> any random join path as what to use for the fdw_outerpath of a foreign
> join.  You can get away with that, perhaps, as long as you only consider
> two-way joins.  But in a nest of foreign joins, this fails to consider
> the interrelationships of join paths and their children.  In particular,
> what we have happening in this example is that GetExistingLocalJoinPath
> seizes randomly on a MergePath for an upper join relation, clones it,
> sees that the outer child is a join ForeignPath, and replaces that outer
> child with its fdw_outerpath ... which was chosen equally cavalierly by
> some previous execution of GetExistingLocalJoinPath, and does not have
> the sort ordering expected by the MergePath.  So we'd generate an invalid
> EPQ plan, except that the debug cross-check in create_mergejoin_plan
> notices the discrepancy.

Thanks for the detailed explanation.

>
> I believe there are probably more problems here, or at least if there
> aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
> lack of curiosity about what's underneath the join pathnode it picks,
> it seems to me that it's possible for it to return a path tree that
> *isn't* all local joins.  If we're looking at, say, a hash path for
> a 4-way join, whose left input is a hash path for a 3-way join, whose
> left input is a 2-way foreign join, what's stopping that from being
> returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths. Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

>
> Likewise, it seems like the code is trying to reject any custom-path join
> types, or at least this barely-intelligible comment seems to imply that:
>
>  * Just skip anything else. We don't know if corresponding
>  * plan would build the output row from whole-row references
>  * of base relations and execute the EPQ checks.
>
> But this coding fails to notice any such join type that's below the
> level of the immediate two join inputs.

Similarly, here we assume that any custom plan would tell us whether
the given row survives EPQ recheck. As long as a custom plan doing
that correctly, it shouldn't be a problem.

>
> We've probably managed to not notice this so far because foreign joins
> generally ought to dominate any local join method, so that there wouldn't
> often be cases where the surviving paths use local joins for input
> sub-joins.  But Jeff's test case proves it can happen.
>
> I kind of wonder why this infrastructure exists at all; it's not the way
> I'd have foreseen handling EPQ for remote joins.  However, while "throw
> it away and start again" might be good advice going forward, I suppose
> it won't be very popular for applying to 9.6.
>
> One way that we could make things better is to rely on the knowledge
> that EPQ isn't asked to evaluate joins for more than one row per input
> relation, and therefore using merge or hash join technology is really
> overkill.  We could make a tree of simple nestloop joins, which aren't
> going to care about sort order, if we could identify the correct join
> clauses to apply.  At least some of that could be laid off on the FDW,
> which if it's gotten this far at all, ought to know what join clauses
> need to be enforced by the foreign join.  So I'm thinking a little bit
> in terms of "just collect the foreign scans for all the base rels
> included in the join and then build a cross-product nestloop join tree,
> applying all the join clauses at the top".  This would have the signal
> value that it's guaranteed to succeed and so can be left for later,
> rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change 

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-12 Thread Tom Lane
Jeff Janes  writes:
> I have a test case where I made the fdw connect back to itself, and
> stripped out all the objects that I could and still reproduce the case.  It
> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

Thanks for the test case.  I believe I understand the fundamental problem,
or one of the fundamental problems --- I'm not exactly convinced there
aren't several here.

The key issue is that GetExistingLocalJoinPath believes it can return
any random join path as what to use for the fdw_outerpath of a foreign
join.  You can get away with that, perhaps, as long as you only consider
two-way joins.  But in a nest of foreign joins, this fails to consider
the interrelationships of join paths and their children.  In particular,
what we have happening in this example is that GetExistingLocalJoinPath
seizes randomly on a MergePath for an upper join relation, clones it,
sees that the outer child is a join ForeignPath, and replaces that outer
child with its fdw_outerpath ... which was chosen equally cavalierly by
some previous execution of GetExistingLocalJoinPath, and does not have
the sort ordering expected by the MergePath.  So we'd generate an invalid
EPQ plan, except that the debug cross-check in create_mergejoin_plan
notices the discrepancy.

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

 * Just skip anything else. We don't know if corresponding
 * plan would build the output row from whole-row references
 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

We've probably managed to not notice this so far because foreign joins
generally ought to dominate any local join method, so that there wouldn't
often be cases where the surviving paths use local joins for input
sub-joins.  But Jeff's test case proves it can happen.

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

BTW, what's "existing" about the result of GetExistingLocalJoinPath?
And for that matter, what's "outer" about the content of fdw_outerpath?
Good luck figuring that out from the documentation of the field, all
zero words of it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-10 Thread Jeff Janes
On Thu, Dec 8, 2016 at 10:28 AM, Tom Lane  wrote:

> Robert Haas  writes:
> > Maybe it would help for Jeff to use elog_node_display() to the nodes
> > that are causing the problem - e.g. outerpathkeys and innerpathkeys
> > and best_path->path_mergeclauses, or just best_path - at the point
> > where the error is thrown. That might give us enough information to
> > see what's broken.
>
> I'll be astonished if that's sufficient evidence.  We already know that
> the problem is that the input path doesn't claim to be sorted in a way
> that would match the merge clauses, but that doesn't tell us how such
> a path came to be generated (or, if it wasn't intentionally done, where
> the data structure got clobbered later).
>
> It's possible that setting a breakpoint at create_mergejoin_path and
> capturing stack traces for all calls would yield usable insight.  But
> there are likely to be lots of calls if this is an 8-way join query,
> and probably only a few are wrong.
>
> I'd much rather have a test case than try to debug this remotely.
> Bandwidth too low.
>

I didn't get an asserts on the assert-enabled build.

I have a test case where I made the fdw connect back to itself, and
stripped out all the objects that I could and still reproduce the case.  It
is large, 21MB compressed, 163MB uncompressed, so I am linking it here:

https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

When running this against and default configured 9.6.1 or 10devel-a73491e,
I get the error.

psql -U postgres -f <(xzcat combined_anon.sql.xz)

...

ERROR:  outer pathkeys do not match mergeclauses
STATEMENT:  explain update local1 set local19 = jj_join.local19 from
jj_join where local1.local12=jj_join.local12  and local1.local12 in
('ddd','bbb','aaa','abc');

Cheers,

Jeff


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Tom Lane
Robert Haas  writes:
> Maybe it would help for Jeff to use elog_node_display() to the nodes
> that are causing the problem - e.g. outerpathkeys and innerpathkeys
> and best_path->path_mergeclauses, or just best_path - at the point
> where the error is thrown. That might give us enough information to
> see what's broken.

I'll be astonished if that's sufficient evidence.  We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight.  But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 12:50 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I have a DML statement which triggers the error:
>> ERROR:  XX000: outer pathkeys do not match mergeclauses
>> LOCATION:  create_mergejoin_plan, createplan.c:3722
>
> Hmm.
>
>> Any tips on investigating this further in situ?  Or is the best option just
>> to work harder on a minimal and disclosable test case?
>
> I think we need a test case --- not minimal necessarily, but something
> other people can reproduce.  You might find that setting enable_hashjoin
> and/or enable_nestloop to false makes it easier to provoke the error,
> since evidently this requires that we (a) generate a faulty mergejoin Path
> and then (b) choose it as the cheapest one, since the error occurs while
> converting it to a Plan.

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Tom Lane
Jeff Janes  writes:
> I have a DML statement which triggers the error:
> ERROR:  XX000: outer pathkeys do not match mergeclauses
> LOCATION:  create_mergejoin_plan, createplan.c:3722

Hmm.

> Any tips on investigating this further in situ?  Or is the best option just
> to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce.  You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

BTW, if you're not doing this in a debug (--enable-cassert) build, it'd
be useful to try it in one.  I'm a little suspicious that the root cause
might be a memory-stomp type of problem, ie somebody scribbling on a
pathkey data structure without accounting for it being shared with another
path.  It's possible that cassert memory checking would help catch that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers