Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-30 Thread Etsuro Fujita

(2018/01/31 4:56), Robert Haas wrote:

On Sat, Jan 20, 2018 at 12:20 AM, Tom Lane  wrote:

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.


OK, committed.  I added a comment along the lines you suggested
previously, since this no longer seems like a generic test that
happens to involve a bunch of merge joins.


Thank you!

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-30 Thread Robert Haas
On Sat, Jan 20, 2018 at 12:20 AM, Tom Lane  wrote:
> It looks like Etsuro-san's proposed patch locks down the choice of
> plan more tightly, which is probably a reasonable answer.

OK, committed.  I added a comment along the lines you suggested
previously, since this no longer seems like a generic test that
happens to involve a bunch of merge joins.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> Well, I'm a little stumped.  When I do it the way you did it, it fails
> with the same error you got:

> contrib_regression=# EXPLAIN (VERBOSE, COSTS OFF)
> SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> AND ft1.c1 = ft5.c1 FOR UPDATE;
> ERROR:  outer pathkeys do not match mergeclauses

> But when I add the same command to postgres_fdw.sql and run it as part
> of the regression tests, it works.  I tried adding it at the end with
> \c beforehand so that there wouldn't be any temporary settings in
> effect.

I duplicated those results, and then added an ANALYZE, and then it fails:

***************
*** 7548,7550 ****
--- 7595,7604 ----
  (4 rows)
  
  RESET enable_partition_wise_join;
+ \c -
+ analyze "S 1"."T 1";
+ -- multi-way join involving multiple merge joins
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+ AND ft1.c1 = ft5.c1 FOR UPDATE;
+ ERROR:  outer pathkeys do not match mergeclauses

So apparently the reason our manual tests replicated the failure is
that an auto-analyze happened in between.  Now that I realize that,
I find that the manual query doesn't fail if executed *immediately*
after the regression test run.  Wait a minute; on a repeat attempt, it does fail.

OTOH, if I put the same ANALYZE before the test case, at the spot where
it sits in the script, there's no change.

So the contributing factors here are (1) there are a bunch of data
changes to "S 1"."T 1" further down in the script than where you
added the test case, and (2) you need an auto-analyze to have seen
those changes before the problematic plan gets picked.

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 10:40 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
>>  wrote:
>>> I noticed that this test case added by the patch is not appropriate:
>>> it doesn't inject extra Sort nodes into EPQ recheck plans, so it
>>> succeeds even without the fix.  I modified this to inject a Sort into the
>>> recheck plan of the very first foreign join.  Attached is a patch for that.
>
>> Mumble.  Tom provided me with that example and said it failed without
>> the patch.  I didn't check, I just believed him.  But I'm surprised if
>> he was wrong; Tom usually tries to avoid being wrong...
>
> Hm.  It did fail as advertised when I connected to the contrib_regression
> database (after installcheck) and entered the query by hand; I
> copied-and-pasted the result of that to show you.  It's possible that it
> would not have failed in the particular spot where you chose to insert it
> in the regression script, if for example there were nondefault planner GUC
> settings active at that spot.  Did you check that the script produced the
> expected failure against unpatched code?

No.  I guess I'll have to go debug this.  Argh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
>  wrote:
>> I noticed that this test case added by the patch is not appropriate:
>> it doesn't inject extra Sort nodes into EPQ recheck plans, so it
>> succeeds even without the fix.  I modified this to inject a Sort into the
>> recheck plan of the very first foreign join.  Attached is a patch for that.

> Mumble.  Tom provided me with that example and said it failed without
> the patch.  I didn't check, I just believed him.  But I'm surprised if
> he was wrong; Tom usually tries to avoid being wrong...

Hm.  It did fail as advertised when I connected to the contrib_regression
database (after installcheck) and entered the query by hand; I
copied-and-pasted the result of that to show you.  It's possible that it
would not have failed in the particular spot where you chose to insert it
in the regression script, if for example there were nondefault planner GUC
settings active at that spot.  Did you check that the script produced the
expected failure against unpatched code?

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
 wrote:
> I noticed that this test case added by the patch is not appropriate:
>
> +-- multi-way join involving multiple merge joins
> +EXPLAIN (VERBOSE, COSTS OFF)
> +SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> +AND ft1.c1 = ft5.c1 FOR UPDATE;
> +SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> +AND ft1.c1 = ft5.c1 FOR UPDATE;
>
> It doesn't inject extra Sort nodes into EPQ recheck plans, so it
> succeeds even without the fix.  I modified this to inject a Sort into the
> recheck plan of the very first foreign join.  Attached is a patch for that.

Mumble.  Tom provided me with that example and said it failed without
the patch.  I didn't check, I just believed him.  But I'm surprised if
he was wrong; Tom usually tries to avoid being wrong...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-18 Thread Etsuro Fujita

(2018/01/18 15:40), Etsuro Fujita wrote:

(2018/01/18 7:09), Robert Haas wrote:

On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane wrote:

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply. Just push it ...


Done.


I noticed that this test case added by the patch is not appropriate:

+-- multi-way join involving multiple merge joins
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+AND ft1.c1 = ft5.c1 FOR UPDATE;
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+AND ft1.c1 = ft5.c1 FOR UPDATE;

It doesn't inject extra Sort nodes into EPQ recheck plans, so it 
succeeds even without the fix.  I modified this to inject a Sort into the 
recheck plan of the very first foreign join.  Attached is a patch for that.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 2330,2367  SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5
  (4 rows)
  
  -- multi-way join involving multiple merge joins
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
! AND ft1.c1 = ft5.c1 FOR UPDATE;
!  QUERY PLAN 
! 
   LockRows
 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
 ->  Foreign Scan
   Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
   Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
!  Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2, r4.c3, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r3 ON (((r1."C 1" = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1."C 1" = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
   ->  Merge Join
 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.*
!Merge Cond: (ft1.c1 = ft5.c1)
 ->  Merge Join
   Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
!  Merge Cond: (ft1.c1 = ft4.c1)
!  ->  Merge Join
 Output: ft1.c1, ft1.c2, 

Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Etsuro Fujita

(2018/01/18 7:09), Robert Haas wrote:

On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane  wrote:

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply.  Just push it ...


Done.


Thanks for taking the time to work on this issue, Robert and Tom!

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Robert Haas
On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane  wrote:
> It's debatable perhaps -- I tend to err in the other direction.
> But likewise, I don't care deeply.  Just push it ...

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 17, 2018 at 3:37 PM, Tom Lane  wrote:
>> Looks OK to me.  Would it be worth annotating the added regression test
>> case with a comment that this once caused EPQ-related planning problems?

> I tend to think somebody who is curious about the origin of any
> particular test can just use 'git blame' and/or 'git log -Gwhatever'
> to figure out which commits added it, and that therefore it's not
> worth including that in the comment explicitly.  But I don't care
> deeply.

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply.  Just push it ...

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Robert Haas
On Wed, Jan 17, 2018 at 3:37 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Thanks for the review.  Updated patch attached.
>
> Looks OK to me.  Would it be worth annotating the added regression test
> case with a comment that this once caused EPQ-related planning problems?

I tend to think somebody who is curious about the origin of any
particular test can just use 'git blame' and/or 'git log -Gwhatever'
to figure out which commits added it, and that therefore it's not
worth including that in the comment explicitly.  But I don't care
deeply.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Robert Haas
On Mon, Jan 15, 2018 at 4:38 PM, Tom Lane  wrote:
> There *is* a problem with GetExistingLocalJoinPath not honoring its
> API spec: it will sometimes return join nests that include remote
> joins at levels below the top, as I'd speculated to begin with.
> This turns out not to be a problem for postgres_fdw, because any
> such remote-join node will just recursively fob off EPQ checks
> onto its own outerpath, so that eventually you get down to base
> scan relations anyway, and everything works.  However, because
> GetExistingLocalJoinPath is claimed to be a general purpose function
> useful for other FDWs, the fact that its API contract is a lie
> seems to me to be a problem anyway.

Yeah, that's not good.  The idea, as the comments inside
GetExistingLocalJoinPath say, is that if we find an existing join
whose inner or outer side is a ForeignPath for a join, we fish out the
outer path which should be that subpath's EPQ path.  I'm not 100% sure
why that's not happening in your example, but my first guess is that
it's because the inner path has a Materialize node on top of the
join-ForeignPath, and GetExistingLocalJoinPath just sees the
Materialize node and concludes that it doesn't need to look further.
If that's correct, I bet we could fix that by telling it to drill
through any MaterialPath it sees and use the sub-path instead; given
that there is only one tuple per relation, the Materialize step can't
be important for performance.

> I'm also still pretty unhappy with the amount of useless planning work
> caused by doing GetExistingLocalJoinPath during path creation.  It strikes
> me that we could likely replace the entire thing with some code that just
> reconstructs the join node's output tuple during EPQ using the rowmark
> data for all the base relations.  Outer joins aren't really a problem:
> we could tell which relations were replaced by nulls because the rowmark
> values that bubbled up to the top went to nulls themselves.  However,
> that's a nontrivial amount of work and probably wouldn't result in
> something we cared to back-patch, especially since it's not really a bug
> fix.

Yes, I believe something like this would be possible.  I did consider
that approach, but this seemed easier to implement and, as it was, it
took two release cycles to get join pushdown committed.  And several
hundred emails, most of them about EPQ, if I remember correctly.
I'm happy to have somebody implement something better, but I will
probably get a bit hypoxic if I hold my breath waiting for it to
happen.

> Your patch seems OK codewise, but I think the comment is not nearly
> adequate.  Maybe something like "The EPQ path must be at least as well
> sorted as the path itself, in case it gets used as input to a mergejoin."
>
> Also, a regression test case would be appropriate perhaps.  I tried
> to reproduce Jeff's complaint in the postgres_fdw regression database,
> and it actually failed on the very first multi-way join I tried:
>
> contrib_regression=# explain select * from ft1,ft2,ft4,ft5 where 
> ft1.c1=ft2.c1 and ft1.c1=ft4.c1 and ft1.c1=ft5.c1 for update;
> ERROR:  outer pathkeys do not match mergeclauses

Thanks for the review.  Updated patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


sort-epq-path-if-needed-v2.patch
Description: Binary data


Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Tom Lane
Etsuro Fujita  writes:
> (2018/01/16 6:38), Tom Lane wrote:
>> I'm also still pretty unhappy with the amount of useless planning work
>> caused by doing GetExistingLocalJoinPath during path creation.  It strikes
>> me that we could likely replace the entire thing with some code that just
>> reconstructs the join node's output tuple during EPQ using the rowmark
>> data for all the base relations.

> The join tuple would be reconstructed without a local join execution plan?

It seems clear to me that this could be done in principle.

>> Outer joins aren't really a problem:
>> we could tell which relations were replaced by nulls because the rowmark
>> values that bubbled up to the top went to nulls themselves.

> Yeah, but we would need null-extension or projection...

Yes, the question is how complicated it would be to get the projection
right.  As long as the join node's scan tuple only needs to contain values
(possibly whole-row Vars) propagated up from base relations and possibly
nulled out, it seems like it's not very hard.  But if we ever try to
implement actual calculation of expressions in intermediate join nodes,
it could get to be a mess.  (I haven't looked at the aggregate-pushdown
patch in this connection, although probably that's not a hazard since
we don't support aggregation together with FOR UPDATE.  Perhaps we could
likewise simply refuse to push expressions into foreign join nests if
FOR UPDATE is active.)

As against that, the existing EPQ implementation is pretty messy already,
on top of consuming a lot of planning cycles.

> What do you think about a future extension to parameterized foreign paths?

Don't have an opinion on that; but it doesn't seem like EPQ is the
principal blocking factor there.  You could always skip generating
parameterized foreign paths if FOR UPDATE is needed.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Etsuro Fujita

(2018/01/16 6:38), Tom Lane wrote:

Robert Haas  writes:

On Mon, Jan 15, 2018 at 12:31 PM, Tom Lane  wrote:

Hm.  Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.



You are correct ... I was wrong about that part, and said so in an
email on this thread sent about 45 minutes before yours.


I was wrong too.


However, I
still think the patch is a good fix for the immediate issue, unless
you see some problem with it.  It's simple and back-patchable and does
not preclude further work anybody, including you, might want to do in
the future.


I still think so too.


I'm also still pretty unhappy with the amount of useless planning work
caused by doing GetExistingLocalJoinPath during path creation.  It strikes
me that we could likely replace the entire thing with some code that just
reconstructs the join node's output tuple during EPQ using the rowmark
data for all the base relations.


The join tuple would be reconstructed without a local join execution plan?


Outer joins aren't really a problem:
we could tell which relations were replaced by nulls because the rowmark
values that bubbled up to the top went to nulls themselves.


Yeah, but we would need null-extension or projection...


However,
that's a nontrivial amount of work and probably wouldn't result in
something we cared to back-patch, especially since it's not really a bug
fix.


What do you think about a future extension to parameterized foreign paths?

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-17 Thread Etsuro Fujita

(2018/01/16 12:00), Etsuro Fujita wrote:

(2018/01/16 11:17), Tom Lane wrote:

Etsuro Fujita writes:

(2018/01/16 1:47), Robert Haas wrote:

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?



Consider:



postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
 a |  b  | a |  b
---+-----+---+-----
 1 | bar |   |
   |     | 2 | baz
(2 rows)



Both relations have one tuple, but the full join produces two join
tuples. I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.


Doesn't really matter though, does it? Each of those join rows will
be processed as a separate EPQ event.


I assume that such a local join plan is executed as part of a FOR UPDATE
query like the one shown by Robert (the bar/baz foreign join part in
that query), so I am thinking that those join rows will be processed as
a single event.


I realized I am wrong; the local join execution plan would never produce 
multiple tuples in a single event.


Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Etsuro Fujita

(2018/01/16 11:17), Tom Lane wrote:

Etsuro Fujita  writes:

(2018/01/16 1:47), Robert Haas wrote:

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case.  However, that would also be true for
a full join, wouldn't it?



Consider:



postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
 a |  b  | a |  b
---+-----+---+-----
 1 | bar |   |
   |     | 2 | baz
(2 rows)



Both relations have one tuple, but the full join produces two join
tuples.  I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.


Doesn't really matter though, does it?  Each of those join rows will
be processed as a separate EPQ event.


I assume that such a local join plan is executed as part of a FOR UPDATE 
query like the one shown by Robert (the bar/baz foreign join part in 
that query), so I am thinking that those join rows will be processed as 
a single event.


Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Tom Lane
Etsuro Fujita  writes:
> (2018/01/16 1:47), Robert Haas wrote:
>> Hmm, I was thinking that bar and baz wouldn't be constrained to return
>> just one tuple in that case, but I'm wrong: there would just be one
>> tuple per relation in that case.  However, that would also be true for
>> a full join, wouldn't it?

> Consider:

> postgres=# create table bar (a int, b text);
> postgres=# create table baz (a int, b text);
> postgres=# insert into bar values (1, 'bar');
> postgres=# insert into baz values (2, 'baz');
> postgres=# select * from bar full join baz on bar.a = baz.a;
>  a |  b  | a |  b
> ---+-----+---+-----
>  1 | bar |   |
>    |     | 2 | baz
> (2 rows)

> Both relations have one tuple, but the full join produces two join 
> tuples.  I think it would be possible that something like this happens 
> when executing a local join plan for a foreign join that performs a full 
> join remotely.

Doesn't really matter though, does it?  Each of those join rows will
be processed as a separate EPQ event.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Etsuro Fujita

(2018/01/16 1:47), Robert Haas wrote:

On Mon, Jan 15, 2018 at 3:09 AM, Etsuro Fujita
  wrote:

Yeah, but I don't think the above example is good enough to explain that,
because I think the bar/baz join would produce at most one tuple in an EPQ
recheck since we would have only one EPQ tuple for both bar and baz in that
recheck, and the join is inner.  I think such an example would probably be
given e.g., by a modified version of the SQL where we have a full join of
bar and baz, not an inner join.


Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case.  However, that would also be true for
a full join, wouldn't it?


Consider:

postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
 a |  b  | a |  b
---+-----+---+-----
 1 | bar |   |
   |     | 2 | baz
(2 rows)

Both relations have one tuple, but the full join produces two join 
tuples.  I think it would be possible that something like this happens 
when executing a local join plan for a foreign join that performs a full 
join remotely.



Regardless of that, the patch fixes the reported problem with very
little code change, and somebody can always improve it further later.


Agreed.

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Robert Haas
On Mon, Jan 15, 2018 at 12:31 PM, Tom Lane  wrote:

>> After some thought, it seems that there's a much simpler way that we
>> could fix the problem you identified in that original email: if the
>> EPQ path isn't properly sorted, have postgres_fdw's
>> add_paths_with_pathkeys_for_rel stick a Sort node on top of it.  I
>> tried this and it does indeed fix Jeff Janes' initial test case.
>
> Hm.  Simple is certainly good, but if there's multiple rows coming
> back during an EPQ recheck then I think we have a performance problem.

You are correct ... I was wrong about that part, and said so in an
email on this thread sent about 45 minutes before yours.  However, I
still think the patch is a good fix for the immediate issue, unless
you see some problem with it.  It's simple and back-patchable and does
not preclude further work anybody, including you, might want to do in
the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Tom Lane
Robert Haas  writes:
> I started to look at this patch again today and got cold feet.  It
> seems to me that the entire design of the posted patch is based on
> your remarks in http://postgr.es/m/13242.1481582...@sss.pgh.pa.us --

> # One way that we could make things better is to rely on the knowledge
> # that EPQ isn't asked to evaluate joins for more than one row per input
> # relation, and therefore using merge or hash join technology is really
> # overkill.  We could make a tree of simple nestloop joins, which aren't
> # going to care about sort order, if we could identify the correct join
> # clauses to apply.

> However, those remarks are almost entirely wrong.

Hm, one of us is confused.  Possibly it's me, but:

> It's true that the
> relations for which we have EPQ tuples will only ever output a single
> tuple,

Yup, and that is all of them.  We should have a tuple identity
(ctid or whole-row) for every base relation in an EPQ test.

> but because of what you said in the quoted text above, this
> does NOT imply that the recheck plan never needs to worry about
> dealing with a large number of rows, as the following example shows.

This example doesn't prove anything whatsoever about what happens during
EPQ, though, or at least it shouldn't.  If we're examining more than one
row then we're doing it wrong, because the point of the EPQ evaluation
is only to determine whether one single joined row (still) meets the
query conditions.

> After some thought, it seems that there's a much simpler way that we
> could fix the problem you identified in that original email: if the
> EPQ path isn't properly sorted, have postgres_fdw's
> add_paths_with_pathkeys_for_rel stick a Sort node on top of it.  I
> tried this and it does indeed fix Jeff Janes' initial test case.

Hm.  Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Robert Haas
On Mon, Jan 15, 2018 at 3:09 AM, Etsuro Fujita
 wrote:
> Yeah, but I don't think the above example is good enough to explain that,
> because I think the bar/baz join would produce at most one tuple in an EPQ
> recheck since we would have only one EPQ tuple for both bar and baz in that
> recheck, and the join is inner.  I think such an example would probably be
> given e.g., by a modified version of the SQL where we have a full join of
> bar and baz, not an inner join.

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case.  However, that would also be true for
a full join, wouldn't it?

Regardless of that, the patch fixes the reported problem with very
little code change, and somebody can always improve it further later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-15 Thread Etsuro Fujita

(2018/01/13 6:07), Robert Haas wrote:

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane  wrote:

I thought some more about this.  While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan.  Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.


I started to look at this patch again today and got cold feet.  It
seems to me that the entire design of the posted patch is based on
your remarks in http://postgr.es/m/13242.1481582...@sss.pgh.pa.us --

# One way that we could make things better is to rely on the knowledge
# that EPQ isn't asked to evaluate joins for more than one row per input
# relation, and therefore using merge or hash join technology is really
# overkill.  We could make a tree of simple nestloop joins, which aren't
# going to care about sort order, if we could identify the correct join
# clauses to apply.


That's right.


However, those remarks are almost entirely wrong.  It's true that the
relations for which we have EPQ tuples will only ever output a single
tuple, but because of what you said in the quoted text above, this
does NOT imply that the recheck plan never needs to worry about
dealing with a large number of rows, as the following example shows.
(The part about making a tree of simple nestloop joins is wrong not
only for this reason but because it won't work in the executor, as
previously discussed.)

rhaas=# explain select * from foo join (bar join baz on bar.a = baz.a)
on foo.a = bar.a where foo.b = 'hello' for update;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 LockRows  (cost=109.02..121.46 rows=1 width=162)
   ->  Merge Join  (cost=109.02..121.45 rows=1 width=162)
         Merge Cond: (bar.a = foo.a)
         ->  Foreign Scan  (cost=100.58..4914.58 rows=4 width=119)
               Relations: (public.bar) INNER JOIN (public.baz)
               ->  Hash Join  (cost=2556.00..4962.00 rows=4 width=119)
                     Hash Cond: (bar.a = baz.a)
                     ->  Foreign Scan on bar  (cost=100.00..1956.00 rows=4 width=28)
                     ->  Hash  (cost=1956.00..1956.00 rows=4 width=27)
                           ->  Foreign Scan on baz  (cost=100.00..1956.00 rows=4 width=27)
         ->  Sort  (cost=8.44..8.45 rows=1 width=28)
               Sort Key: foo.a
               ->  Index Scan using foo_b_idx on foo  (cost=0.41..8.43 rows=1 width=28)
                     Index Cond: (b = 'hello'::text)

There's really nothing to prevent the bar/baz join from producing
arbitrarily many rows, which means that it's not good to turn the
recheck plan into a Nested Loop unconditionally -- and come to think
of it, I'm pretty sure this kind of case is why
GetExistingLocalJoinPath() tries to use an existing path: it wants a
path that isn't going to suck if it gets executed.


Yeah, but I don't think the above example is good enough to explain 
that, because I think the bar/baz join would produce at most one tuple 
in an EPQ recheck since we would have only one EPQ tuple for both bar 
and baz in that recheck, and the join is inner.  I think such an example 
would probably be given e.g., by a modified version of the SQL where we 
have a full join of bar and baz, not an inner join.



After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it.  I
tried this and it does indeed fix Jeff Janes' initial test case.
Patch attached.


The patch looks good to me.


One could do better here if one wanted to revise
GetExistingLocalJoinPath() to take List *pathkeys as an additional
argument; we could then try to find the optimal EPQ path for each
possible set of sortkeys.  But I don't feel the need to work on that
right now, and it would be nicer not to back-patch a change to the
signature of GetExistingLocalJoinPath(), as has already been noted.


That makes sense to me.


The only problem that
*has actually been demonstrated* is that we can end up using as an EPQ
path something that doesn't generate the right sort order, and that is
easily fixed by making it do so, which the attached patch does.  So I
propose we commit and back-patch this and move on.


Agreed.

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-12 Thread Robert Haas
On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane  wrote:
> I thought some more about this.  While it seems clear that we don't
> actually need to recheck anything within a postgres_fdw foreign join,
> there's still a problem: we have to be able to regurgitate the join
> row during postgresRecheckForeignScan.  Since, in the current system
> design, the only available data is scan-level rowmarks (that is,
> whole-row values from each base foreign relation), there isn't any
> very good way to produce the join row except to insert the rowmark
> values into dummy scan nodes and then re-execute the join locally.
> So really, that is what the outerPlan infrastructure is doing for us.

I started to look at this patch again today and got cold feet.  It
seems to me that this entire design on the posted patch is based on
your remarks in http://postgr.es/m/13242.1481582...@sss.pgh.pa.us --

# One way that we could make things better is to rely on the knowledge
# that EPQ isn't asked to evaluate joins for more than one row per input
# relation, and therefore using merge or hash join technology is really
# overkill.  We could make a tree of simple nestloop joins, which aren't
# going to care about sort order, if we could identify the correct join
# clauses to apply.

However, those remarks are almost entirely wrong.  It's true that the
relations for which we have EPQ tuples will only ever output a single
tuple, but because of what you said in the quoted text above, this
does NOT imply that the recheck plan never needs to worry about
dealing with a large number of rows, as the following example shows.
(The part about making a tree of simple nestloop joins is wrong not
only for this reason but because it won't work in the executor, as
previously discussed.)

rhaas=# explain select * from foo join (bar join baz on bar.a = baz.a)
on foo.a = bar.a where foo.b = 'hello' for update;
                                  QUERY PLAN
------------------------------------------------------------------------------
 LockRows  (cost=109.02..121.46 rows=1 width=162)
   ->  Merge Join  (cost=109.02..121.45 rows=1 width=162)
         Merge Cond: (bar.a = foo.a)
         ->  Foreign Scan  (cost=100.58..4914.58 rows=4 width=119)
               Relations: (public.bar) INNER JOIN (public.baz)
               ->  Hash Join  (cost=2556.00..4962.00 rows=4 width=119)
                     Hash Cond: (bar.a = baz.a)
                     ->  Foreign Scan on bar  (cost=100.00..1956.00 rows=4 width=28)
                     ->  Hash  (cost=1956.00..1956.00 rows=4 width=27)
                           ->  Foreign Scan on baz  (cost=100.00..1956.00 rows=4 width=27)
         ->  Sort  (cost=8.44..8.45 rows=1 width=28)
               Sort Key: foo.a
               ->  Index Scan using foo_b_idx on foo  (cost=0.41..8.43 rows=1 width=28)
                     Index Cond: (b = 'hello'::text)

There's really nothing to prevent the bar/baz join from producing
arbitrarily many rows, which means that it's not good to turn the
recheck plan into a Nested Loop unconditionally -- and come to think
of it, I'm pretty sure this kind of case is why
GetExistingLocalJoinPath() tries to use an existing path: it wants a
path that isn't going to suck if it gets executed.

After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it.  I
tried this and it does indeed fix Jeff Janes' initial test case.
Patch attached.  One could do better here if one wanted to revise
GetExistingLocalJoinPath() to take List *pathkeys as an additional
argument; we could then try to find the optimal EPQ path for each
possible set of sortkeys.  But I don't feel the need to work on that
right now, and it would be nicer not to back-patch a change to the
signature of GetExistingLocalJoinPath(), as has already been noted.
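The fix described above boils down to a containment test on sort keys: if the pathkeys the pushed-down join advertises are not already delivered by the chosen EPQ path, wrap that path in an explicit Sort.  In PostgreSQL itself this is done with pathkeys_contained_in() and create_sort_path(); the following is only a self-contained toy of the decision, with invented names and sort keys modeled as strings.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for the planner's pathkeys_contained_in(): the required
 * sort keys must appear, in order, as a prefix of the keys the path
 * already delivers. */
static int keys_contained_in(const char **required, int nrequired,
                             const char **delivered, int ndelivered)
{
    if (nrequired > ndelivered)
        return 0;
    for (int i = 0; i < nrequired; i++)
        if (strcmp(required[i], delivered[i]) != 0)
            return 0;
    return 1;
}

/* The fix in miniature: an EPQ recheck path needs an explicit Sort on
 * top exactly when it does not already deliver the required ordering. */
static int epq_path_needs_sort(const char **required, int nrequired,
                               const char **delivered, int ndelivered)
{
    return !keys_contained_in(required, nrequired, delivered, ndelivered);
}
```

Note that a delivered ordering that merely extends the required one (same keys plus extras) still satisfies the requirement, which is why a prefix check rather than exact equality is the right shape.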

It is altogether possible that this doesn't fix every issue with the
GetExistingLocalJoinPath interface, and perhaps that is due for a
broader rewrite.  However, I think that more recent discussion has
cast some doubt on the fiery criticism of that previous note.  That
note asserted that we'd have trouble with join nests containing more
than 2 joins, but as was pointed out at the time, that's not really
true, because if we happen to pull a join out of the pathlist that has
ForeignScan children, we'll fetch out the EPQ-recheck subpaths that
they created and use those instead.  And there are other problems with
the analysis in that email too as noted above.  The only problem that
*has actually been demonstrated* is that we can end up using as an EPQ
path something that doesn't generate the right sort order, and that is
easily fixed by making it do so, which the attached patch does.  So I
propose we commit and back-patch this and move on.

-- 
Robert Haas

Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-11 Thread Etsuro Fujita

(2018/01/12 10:41), Thomas Munro wrote:

On Mon, Dec 11, 2017 at 8:25 PM, Etsuro Fujita
  wrote:

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released.  I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen.  We don't know when or if anyone is going to undertake that,
and if we wait, we may end up releasing a v11 that's got the same defect
as v9.6 and now v10.  And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11.  Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().


Agreed.  Attached is a rebased version which moved the new fields in
JoinPathExtraData to the end of that struct.


FYI this doesn't compile anymore, because initial_cost_hashjoin() and
create_hashjoin_path() changed in master.


Thank you for letting me know about that!  Here is an updated version.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***
*** 1745,1766  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!   

Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-11 Thread Thomas Munro
On Mon, Dec 11, 2017 at 8:25 PM, Etsuro Fujita
 wrote:
>> Now, if you're still super-concerned about this breaking something, we
>> could commit it only to master, where it will have 9 months to bake
>> before it gets released.  I think that's overly conservative, but I
>> think it's still better than waiting for the rewrite you'd like to see
>> happen.  We don't know when or if anyone is going to undertake that,
>> and if we wait, we may end up releasing a v11 that's got the same defect
>> as v9.6 and now v10.  And I don't see that we lose much by committing
>> this now even if that rewrite does happen in time for v11.  Ripping
>> out CreateLocalJoinPath() won't be any harder than ripping out
>> GetExistingLocalJoinPath().
>
> Agreed.  Attached is a rebased version which moved the new fields in
> JoinPathExtraData to the end of that struct.

FYI this doesn't compile anymore, because initial_cost_hashjoin() and
create_hashjoin_path() changed in master.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] postgres_fdw bug in 9.6

2017-12-10 Thread Etsuro Fujita

(2017/12/09 5:53), Robert Haas wrote:

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane  wrote:

Not sure where that leaves us for the patch at hand.  It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run.  As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.


Well, the bug report is a year old today, so we should try to do
something.  The patch needs a rebase, but reading through it, I'm not
convinced that it's particularly risky.  I mean, it's going to break
FDWs that are calling GetExistingLocalJoinPath(), but there are
probably very few third-party FDWs that do that.  Any that do are
precisely the ones that need this fix, so I think there's a good
chance they'll forgive us for requiring them to make a code change.
I think we can contain the risk of anything else getting broken to an
acceptable level because there's not really very much other code
changing.  The changes to joinpath.c need to be carefully audited to
make sure that they are not changing the semantics, but that seems
like it shouldn't take more than an hour of code-reading to check.
The new fields in JoinPathExtraData can be moved to the end of the
struct to minimize ABI breakage.  I don't see what else there is that
could break apart from the EPQ rechecks themselves, and if that
weren't broken already, this patch wouldn't exist.

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released.  I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen.  We don't know when or if anyone is going to undertake that,
and if we wait, we may end up releasing a v11 that's got the same defect
as v9.6 and now v10.  And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11.  Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().


Agreed.  Attached is a rebased version which moved the new fields in 
JoinPathExtraData to the end of that struct.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 1701,1722  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Merge Join
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Merge Cond: (t1.c1 = t2.c1)
!  ->  Sort
 Output: t1.c1, t1.c3, t1.*
!Sort Key: t1.c1
!->  Foreign Scan on public.ft1 t1
!  Output: t1.c1, t1.c3, t1.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!  ->  Sort
 Output: t2.c1, t2.*
!Sort Key: t2.c1
!->  Foreign Scan on public.ft2 t2
!  Output: t2.c1, t2.*
!  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!->  Nested Loop
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!  Join Filter: (t1.c1 = t2.c1)
!  ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
!Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! 

Re: [HACKERS] postgres_fdw bug in 9.6

2017-12-08 Thread Robert Haas
On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane  wrote:
> Not sure where that leaves us for the patch at hand.  It feels to me
> like it's a temporary band-aid for code that we want to get rid of
> in the not-too-long run.  As such, it'd be okay if it were smaller,
> but it seems bigger and more invasive than I could wish for a band-aid.

Well, the bug report is a year old today, so we should try to do
something.  The patch needs a rebase, but reading through it, I'm not
convinced that it's particularly risky.  I mean, it's going to break
FDWs that are calling GetExistingLocalJoinPath(), but there are
probably very few third-party FDWs that do that.  Any that do are
precisely the ones that need this fix, so I think there's a good
chance they'll forgive us for requiring them to make a code change.
I think we can contain the risk of anything else getting broken to an
acceptable level because there's not really very much other code
changing.  The changes to joinpath.c need to be carefully audited to
make sure that they are not changing the semantics, but that seems
like it shouldn't take more than an hour of code-reading to check.
The new fields in JoinPathExtraData can be moved to the end of the
struct to minimize ABI breakage.  I don't see what else there is that
could break apart from the EPQ rechecks themselves, and if that
weren't broken already, this patch wouldn't exist.

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released.  I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen.  We don't know when or if anyone is going to undertake that,
and if we wait, we may end up releasing a v11 that's got the same defect
as v9.6 and now v10.  And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11.  Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2017-12-08 Thread Etsuro Fujita

(2017/11/30 23:22), Tom Lane wrote:

Etsuro Fujita  writes:

(2017/11/30 7:32), Tom Lane wrote:

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them.  The only thing that
can change is the output of the local scan on public.tab.  Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.



That's right, but is that true when the FDW uses late row locking?


An FDW using late row locking would need to work harder, yes.  But
that's true at the scan level as well as the join level.  We have
already committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see
why we'd change that decision at the join level.


My concern is FDWs that support join pushdown in combination with late 
row locking.  I don't know whether such FDWs really exist, but if so, the 
output of a foreign join computed from re-fetched tuples might change.



Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.


Because we have provided the late row locking API, I think we should pay 
a certain degree of consideration for such FDWs.


Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2017-12-03 Thread Tom Lane
Robert Haas  writes:
> But have a look at this:
> http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp
> That shows a case where, before
> 5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the
> foreign table to do an EPQ recheck produced an unambiguously wrong
> answer; the query stipulates that inttab.a = ft1.a but the row
> returned lacks that property.  I think this clearly shows that EPQ
> rechecks are needed at least for FDW paths that are parameterized.

Certainly; that's what I said before.

> However, I guess that doesn't actually refute your point, which IIUC
> is that maybe we don't need these checks for FDW join paths, because
> we don't build parameterized paths for those yet. Now I think it would
> still be a good idea to have a way of handling that case, because
> somebody's likely going want to fix that /* XXX Consider parameterized
> paths for the join relation */ eventually, but I guess for the
> back-branches just setting epq_path = NULL instead of calling
> GetExistingLocalJoinPath() might be the way to go... assuming that
> your contention that this won't break anything is indeed correct.

I thought some more about this.  While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan.  Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.

We could imagine that, instead of the scan-level rowmarks, the foreign
join node could produce a ROW() expression containing the values it
actually has to output, and then use that as a "join-level rowmark" to
reconstitute the join row without any of the outerPlan infrastructure.
This would have some pretty significant advantages, notably that we
could (in principle) avoid fetching all the columns of remote tables
we no longer needed scan-level rowmarks for.

However, to do that, we'd have to dynamically decide which rowmark
expressions actually need to wind up getting computed in the final join
plan, based on which joins we choose to do remotely.  The current scheme
of statically adding rowmarks to the query during base relation setup
doesn't cut it.  So that's sounding like a nontrivial rewrite (and one
that would break the related FDW APIs, although the patch at hand does
that too).

As for the question of a future extension to parameterized paths,
I think that it would likely work to recheck the parameterized qual
at the join node, making it basically just like rechecks for scan-level
nodes.  The objection to this is that it might not produce the same
answer if the parameterized qual references relations that are on the
insides of outer joins ... but I think that in practice we could just
blow off that case (ie, reject producing parameterized paths involving
such quals).  The reason is that I think such cases arise only very
rarely.  It could only happen for non-strict qual expressions, because
a strict qual would have resulted in the outer join getting strength
reduced to a regular join.

Not sure where that leaves us for the patch at hand.  It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run.  As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-30 Thread Robert Haas
On Wed, Nov 29, 2017 at 5:32 PM, Tom Lane  wrote:
> AFAICT, [1] just demonstrates that nobody had thought about the scenario
> up to that point, not that the subsequently chosen solution was a good
> one.

But have a look at this:

http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp

That shows a case where, before
5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the
foreign table to do an EPQ recheck produced an unambiguously wrong
answer; the query stipulates that inttab.a = ft1.a but the row
returned lacks that property.  I think this clearly shows that EPQ
rechecks are needed at least for FDW paths that are parameterized.

However, I guess that doesn't actually refute your point, which IIUC
is that maybe we don't need these checks for FDW join paths, because
we don't build parameterized paths for those yet. Now I think it would
still be a good idea to have a way of handling that case, because
somebody's likely going want to fix that /* XXX Consider parameterized
paths for the join relation */ eventually, but I guess for the
back-branches just setting epq_path = NULL instead of calling
GetExistingLocalJoinPath() might be the way to go... assuming that
your contention that this won't break anything is indeed correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-30 Thread Tom Lane
Etsuro Fujita  writes:
> (2017/11/30 7:32), Tom Lane wrote:
>> the output of the foreign join cannot change during EPQ, since the remote
>> server already locked the rows before returning them.  The only thing that
>> can change is the output of the local scan on public.tab.  Yes, we then
>> need to re-verify that foo.a = tab.a ... but in this example, that's the
>> responsibility of the NestLoop plan node, not the foreign join.

> That's right, but is that true when the FDW uses late row locking?

An FDW using late row locking would need to work harder, yes.  But
that's true at the scan level as well as the join level.  We have
already committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see
why we'd change that decision at the join level.

Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.

regards, tom lane



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-30 Thread Etsuro Fujita

(2017/11/30 7:32), Tom Lane wrote:

AFAICT, [1] just demonstrates that nobody had thought about the scenario
up to that point, not that the subsequently chosen solution was a good
one.  In that example,

LockRows  (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
->   Nested Loop  (cost=100.00..101.14 rows=4 width=70)
  Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
  Join Filter: (foo.a = tab.a)
  ->   Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
  ->   Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT 
ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) 
INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR 
UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them.  The only thing that
can change is the output of the local scan on public.tab.  Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.


That's right, but is that true when the FDW uses late row locking?

Best regards,
Etsuro Fujita



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-29 Thread Michael Paquier
On Thu, Nov 30, 2017 at 7:32 AM, Tom Lane  wrote:
> [snip]

Moving to next CF per the hotness of the topic.
-- 
Michael



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-28 Thread Ashutosh Bapat
On Tue, Nov 28, 2017 at 11:05 PM, Robert Haas  wrote:
> On Tue, Nov 28, 2017 at 11:21 AM, Tom Lane  wrote:
>> In short, we should get rid of all of this expensive and broken logic and
>> just make EPQ recheck on a foreign join be a no-op, just as it is for a
>> foreign base table.
>
> I'm not sure that it is.  What of
> 5fc4c26db5120bd90348b6ee3101fcddfdf54800?  That was before I started
> putting "Discussion" links into commit messages as a matter of
> routine, but I'm pretty sure that fixed what seemed to Etsuro Fujita,
> Kyotaro Horiguchi, and myself to be pretty clear misbehavior.  See
> also 385f337c9f39b21dca96ca4770552a10a6d5af24.  We've put an awful lot
> of effort into this over the last few years; I'm a bit hesitant to
> believe none of that did anything.
>

IIUC, the problem here is not about stale foreign tuples (or whether
they can change) - as you described above, they can not, since we have
requested FOR UPDATE. But when a foreign join is further joined with a
local table whose tuples change after they are joined, we need to make
sure that the total join tuple (the local table joined with the foreign
join) still qualifies. We do this by locally joining those rows again.
[1] is where the discussion started

[1] https://www.postgresql.org/message-id/558a18b3.9050...@lab.ntt.co.jp
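The recheck described above can be pictured with a toy model (plain C, invented names, not executor code): the foreign-join side was locked remotely via SELECT ... FOR UPDATE, so its EPQ tuple is final, while the local table's tuple may have moved to a newer committed version; the combined join row survives only if the join qual still holds against that newest local version.

```c
#include <assert.h>

/* One substitute tuple per input, reduced to the single join column. */
typedef struct { int a; } EpqTuple;

/* Re-evaluate the join qual (foo.a = tab.a in the example upthread)
 * against the immutable foreign-join tuple and the newest version of
 * the local tuple; the total join tuple qualifies only if it holds. */
static int total_join_tuple_qualifies(const EpqTuple *foreign_join,
                                      const EpqTuple *local_newest)
{
    return foreign_join->a == local_newest->a;
}
```

A locally updated tab row whose join column changed thus correctly drops the output row, which is the behavior the local re-join exists to guarantee.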

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-28 Thread Robert Haas
On Tue, Nov 28, 2017 at 11:21 AM, Tom Lane  wrote:
> In short, we should get rid of all of this expensive and broken logic and
> just make EPQ recheck on a foreign join be a no-op, just as it is for a
> foreign base table.

I'm not sure that it is.  What of
5fc4c26db5120bd90348b6ee3101fcddfdf54800?  That was before I started
putting "Discussion" links into commit messages as a matter of
routine, but I'm pretty sure that fixed what seemed to Etsuro Fujita,
Kyotaro Horiguchi, and myself to be pretty clear misbehavior.  See
also 385f337c9f39b21dca96ca4770552a10a6d5af24.  We've put an awful lot
of effort into this over the last few years; I'm a bit hesitant to
believe none of that did anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2017-11-28 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane  wrote:
>>> We'd definitely need to do things that way in 9.6.  I'm not quite sure
>>> whether it's too late to adopt the clean solution in v10.

>> It probably is now.  Are you still planning to do something about this patch?

> It's still on my list, but I didn't get to it during the CF.

I got around to looking at this thread again.  Perhaps as a result of
not having looked at it for awhile, the thing that had been nagging
at me from the very beginning (where I said "I kind of wonder why this
infrastructure exists at all") finally crystallized: I do not think
this infrastructure *should* exist, at least not for postgres_fdw.

The reason EPQ exists at all is basically to avoid applying the overhead
of SELECT FOR UPDATE in cases where we don't have to.  If we chased up
to the end of every update chain and locked the frontmost tuple every
time, then EPQ rechecks wouldn't be needed because we'd already have
locked and qual-checked the right tuple.  This is, I think, a reasonable
optimization on local tables; but it's far less clear that it would be
sane to do it like that on remote tables, because of the extra network
round trips required.  Moreover, we already foreclosed the decision
because postgres_fdw always issues SELECT FOR UPDATE, not just SELECT,
to the remote side in any case where EPQ locking might happen.
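The alternative sketched above - chase to the end of every update chain and lock the frontmost tuple - can be modeled abstractly as following next-version links to the newest tuple (a hypothetical structure for illustration only; PostgreSQL's actual heap-tuple chaining via t_ctid and the associated visibility rules are far more involved):

```c
#include <assert.h>
#include <stddef.h>

/* Toy tuple-version chain: each committed UPDATE links the old version
 * to its successor.  "Chasing the update chain" means following these
 * links to the frontmost (newest) version, which is the one that would
 * be locked up front; with that done, no EPQ recheck is needed because
 * the quals are checked against the locked, current version. */
typedef struct TupleVersion
{
    int a;                          /* payload column */
    struct TupleVersion *newer;     /* NULL for the frontmost version */
} TupleVersion;

static const TupleVersion *chase_to_frontmost(const TupleVersion *tv)
{
    while (tv->newer != NULL)
        tv = tv->newer;
    return tv;
}
```

The cost argument in the paragraph above is that each chase-and-lock step is cheap locally but would add network round trips against a remote table, which is why postgres_fdw issues SELECT FOR UPDATE eagerly instead.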

Therefore, the remote side is already holding a tuple lock on the right
tuple, and it's already done anything EPQ-like that might need to have
happened, and it's guaranteed to have given us back a fully up-to-date
row value.  This is true whether the remote query is for a single table
or for a join.

In short, we should get rid of all of this expensive and broken logic and
just make EPQ recheck on a foreign join be a no-op, just as it is for a
foreign base table.  (Indeed, considering the fact that what we think is
a base table might be a join view on the far side, it's hard to come to
any conclusion but that making such a distinction is fundamentally wrong.)
If there are any cases where the join pushdown logic is failing to emit
"FOR UPDATE", we need to fix that, but a quick check says that it does
emit that at least in simple cases.

Comments?

regards, tom lane