Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-05-24 Thread Etsuro Fujita
On Sat, May 25, 2019 at 2:19 AM Sean Johnston
 wrote:
> Obviously this isn't mainstream postgres but just wondering if anyone has 
> looked into issues with regards to pushing order/limit to remote nodes for 
> fdw.

In PostgreSQL 12 Beta 1 released this week [1], we can push down ORDER
BY/LIMIT to the remote PostgreSQL server.  Give it a try!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/about/news/1943/




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-05-24 Thread Tom Lane
Sean Johnston  writes:
> Not sure if this is the right avenue to follow up on this. The patch works
> fine. However, we're working on a modified version of the postgres_fdw in
> which we're trying to push as much as possible to the remote nodes,
> including ordering and limits. The patch causes the upper paths for the
> ordering and limit to be rejected as they have no relids.

Uh, what?  If you're speaking of 8cad5adb9, the only case I'm aware of
where it might be a performance issue is if you have "GROUP BY
local-expr", which seems like a pretty weird thing to need to push
to the remote side, since the local expression would be effectively
a constant on the far end.

You could imagine working around it by discarding such GROUP BY
columns in what's to be sent to the far end, and if you end up with
an empty GROUP BY clause, sending "HAVING TRUE" instead to keep the
semantics the same.  But I'm uninterested in stacking yet more
klugery atop 8cad5adb9 so far as the community code is concerned.
The right way forward, as noted in the commit message, is to revert
that patch in favor of adding some API that will let the FDW control
how setrefs.c processes a ForeignScan's expressions.  We just can't
do that in released branches :-(.

It's possible that we should treat this issue as an open item for v12
instead of letting it slide to v13 or later.  But I think people would
only be amenable to that if you can point to a non-silly example where
failure to push the GROUP BY creates a performance issue.

regards, tom lane




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-05-24 Thread Sean Johnston
Not sure if this is the right avenue to follow up on this. The patch works
fine. However, we're working on a modified version of the postgres_fdw in
which we're trying to push as much as possible to the remote nodes,
including ordering and limits. The patch causes the upper paths for the
ordering and limit to be rejected as they have no relids. I've had a quick
look at maybe how to pull in relids in the fdw private data but its not
obvious. Obviously this isn't mainstream postgres but just wondering if
anyone has looked into issues with regards to pushing order/limit to remote
nodes for fdw.

On Sat, Apr 27, 2019 at 3:47 PM Tom Lane  wrote:

> Etsuro Fujita  writes:
> > On Sat, Apr 27, 2019 at 2:10 AM Tom Lane  wrote:
> >> If we don't want to change what the core code does with fdw_exprs,
> >> I think the only way to fix it is to hack postgres_fdw so that it
> >> won't generate plans involving the problematic case.
>
> > Seems reasonable.
>
> >> See attached.
>
> > I read the patch.  It looks good to me.  I didn't test it, though.
>
> Thanks for looking!  Have a good vacation ...
>
> regards, tom lane
>


Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-27 Thread Tom Lane
Etsuro Fujita  writes:
> On Sat, Apr 27, 2019 at 2:10 AM Tom Lane  wrote:
>> If we don't want to change what the core code does with fdw_exprs,
>> I think the only way to fix it is to hack postgres_fdw so that it
>> won't generate plans involving the problematic case.

> Seems reasonable.

>> See attached.

> I read the patch.  It looks good to me.  I didn't test it, though.

Thanks for looking!  Have a good vacation ...

regards, tom lane




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-27 Thread Etsuro Fujita
On Sat, Apr 27, 2019 at 2:10 AM Tom Lane  wrote:

> > (2019/04/26 3:24), Tom Lane wrote:
> >> If we do leave it like this, then the only way for postgres_fdw to
> >> avoid trouble is to not have any entries in fdw_exprs that exactly
> >> match entries in fdw_scan_tlist.  So that pretty much devolves back
> >> to what I said before: don't ship values to the far end that are
> >> just going to be fed back as-is.  But now it's a correctness
> >> requirement not just an optimization.

> Well, the releases are coming up fast, so I spent some time on this.
> If we don't want to change what the core code does with fdw_exprs,
> I think the only way to fix it is to hack postgres_fdw so that it
> won't generate plans involving the problematic case.

Seems reasonable.

> See attached.

I read the patch.  It looks good to me.  I didn't test it, though.

> We end up with slightly weird-looking plans if the troublesome Param
> is actually a GROUP BY expression, but if it's not, I think things
> are fine.  Maybe we could do something smarter about the GROUP BY case,
> but it seems weird enough to maybe not be worth additional trouble.

Agreed.

Thanks for working on this!

Best regards,
Etsuro Fujita




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-26 Thread Tom Lane
Etsuro Fujita  writes:
> (2019/04/26 3:24), Tom Lane wrote:
>> If we do leave it like this, then the only way for postgres_fdw to
>> avoid trouble is to not have any entries in fdw_exprs that exactly
>> match entries in fdw_scan_tlist.  So that pretty much devolves back
>> to what I said before: don't ship values to the far end that are
>> just going to be fed back as-is.  But now it's a correctness
>> requirement not just an optimization.

> I worked on the ORDERED/FINAL-upperrel pushdown for PG12, but I don't 
> think that that's directly related to this issue, because this arises in 
> PG11 already.  Maybe I'm missing something, but the 
> UPPERREL_GROUP_AGG-upperrel pushdown added in PG10 is likely to be 
> related to this.  I'll work on this issue unless somebody wants to.  But 
> I'll take a 10-day vocation from tomorrow, so I don't think I'll be able 
> to fix this in the next minor release...

Well, the releases are coming up fast, so I spent some time on this.
If we don't want to change what the core code does with fdw_exprs,
I think the only way to fix it is to hack postgres_fdw so that it
won't generate plans involving the problematic case.  See attached.

We end up with slightly weird-looking plans if the troublesome Param
is actually a GROUP BY expression, but if it's not, I think things
are fine.  Maybe we could do something smarter about the GROUP BY case,
but it seems weird enough to maybe not be worth additional trouble.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 079406f..c4e3311 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -842,6 +842,55 @@ foreign_expr_walker(Node *node,
 }
 
 /*
+ * Returns true if given expr is something we'd have to send the value of
+ * to the foreign server.
+ *
+ * This should return true when the expression is a shippable node that
+ * deparseExpr would add to context->params_list.  Note that we don't care
+ * if the expression *contains* such a node, only whether one appears at top
+ * level.  We need this to detect cases where setrefs.c would recognize a
+ * false match between an fdw_exprs item (which came from the params_list)
+ * and an entry in fdw_scan_tlist (which we're considering putting the given
+ * expression into).
+ */
+bool
+is_foreign_param(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr)
+{
+	if (expr == NULL)
+		return false;
+
+	switch (nodeTag(expr))
+	{
+		case T_Var:
+			{
+/* It would have to be sent unless it's a foreign Var */
+Var		   *var = (Var *) expr;
+PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (baserel->fdw_private);
+Relids		relids;
+
+if (IS_UPPER_REL(baserel))
+	relids = fpinfo->outerrel->relids;
+else
+	relids = baserel->relids;
+
+if (bms_is_member(var->varno, relids) && var->varlevelsup == 0)
+	return false;	/* foreign Var, so not a param */
+else
+	return true;	/* it'd have to be a param */
+break;
+			}
+		case T_Param:
+			/* Params always have to be sent to the foreign server */
+			return true;
+		default:
+			break;
+	}
+	return false;
+}
+
+/*
  * Convert type OID + typmod info into a type name we can ship to the remote
  * server.  Someplace else had better have verified that this type name is
  * expected to be known on the remote end.
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48ea67a..e034b03 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2707,6 +2707,46 @@ select sum(c1) from ft1 group by c2 having avg(c1 * (random() <= 1)::int) > 100
Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
 (10 rows)
 
+-- Remote aggregate in combination with a local Param (for the output
+-- of an initplan) can be trouble, per bug #15781
+explain (verbose, costs off)
+select exists(select 1 from pg_enum), sum(c1) from ft1;
+QUERY PLAN
+--
+ Foreign Scan
+   Output: $0, (sum(ft1.c1))
+   Relations: Aggregate on (public.ft1)
+   Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1"
+   InitPlan 1 (returns $0)
+ ->  Seq Scan on pg_catalog.pg_enum
+(6 rows)
+
+select exists(select 1 from pg_enum), sum(c1) from ft1;
+ exists |  sum   
++
+ t  | 500500
+(1 row)
+
+explain (verbose, costs off)
+select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1;
+QUERY PLAN 
+---
+ GroupAggregate
+   Output: ($0), sum(ft1.c1)
+   Group Key: $0
+   InitPlan 1 (returns $0)
+ ->  Seq Scan on pg_catalog.pg_enum
+   ->  Foreign Scan on public.ft1
+ Output: $0, ft1.c1
+ Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+(8 rows)
+
+select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1;
+ 

Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-26 Thread Etsuro Fujita

(2019/04/26 3:24), Tom Lane wrote:

PG Bug reporting form  writes:

[ this crashes if ft4 is a postgres_fdw foreign table: ]
select exists(select c1 from ft4), avg(c1) from ft4 where c1 = (select
max(c1) from ft4);


Hm, the max() subquery isn't necessary, this is sufficient:

select exists(select c1 from ft4), avg(c1) from ft4 where c1 = 42;

That produces a plan like

 QUERY PLAN
---
  Foreign Scan  (cost=200.07..246.67 rows=1 width=33)
Output: ($0), (avg(ft4.c1))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 432))
InitPlan 1 (returns $0)
  ->   Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)

Now one's first observation about that is that it's kind of dumb to send
the result of the locally-executed InitPlan over to the far end only to
read it back.  So maybe we should be thinking about how to avoid that.
We do avoid it for plain foreign scans:

regression=# explain verbose
  select exists(select c1 from ft4), * from ft4 where c1 = 42;
 QUERY PLAN
---
  Foreign Scan on public.ft4  (cost=200.03..226.15 rows=6 width=41)
Output: $0, ft4.c1, ft4.c2, ft4.c3
Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 = 42))
InitPlan 1 (returns $0)
  ->   Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(6 rows)

and also for foreign joins:

regression=# explain verbose
  select exists(select c1 from ft4), * from ft4, ft4 ft4b where ft4.c1 = 42 and 
ft4b.c1 = 43;
   QUERY 
PLAN
--
  Foreign Scan  (cost=200.03..252.93 rows=36 width=81)
Output: $0, ft4.c1, ft4.c2, ft4.c3, ft4b.c1, ft4b.c2, ft4b.c3
Relations: (public.ft4) INNER JOIN (public.ft4 ft4b)
Remote SQL: SELECT r1.c1, r1.c2, r1.c3, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 3" r1 INNER JOIN 
"S 1"."T 3" r2 ON (((r2.c1 = 43)) AND ((r1.c1 = 42
InitPlan 1 (returns $0)
  ->   Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)


but the code for upper-relation scans is apparently stupider than either
of those cases.

The proximate cause of the crash is that we have {PARAM 1}
(representing the output of the InitPlan) in the path's fdw_exprs, and
also the identical expression in fdw_scan_tlist, and that means that when
setrefs.c processes the ForeignScan node it thinks it should replace the
{PARAM 1} in fdw_exprs with a Var representing a reference to the
fdw_scan_tlist entry.  That would be fine if the fdw_exprs represented
expressions to be evaluated over the output of the foreign scan, but of
course they don't --- postgres_fdw uses fdw_exprs to compute values to be
sent to the remote end, instead.  So we crash at runtime because there's
no slot to supply such output to the fdw_exprs.

I was able to make the crash go away by removing this statement from
set_foreignscan_references:

 fscan->fdw_exprs = (List *)
 fix_upper_expr(root,
(Node *) fscan->fdw_exprs,
itlist,
INDEX_VAR,
rtoffset);

and we still pass check-world without that (which means we lack test
coverage, because the minimum that should happen to fdw_exprs is
fix_scan_list :-().  But I do not think that's an acceptable route to
a patch, because it amounts to having the core code know what the FDW
is using fdw_exprs for, and we explicitly disclaim any assumptions about
that.  fdw_exprs is specified to be processed the same as other
expressions in the same plan node, so I think this fix_upper_expr call
probably ought to stay like it is, even though it's not really the right
thing for postgres_fdw.  It might be the right thing for other FDWs.

(One could imagine, perhaps, having some flag in the ForeignPlan
node that tells setrefs.c what to do.  But that would be an API break
for FDWs, so it wouldn't be a back-patchable solution.)

(Actually, it seems to me that set_foreignscan_references is *already*
assuming too much about the semantics of these expressions in upper
plan nodes, so maybe we need to have a chat about that anyway.)

If we do leave it like this, then the only way for postgres_fdw to
avoid trouble is to not have any entries in fdw_exprs that exactly
match entries in fdw_scan_tlist.  So that pretty much devolves back
to what I said before: don't ship values to 

Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-25 Thread Tom Lane
I wrote:
> Well, that's just coincidental for the case where the problem fdw_expr is
> a Param.  I haven't tried to figure out exactly what upper-path generation
> thinks it should put into fdw_exprs, but is it really only Params?

Oh, this is interesting:

regression=# explain verbose 
 select exists(select c1 from ft4) as c, avg(c1) from ft4 where ft4.c1 = 42;
QUERY PLAN  
   
---
 Foreign Scan  (cost=200.07..246.67 rows=1 width=33)
   Output: ($0), (avg(ft4.c1))
   Relations: Aggregate on (public.ft4)
   Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 42))
   InitPlan 1 (returns $0)
 ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
   Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)

That would crash if I tried to execute it, but:

regression=# explain verbose 
 select case when exists(select c1 from ft4) then 1 else 2 end as c, avg(c1) 
from ft4 where ft4.c1 = 42;
QUERY PLAN  
   
---
 Foreign Scan  (cost=200.07..246.67 rows=1 width=36)
   Output: CASE WHEN $0 THEN 1 ELSE 2 END, (avg(ft4.c1))
   Relations: Aggregate on (public.ft4)
   Remote SQL: SELECT avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 42))
   InitPlan 1 (returns $0)
 ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
   Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)

That's just fine.  So there is something stupid happening in creation
of the fdw_scan_tlist when a relation tlist item is a bare Param,
which doesn't happen if the same Param is buried in a larger expression.

regards, tom lane




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-25 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On Thu, Apr 25, 2019 at 8:24 PM Tom Lane  wrote:
>> The proximate cause of the crash is that we have {PARAM 1}
>> (representing the output of the InitPlan) in the path's fdw_exprs, and
>> also the identical expression in fdw_scan_tlist, and that means that when
>> setrefs.c processes the ForeignScan node it thinks it should replace the
>> {PARAM 1} in fdw_exprs with a Var representing a reference to the
>> fdw_scan_tlist entry.

> I've noticed, that it behaves like that since f9f63ed1f2e5 (originally I found
> it pretty strange, but after this explanation it does make sense). As an
> experiment, I've changed the position of condition of
> if (context->subplan_itlist->has_non_vars)
> back - it also made problem to disappear,

Well, that's just coincidental for the case where the problem fdw_expr is
a Param.  I haven't tried to figure out exactly what upper-path generation
thinks it should put into fdw_exprs, but is it really only Params?

Anyway, ideally we'd not have any entries in fdw_scan_tlist that don't
include at least one foreign Var, and then there can't be a false match.

regards, tom lane




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-25 Thread Dmitry Dolgov
> On Thu, Apr 25, 2019 at 8:24 PM Tom Lane  wrote:
>
> The proximate cause of the crash is that we have {PARAM 1}
> (representing the output of the InitPlan) in the path's fdw_exprs, and
> also the identical expression in fdw_scan_tlist, and that means that when
> setrefs.c processes the ForeignScan node it thinks it should replace the
> {PARAM 1} in fdw_exprs with a Var representing a reference to the
> fdw_scan_tlist entry.

I've noticed, that it behaves like that since f9f63ed1f2e5 (originally I found
it pretty strange, but after this explanation it does make sense). As an
experiment, I've changed the position of condition of

if (context->subplan_itlist->has_non_vars)

back - it also made problem to disappear, and what was interesting is that the
test case for update (exactly what this commit was fixing) is not crashing
either. I've checked on the commit right before f9f63ed1f2e5, without mentioned
reordering there is a crash, but I couldn't reproduce it on the master.




Re: BUG #15781: subselect on foreign table (postgres_fdw) can crash (segfault)

2019-04-25 Thread Tom Lane
PG Bug reporting form  writes:
> [ this crashes if ft4 is a postgres_fdw foreign table: ]
> select exists(select c1 from ft4), avg(c1) from ft4 where c1 = (select
> max(c1) from ft4);

Hm, the max() subquery isn't necessary, this is sufficient:

select exists(select c1 from ft4), avg(c1) from ft4 where c1 = 42;

That produces a plan like

QUERY PLAN  
   
---
 Foreign Scan  (cost=200.07..246.67 rows=1 width=33)
   Output: ($0), (avg(ft4.c1))
   Relations: Aggregate on (public.ft4)
   Remote SQL: SELECT $1::boolean, avg(c1) FROM "S 1"."T 3" WHERE ((c1 = 432))
   InitPlan 1 (returns $0)
 ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
   Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)

Now one's first observation about that is that it's kind of dumb to send
the result of the locally-executed InitPlan over to the far end only to
read it back.  So maybe we should be thinking about how to avoid that.
We do avoid it for plain foreign scans:

regression=# explain verbose 
 select exists(select c1 from ft4), * from ft4 where c1 = 42;
QUERY PLAN  
   
---
 Foreign Scan on public.ft4  (cost=200.03..226.15 rows=6 width=41)
   Output: $0, ft4.c1, ft4.c2, ft4.c3
   Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 = 42))
   InitPlan 1 (returns $0)
 ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
   Remote SQL: SELECT NULL FROM "S 1"."T 3"
(6 rows)

and also for foreign joins:

regression=# explain verbose 
 select exists(select c1 from ft4), * from ft4, ft4 ft4b where ft4.c1 = 42 and 
ft4b.c1 = 43;
  QUERY 
PLAN  
--
 Foreign Scan  (cost=200.03..252.93 rows=36 width=81)
   Output: $0, ft4.c1, ft4.c2, ft4.c3, ft4b.c1, ft4b.c2, ft4b.c3
   Relations: (public.ft4) INNER JOIN (public.ft4 ft4b)
   Remote SQL: SELECT r1.c1, r1.c2, r1.c3, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 
3" r1 INNER JOIN "S 1"."T 3" r2 ON (((r2.c1 = 43)) AND ((r1.c1 = 42
   InitPlan 1 (returns $0)
 ->  Foreign Scan on public.ft4 ft4_1  (cost=100.00..212.39 rows=3413 
width=0)
   Remote SQL: SELECT NULL FROM "S 1"."T 3"
(7 rows)


but the code for upper-relation scans is apparently stupider than either
of those cases.

The proximate cause of the crash is that we have {PARAM 1}
(representing the output of the InitPlan) in the path's fdw_exprs, and
also the identical expression in fdw_scan_tlist, and that means that when
setrefs.c processes the ForeignScan node it thinks it should replace the
{PARAM 1} in fdw_exprs with a Var representing a reference to the
fdw_scan_tlist entry.  That would be fine if the fdw_exprs represented
expressions to be evaluated over the output of the foreign scan, but of
course they don't --- postgres_fdw uses fdw_exprs to compute values to be
sent to the remote end, instead.  So we crash at runtime because there's
no slot to supply such output to the fdw_exprs.

I was able to make the crash go away by removing this statement from
set_foreignscan_references:

fscan->fdw_exprs = (List *)
fix_upper_expr(root,
   (Node *) fscan->fdw_exprs,
   itlist,
   INDEX_VAR,
   rtoffset);

and we still pass check-world without that (which means we lack test
coverage, because the minimum that should happen to fdw_exprs is
fix_scan_list :-().  But I do not think that's an acceptable route to
a patch, because it amounts to having the core code know what the FDW
is using fdw_exprs for, and we explicitly disclaim any assumptions about
that.  fdw_exprs is specified to be processed the same as other
expressions in the same plan node, so I think this fix_upper_expr call
probably ought to stay like it is, even though it's not really the right
thing for postgres_fdw.  It might be the right thing for other FDWs.

(One could imagine, perhaps, having some flag in the ForeignPlan
node that tells setrefs.c what to do.  But that would be an API break
for FDWs, so it wouldn't be a back-patchable solution.)

(Actually, it seems to me that set_foreignscan_references is *already*
assuming too much about the semantics of these expressions in upper
plan nodes, so maybe we need to have a chat about that anyway.)

If we do leave it like this, then the only way for postgres_fdw to
avoid trouble is to not have any entries in fdw_exprs that exactly
match entries in fdw_scan_tlist.