Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-22 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/07/21 16:30, Etsuro Fujita wrote:
>> One thing I'd like to discuss is GetUserMappingById/GetUserMappingId.
>> Though those functions aren't used in any places, I didn't take them
>> out, because I thought somebody else would need them someday.  But
>> considering that user mapping OIDs now aren't available in RelOptInfos,
>> at least the former might be rather useless.  Should we keep them in core?

> I thought a bit more about this.  ISTM that there isn't much value in  
> the latter either, because we can use GetUserMapping and get that OID  
> from the result, instead of using the latter, so I'd vote for removing  
> those functions.  Attached is a patch for that.

Agreed - pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-21 Thread Etsuro Fujita

On 2016/07/21 16:30, Etsuro Fujita wrote:

One thing I'd like to discuss is GetUserMappingById/GetUserMappingId.
Though those functions aren't used in any places, I didn't take them
out, because I thought somebody else would need them someday.  But
considering that user mapping OIDs now aren't available in RelOptInfos,
at least the former might be rather useless.  Should we keep them in core?


I thought a bit more about this.  ISTM that there isn't much value in  
the latter either, because we can use GetUserMapping and get that OID  
from the result, instead of using the latter, so I'd vote for removing  
those functions.  Attached is a patch for that.


Best regards,
Etsuro Fujita


remove-usermapping-helper-funcs.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-21 Thread Etsuro Fujita

On 2016/07/16 6:25, Tom Lane wrote:

Pushed, after fooling around with it some more so that it would cover the
case we discussed where the join could be allowed if we restrict the plan
to be executed by the owner of a view used in the query.


I left that for future work, but I'm happy to have that in 9.6.  Thanks  
for improving and committing the patch!


One thing I'd like to discuss is GetUserMappingById/GetUserMappingId.  
Though those functions aren't used in any places, I didn't take them  
out, because I thought somebody else would need them someday.  But  
considering that user mapping OIDs now aren't available in RelOptInfos,  
at least the former might be rather useless.  Should we keep them in core?


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-20 Thread Robert Haas
On Wed, Jul 20, 2016 at 10:15 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Mumble.  Why, exactly, was this a good idea?  The upside of commit
>> 45639a0525a58a2700cf46d4c934d6de78349dac is only that you do fewer
>> plan invalidations, but surely that's not a significant benefit for
>> most people: user mappings don't change that often.  On the downside,
>> there are now cases where joins would have gotten pushed down
>> previously and now they won't.  In other words, you've saved some
>> replanning activity at the cost of delivering worse plans.  That seems
>> pretty suspect to me, although I grant that the scenarios where either
>> the old or the new behavior is actually a problem are all somewhat off
>> the beaten path.
>
> I think that you are undervaluing the removal of user-mapping-based plan
> invalidation.  That was never more than a kluge, and here is the reason:
> we have no way to lock user mappings.  The system whereby we invalidate
> plans as a consequence of table DDL changes is bulletproof, because we
> (re) acquire locks on the tables used in the plan, then check for
> invalidation signals, before deciding whether the plan is stale.  The
> corresponding scenario where a user mapping changes between that check
> and execution time is unprotected, so that we could end up using a plan
> that is insecure for the mappings selected at execution.

OK, that's a fair point.  Thanks for explaining.

> Another way we could have removed the race condition is the suggestion
> made upthread of embedding the user mapping details right into the plan
> instead of looking them up afresh at execution.  But I didn't much like
> that approach, per upthread discussion.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-20 Thread Tom Lane
Robert Haas  writes:
> Mumble.  Why, exactly, was this a good idea?  The upside of commit
> 45639a0525a58a2700cf46d4c934d6de78349dac is only that you do fewer
> plan invalidations, but surely that's not a significant benefit for
> most people: user mappings don't change that often.  On the downside,
> there are now cases where joins would have gotten pushed down
> previously and now they won't.  In other words, you've saved some
> replanning activity at the cost of delivering worse plans.  That seems
> pretty suspect to me, although I grant that the scenarios where either
> the old or the new behavior is actually a problem are all somewhat off
> the beaten path.

I think that you are undervaluing the removal of user-mapping-based plan
invalidation.  That was never more than a kluge, and here is the reason:
we have no way to lock user mappings.  The system whereby we invalidate
plans as a consequence of table DDL changes is bulletproof, because we
(re) acquire locks on the tables used in the plan, then check for
invalidation signals, before deciding whether the plan is stale.  The
corresponding scenario where a user mapping changes between that check
and execution time is unprotected, so that we could end up using a plan
that is insecure for the mappings selected at execution.

Another way we could have removed the race condition is the suggestion
made upthread of embedding the user mapping details right into the plan
instead of looking them up afresh at execution.  But I didn't much like
that approach, per upthread discussion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-20 Thread Robert Haas
On Fri, Jul 15, 2016 at 5:25 PM, Tom Lane  wrote:
> I wrote:
>> Etsuro Fujita  writes:
>>> Here is a patch for that redesign proposed by you; reverts commits
>>> fbe5a3fb73102c2cfec114a67943f4474383 and
>>> 5d4171d1c70edfe3e9be1de9e66603af28e3afe1, adds changes for that redesign
>>> to the core, and adjusts the postgres_fdw code to that changes.  Also, I
>>> rearranged the postgres_fdw regression tests to match that changes.
>
>> OK, I'll review this later today.
>
> Pushed, after fooling around with it some more so that it would cover the
> case we discussed where the join could be allowed if we restrict the plan
> to be executed by the owner of a view used in the query.

Mumble.  Why, exactly, was this a good idea?  The upside of commit
45639a0525a58a2700cf46d4c934d6de78349dac is only that you do fewer
plan invalidations, but surely that's not a significant benefit for
most people: user mappings don't change that often.  On the downside,
there are now cases where joins would have gotten pushed down
previously and now they won't.  In other words, you've saved some
replanning activity at the cost of delivering worse plans.  That seems
pretty suspect to me, although I grant that the scenarios where either
the old or the new behavior is actually a problem are all somewhat off
the beaten path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-15 Thread Tom Lane
I wrote:
> Etsuro Fujita  writes:
>> Here is a patch for that redesign proposed by you; reverts commits  
>> fbe5a3fb73102c2cfec114a67943f4474383 and  
>> 5d4171d1c70edfe3e9be1de9e66603af28e3afe1, adds changes for that redesign  
>> to the core, and adjusts the postgres_fdw code to that changes.  Also, I  
>> rearranged the postgres_fdw regression tests to match that changes.

> OK, I'll review this later today.

Pushed, after fooling around with it some more so that it would cover the
case we discussed where the join could be allowed if we restrict the plan
to be executed by the owner of a view used in the query.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-15 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/07/15 11:48, Tom Lane wrote:
>> If we add a mechanism to let us know that the FDW doesn't care, we could
>> relax the requirement for such cases.  I don't have a strong opinion on
>> whether that's worthwhile.  It'd depend in part on how many FDWs there
>> are that don't care, versus those that do; and I have no idea about that.

> So, I'd vote for leaving that for future work if necessary.

Makes sense to me.

> Here is a patch for that redesign proposed by you; reverts commits  
> fbe5a3fb73102c2cfec114a67943f4474383 and  
> 5d4171d1c70edfe3e9be1de9e66603af28e3afe1, adds changes for that redesign  
> to the core, and adjusts the postgres_fdw code to that changes.  Also, I  
> rearranged the postgres_fdw regression tests to match that changes.

OK, I'll review this later today.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-15 Thread Etsuro Fujita

On 2016/07/15 11:48, Tom Lane wrote:

Etsuro Fujita  writes:

One thing I'm not sure about is: should we insist that a join can be
pushed down only if the checkAsUser fields of the relevant RTEs are
equal in the case where user mappings are meaningless to the FDW, like
file_fdw?



If we add a mechanism to let us know that the FDW doesn't care, we could
relax the requirement for such cases.  I don't have a strong opinion on
whether that's worthwhile.  It'd depend in part on how many FDWs there
are that don't care, versus those that do; and I have no idea about that.


So, I'd vote for leaving that for future work if necessary.

Here is a patch for that redesign proposed by you; reverts commits  
fbe5a3fb73102c2cfec114a67943f4474383 and  
5d4171d1c70edfe3e9be1de9e66603af28e3afe1, adds changes for that redesign  
to the core, and adjusts the postgres_fdw code to that changes.  Also, I  
rearranged the postgres_fdw regression tests to match that changes.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 2053,2207  SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
 1
  (10 rows)
  
- -- create another user for permission, user mapping, effective user tests
- CREATE USER view_owner;
- -- grant privileges on ft4 and ft5 to view_owner
- GRANT ALL ON ft4 TO view_owner;
- GRANT ALL ON ft5 TO view_owner;
- -- prepare statement with current session user
- PREPARE join_stmt AS SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
- EXPLAIN (COSTS OFF, VERBOSE) EXECUTE join_stmt;
- QUERY PLAN 
- ---
-  Limit
-Output: t1.c1, t2.c1
-->  Foreign Scan
-  Output: t1.c1, t2.c1
-  Relations: (public.ft4 t1) LEFT JOIN (public.ft5 t2)
-  Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 LEFT JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
- (6 rows)
- 
- EXECUTE join_stmt;
-  c1 | c1 
- +
-  22 |   
-  24 | 24
-  26 |   
-  28 |   
-  30 | 30
-  32 |   
-  34 |   
-  36 | 36
-  38 |   
-  40 |   
- (10 rows)
- 
- -- change the session user to view_owner and execute the statement. Because of
- -- change in session user, the plan should get invalidated and created again.
- -- The join will not be pushed down since the joining relations do not have a
- -- valid user mapping.
- SET SESSION ROLE view_owner;
- EXPLAIN (COSTS OFF, VERBOSE) EXECUTE join_stmt;
- QUERY PLAN
- --
-  Limit
-Output: t1.c1, t2.c1
-->  Sort
-  Output: t1.c1, t2.c1
-  Sort Key: t1.c1, t2.c1
-  ->  Hash Left Join
-Output: t1.c1, t2.c1
-Hash Cond: (t1.c1 = t2.c1)
-->  Foreign Scan on public.ft4 t1
-  Output: t1.c1, t1.c2, t1.c3
-  Remote SQL: SELECT c1 FROM "S 1"."T 3"
-->  Hash
-  Output: t2.c1
-  ->  Foreign Scan on public.ft5 t2
-Output: t2.c1
-Remote SQL: SELECT c1 FROM "S 1"."T 4"
- (16 rows)
- 
- RESET ROLE;
- DEALLOCATE join_stmt;
- CREATE VIEW v_ft5 AS SELECT * FROM ft5;
- -- change owner of v_ft5 to view_owner so that the effective user for scan on
- -- ft5 is view_owner and not the current user.
- ALTER VIEW v_ft5 OWNER TO view_owner;
- -- create a public user mapping for loopback server
- -- drop user mapping for current_user.
- DROP USER MAPPING FOR CURRENT_USER SERVER loopback;
- CREATE USER MAPPING FOR PUBLIC SERVER loopback;
- -- different effective user for permission check, but same user mapping for the
- -- joining sides, join pushed down, no result expected.
- PREPARE join_stmt AS SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
- EXPLAIN (COSTS false, VERBOSE) EXECUTE join_stmt;
-   QUERY PLAN  
- --
-  Limit
-Output: t1.c1, ft5.c1
-->  Foreign Scan
-  Output: t1.c1, ft5.c1
-  Relations: (public.ft5 t1) INNER JOIN (public.ft5)
-  Remote SQL: SELECT r1.c1, r6.c1 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 4" r6 ON (((r1.c1 

Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Ashutosh Bapat
On Fri, Jul 15, 2016 at 12:19 AM, Tom Lane  wrote:

> I wrote:
> > I concur with Etsuro-san's dislike for hasForeignJoin; that flag is
> > underspecified and doesn't convey nearly enough information.  I do not
> > think a uses_user_mapping flag is much better.  ISTM what should happen
> is
> > that any time we decide to push down a foreign join, we should record the
> > identity of the common user mapping that made that pushdown possible in
> > the plan's invalItems list.  That would make it possible to invalidate
> > only the relevant plans when a user mapping is changed.
>
> I thought a bit more about this and realized that the above doesn't work
> too well.  Initially, a join might have been pushed down on the strength
> of both userids mapping to a PUBLIC user mapping for the server.  If now
> someone does CREATE USER MAPPING to install a new mapping for one of
> those userids, we should invalidate the plan --- but there is certainly
> not going to be anything in the plan matching the new user mapping.
>

I replied to your earlier mail before reading this. Ok, so we agree there.


>
> > Another way we could attack it would be to record the foreign server OID
> > as an invalItem for any query that has more than one foreign table
> > belonging to the same foreign server.  Then, invalidate whenever any user
> > mapping for that server changes.
>
> And that doesn't work so well either, because the most that the plan inval
> code is going to have its hands on is (a hash of) the OID of the user
> mapping that changed.  We can't tell which server that's for.
>

I assumed that there is a way to get server's oid from user mapping or we
record it to be passed to the invalidation logic. Looks like there's no
easy way to do that.


>
> On reflection, it seems to me that we've gone wrong by tying planning to
> equality of user mappings at all, and the best way to get out of this is
> to not do that.  Instead, let's insist that a join can be pushed down only
> if the checkAsUser fields of the relevant RTEs are equal.  If they are,
> then the same user mapping must apply to both at runtime, whatever it is
> --- and we don't need to predetermine that.  With this approach, the need
> for plan invalidation due to user mapping changes goes away entirely.
>

I have already explained in my earlier mail, that the problem you described
doesn't exist. With the invalidation logic we are able to also support
pushing down joins between table with different effective user.


>
> This doesn't cost us anything at all in simple cases such as direct
> execution of a query, because all the checkAsUser fields will be equal
> (i.e., zero).  And it also doesn't hurt if the potential foreign join is
> encapsulated in a view, where all the checkAsUser fields would contain
> the OID of the view owner.
>

> The situation where we potentially lose something is a case like
> Etsuro-san's original example, where the query contains one foreign table
> reference coming from a view and one coming from the outer query, or maybe
> from a different view.  In the two-views case we would have to not push
> down the join if the views have different owners, even though perhaps both
> owners will use the PUBLIC mapping at runtime.  I think that's a narrow
> enough case that we can just live with not optimizing it.  In the
> view-and-outer-query case, the simplest answer is that we can't push down
> because zero is not equal to the view owner's OID.  We could make that a
> little better if we know that the query will be executed as the view
> owner, so that the relevant user IDs will be the same at runtime.  There
> is already some mechanism in the plan cache to track whether a plan
> depends on the identity of the user running it (for RLS), so we could use
> that to enforce that a plan containing such a pushed-down join is only run
> by the same user that owns the view.
>
>
Join between views on foreign tables or between foreign tables and views
containing foreign tables won't be rare. This feature is yet to be
released, so we don't know if PostgreSQL users would find it useful. But I
do see Oracle users joining views on dblink tables. I would guess same
would be the case in PostgreSQL. But I would like to hear from other
PostgreSQL FDW users. In such cases, being able to push down a join between
foreign tables across view boundaries will be useful.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Ashutosh Bapat
On Thu, Jul 14, 2016 at 10:02 PM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > Exactly, for a rare scenario, should we be penalizing large number of
> plans
> > or just continue to use a previously prepared plan when an optimal plan
> has
> > become available because of changed condition. I would choose second over
> > the first as it doesn't make things worse than they are.
>
> You seem to be arguing from a premise that the worst consequence is use of
> an inefficient plan.  This is false; the worst consequence is use of a
> *wrong* plan, specifically one where a join has been pushed down but doing
> so is no longer legal because of a user mapping change.  That is, it was
> previously correct to access the two remote tables under the same remote
> userID but now they should be accessed under different userIDs.  The
> result will be that one or the other remote table is accessed under a
> userID that is not what the current user mappings say should be used.
> That could lead to either unexpected permissions failures (if the
> actually-used userID lacks needed permissions) or security breakdowns
> (if the actually-used userID has permissions to do something but the
> query should not have been allowed to do it).
>

> I do not think fixing this is optional.
>

There is confusion here. The current situation is this:
If at the time of preparing a statement a join can be pushed down to
foreign server, we mark that plan as hasForeignJoin. Before execution, if
user mapping changes (add/modify/drop), all plans with hasForeignJoin are
invalidated. This means the situation you are describing above does not
exist in the current code. So, nothing needs to be fixed.


>
> I concur with Etsuro-san's dislike for hasForeignJoin; that flag is
> underspecified and doesn't convey nearly enough information.  I do not
> think a uses_user_mapping flag is much better.  ISTM what should happen is
> that any time we decide to push down a foreign join, we should record the
> identity of the common user mapping that made that pushdown possible in
> the plan's invalItems list.  That would make it possible to invalidate
> only the relevant plans when a user mapping is changed.
>

This, as you have proposed, does not solve the problem either. Consider a
case when a public user mapping was associated with the foreign tables
being joined and the join was pushed down while preparing statement and
plan. Before execution, a user mapping was added which changed one of the
table's associated user mapping from public to the new one. Now the join is
not pushable, but the user mapping that was added/changed was not recorded
in invalItems, which contains the public user mapping. An improvement to
your proposal would be to invalidate plans related to a public user
mapping, when any user mapping is added/changed/dropped. But I guess, there
are also some problems there too. Adding/dropping/changing a user mapping
affects other user mappings as well, unlike say doing that to tables or
views. When implementing this, we debated whether it's worth to add that
much complexity. We favoured not adding the complexity that time. The code
as is not optimistic and might lead to using already created suboptimal
plans, but it doesn't have any known bugs there (assuming, I have explained
why the above situation doesn't lead to a bug).


>
> I believe what Ashutosh is focusing on is whether, when some user mapping
> changes, we should invalidate all plans that could potentially now use a
> pushed-down join but did not before.  I tend to agree that that's probably
> not something we want to do, as it would be very hard to identify just the
> plans likely to benefit.  So we would cause replanning of a lot of queries
> that won't actually benefit, and in this direction it is indeed only an
> optimization not a correctness matter.
>

Right, that's my argument.


>
> Another way we could attack it would be to record the foreign server OID
> as an invalItem for any query that has more than one foreign table
> belonging to the same foreign server.  Then, invalidate whenever any user
> mapping for that server changes.  This would take care of both the case
> where a join pushdown becomes possible and where it stops becoming
> possible.  But I'm not sure that the extra invalidations would be
> worthwhile.
>

Yes. That wouldn't have the problem I described above. Again, I am not sure
whether it's worth making code complex for that case. User mappings are not
something added/dropped/modified very frequently.


>
> I'm not excited about Etsuro-san's proposal to record user mapping info
> in the plan and skip execution-time mapping lookups altogether.  I think
> (1) that's solving a problem that hasn't been proven to be a problem,
> (2) it doesn't help unless the FDW code is changed to take advantage of
> it, which is unlikely to happen for third-party FDWs, and (3) it opens
> the door to a class of new bugs, as any 

Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Tom Lane
Etsuro Fujita  writes:
> One thing I'm not sure about is: should we insist that a join can be  
> pushed down only if the checkAsUser fields of the relevant RTEs are  
> equal in the case where user mappings are meaningless to the FDW, like  
> file_fdw?

If we add a mechanism to let us know that the FDW doesn't care, we could
relax the requirement for such cases.  I don't have a strong opinion on
whether that's worthwhile.  It'd depend in part on how many FDWs there
are that don't care, versus those that do; and I have no idea about that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Etsuro Fujita

On 2016/07/15 3:49, Tom Lane wrote:

On reflection, it seems to me that we've gone wrong by tying planning to
equality of user mappings at all, and the best way to get out of this is
to not do that.  Instead, let's insist that a join can be pushed down only
if the checkAsUser fields of the relevant RTEs are equal.  If they are,
then the same user mapping must apply to both at runtime, whatever it is
--- and we don't need to predetermine that.  With this approach, the need
for plan invalidation due to user mapping changes goes away entirely.


+1


The situation where we potentially lose something is a case like
Etsuro-san's original example, where the query contains one foreign table
reference coming from a view and one coming from the outer query, or maybe
from a different view.  In the two-views case we would have to not push
down the join if the views have different owners, even though perhaps both
owners will use the PUBLIC mapping at runtime.  I think that's a narrow
enough case that we can just live with not optimizing it.  In the
view-and-outer-query case, the simplest answer is that we can't push down
because zero is not equal to the view owner's OID.  We could make that a
little better if we know that the query will be executed as the view
owner, so that the relevant user IDs will be the same at runtime.  There
is already some mechanism in the plan cache to track whether a plan
depends on the identity of the user running it (for RLS), so we could use
that to enforce that a plan containing such a pushed-down join is only run
by the same user that owns the view.


Seems reasonable to me.

One thing I'm not sure about is: should we insist that a join can be  
pushed down only if the checkAsUser fields of the relevant RTEs are  
equal in the case where user mappings are meaningless to the FDW, like  
file_fdw?


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Tom Lane
I wrote:
> I concur with Etsuro-san's dislike for hasForeignJoin; that flag is
> underspecified and doesn't convey nearly enough information.  I do not
> think a uses_user_mapping flag is much better.  ISTM what should happen is
> that any time we decide to push down a foreign join, we should record the
> identity of the common user mapping that made that pushdown possible in
> the plan's invalItems list.  That would make it possible to invalidate
> only the relevant plans when a user mapping is changed.

I thought a bit more about this and realized that the above doesn't work
too well.  Initially, a join might have been pushed down on the strength
of both userids mapping to a PUBLIC user mapping for the server.  If now
someone does CREATE USER MAPPING to install a new mapping for one of
those userids, we should invalidate the plan --- but there is certainly
not going to be anything in the plan matching the new user mapping.

> Another way we could attack it would be to record the foreign server OID
> as an invalItem for any query that has more than one foreign table
> belonging to the same foreign server.  Then, invalidate whenever any user
> mapping for that server changes.

And that doesn't work so well either, because the most that the plan inval
code is going to have its hands on is (a hash of) the OID of the user
mapping that changed.  We can't tell which server that's for.

On reflection, it seems to me that we've gone wrong by tying planning to
equality of user mappings at all, and the best way to get out of this is
to not do that.  Instead, let's insist that a join can be pushed down only
if the checkAsUser fields of the relevant RTEs are equal.  If they are,
then the same user mapping must apply to both at runtime, whatever it is
--- and we don't need to predetermine that.  With this approach, the need
for plan invalidation due to user mapping changes goes away entirely.

This doesn't cost us anything at all in simple cases such as direct
execution of a query, because all the checkAsUser fields will be equal
(i.e., zero).  And it also doesn't hurt if the potential foreign join is
encapsulated in a view, where all the checkAsUser fields would contain
the OID of the view owner.

The situation where we potentially lose something is a case like
Etsuro-san's original example, where the query contains one foreign table
reference coming from a view and one coming from the outer query, or maybe
from a different view.  In the two-views case we would have to not push
down the join if the views have different owners, even though perhaps both
owners will use the PUBLIC mapping at runtime.  I think that's a narrow
enough case that we can just live with not optimizing it.  In the
view-and-outer-query case, the simplest answer is that we can't push down
because zero is not equal to the view owner's OID.  We could make that a
little better if we know that the query will be executed as the view
owner, so that the relevant user IDs will be the same at runtime.  There
is already some mechanism in the plan cache to track whether a plan
depends on the identity of the user running it (for RLS), so we could use
that to enforce that a plan containing such a pushed-down join is only run
by the same user that owns the view.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Tom Lane
Ashutosh Bapat  writes:
> Exactly, for a rare scenario, should we be penalizing large number of plans
> or just continue to use a previously prepared plan when an optimal plan has
> become available because of changed condition. I would choose second over
> the first as it doesn't make things worse than they are.

You seem to be arguing from a premise that the worst consequence is use of
an inefficient plan.  This is false; the worst consequence is use of a
*wrong* plan, specifically one where a join has been pushed down but doing
so is no longer legal because of a user mapping change.  That is, it was
previously correct to access the two remote tables under the same remote
userID but now they should be accessed under different userIDs.  The
result will be that one or the other remote table is accessed under a
userID that is not what the current user mappings say should be used.
That could lead to either unexpected permissions failures (if the
actually-used userID lacks needed permissions) or security breakdowns
(if the actually-used userID has permissions to do something but the
query should not have been allowed to do it).

I do not think fixing this is optional.

I concur with Etsuro-san's dislike for hasForeignJoin; that flag is
underspecified and doesn't convey nearly enough information.  I do not
think a uses_user_mapping flag is much better.  ISTM what should happen is
that any time we decide to push down a foreign join, we should record the
identity of the common user mapping that made that pushdown possible in
the plan's invalItems list.  That would make it possible to invalidate
only the relevant plans when a user mapping is changed.

I believe what Ashutosh is focusing on is whether, when some user mapping
changes, we should invalidate all plans that could potentially now use a
pushed-down join but did not before.  I tend to agree that that's probably
not something we want to do, as it would be very hard to identify just the
plans likely to benefit.  So we would cause replanning of a lot of queries
that won't actually benefit, and in this direction it is indeed only an
optimization not a correctness matter.

Another way we could attack it would be to record the foreign server OID
as an invalItem for any query that has more than one foreign table
belonging to the same foreign server.  Then, invalidate whenever any user
mapping for that server changes.  This would take care of both the case
where a join pushdown becomes possible and where it stops becoming
possible.  But I'm not sure that the extra invalidations would be
worthwhile.

I'm not excited about Etsuro-san's proposal to record user mapping info
in the plan and skip execution-time mapping lookups altogether.  I think
(1) that's solving a problem that hasn't been proven to be a problem,
(2) it doesn't help unless the FDW code is changed to take advantage of
it, which is unlikely to happen for third-party FDWs, and (3) it opens
the door to a class of new bugs, as any failure to invalidate after a
mapping change would become a security fail even for non-join situations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Ashutosh Bapat
On Thu, Jul 14, 2016 at 5:10 PM, Etsuro Fujita 
wrote:

> On 2016/07/13 18:00, Ashutosh Bapat wrote:
>
>> To fix the first, I'd like to propose (1) replacing the existing
>> has_foreign_join flag in the CachedPlan data structure with a new
>> flag, say uses_user_mapping, that indicates whether a cached plan
>> uses any user mapping regardless of whether the cached plan has
>> foreign joins or not (likewise, replace the hasForeignJoin flag in
>> PlannedStmt), and (2) invalidating the cached plan if the
>> uses_user_mapping flag is true.
>>
>
> That way we will have plan cache invalidations even when simple foreign
>> tables scans (not join) are involved, which means all the plans
>> involving any reference to a foreign table with valid user mapping
>> associated with it. That can be a huge cost as compared to the current
>> solution where sub-optimal plan will be used only when a user mapping is
>> changed while a statement has been prepared. That's a rare scenario and
>> somebody can work around that by preparing the statement again.
>>
>
> I'm not sure that's a good workaround.  ISTM that people often don't pay
> much attention to plan changes, so they would execute the inefficient plan
> without realizing the plan change, it would take long, they would start
> thinking what's happening there, and finally, they would find that the
> reason for that is due to the plan change.  I think we should prevent such
> a trouble.
>

The case you described is other way round. When the statement was prepared
the join was not pushed down. A change in user mapping afterwards may allow
the join to be pushed down. But right now it won't be, so a user wouldn't
see any difference, right?

>
> IIRC, we
>> had discussed this situation when implementing the cache invalidation
>> logic.
>>
>
> I didn't know that.  Sorry for speaking up late.
>
> But there's no workaround for your solution.
>>
>
> As you said, this is a rare scenario; in many cases, people define user
> mappings properly beforehand.  So, just invalidating all relevant plans on
> the syscache invalidation events would be fine.  (I thought one possible
> improvement might be to track exactly the dependencies of plans on user
> mappings and invalidate just those plans that depend on the user mapping
> being modified the same way for user-defined functions, but I'm not sure
> it's worth complicating the code.)
>

Exactly, for a rare scenario, should we be penalizing large number of plans
or just continue to use a previously prepared plan when an optimal plan has
become available because of changed condition. I would choose second over
the first as it doesn't make things worse than they are.


> I don't think the above change is sufficient to fix the second.  The
>> root reason for that is that since currently, we allow the user
>> mapping OID (rel->umid) to be InvalidOid in two cases: (1) user
>> mappings mean something to the FDW but it can't get any user mapping
>> at planning time and (2) user mappings are meaningless to the FDW,
>> we cannot distinguish these two cases.
>>
>
> The way to differentiate between these two is to look at the serverid.
>> If server id is invalid it's the case 1,
>>
>
> Really?  Maybe my explanation was not good, but consider a foreign join
> plan created through GetForeignJoinPaths, by an FDW to which user mappings
> are meaningless, like file_fdw.  In that plan, the corresponding server id
> would be valid, not invalid.  No?
>

While planning join, we invalidate user mapping id if the user mappings do
not match (see build_join_rel()). In such case, the joinrel will have
invalid user mapping (and invalid server id) even though user mapping is
associated with the joining tables. The way to differentiate between this
case and the case when an FDW doesn't need user mappings and the join is
shippable is through valid serverid (see build_join_rel()).
Non-availability of a user mapping for a table whose FDW requires user
mappings should end up in an error (in FDW code), so we shouldn't add
complexity for that case.


>
> So, I'd like to introduce a new callback routine to specify that
>> user mappings mean something to the FDW as proposed by Tom [2], and
>> use that to reject the former case, which allows us to set the above
>> uses_user_mapping flag appropriately, ie, set the flag to true only
>> if user mapping changes require forcing a replan.
>>
>
> This routine is meaningless unless the core (or FDW) does not allow a
>> user mapping to be created for such FDWs. Without that, core code would
>> get confused as to what it should do when it sees a user mapping for an
>> FDW which says user mappings are meaningless.
>>
>
> The core wouldn't care about such a user mapping for the FDW; the core
> would just ignore the user mapping.  No?
>

See build_join_rel(). I would like to see, how do you change the conditions
below in that function with 

Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-14 Thread Etsuro Fujita

On 2016/07/13 18:00, Ashutosh Bapat wrote:

To fix the first, I'd like to propose (1) replacing the existing
has_foreign_join flag in the CachedPlan data structure with a new
flag, say uses_user_mapping, that indicates whether a cached plan
uses any user mapping regardless of whether the cached plan has
foreign joins or not (likewise, replace the hasForeignJoin flag in
PlannedStmt), and (2) invalidating the cached plan if the
uses_user_mapping flag is true.



That way we will have plan cache invalidations even when simple foreign
tables scans (not join) are involved, which means all the plans
involving any reference to a foreign table with valid user mapping
associated with it. That can be a huge cost as compared to the current
solution where sub-optimal plan will be used only when a user mapping is
changed while a statement has been prepared. That's a rare scenario and
somebody can work around that by preparing the statement again.


I'm not sure that's a good workaround.  ISTM that people often don't pay 
much attention to plan changes, so they would execute the inefficient 
plan without realizing the plan change, it would take long, they would 
start thinking what's happening there, and finally, they would find that 
the reason for that is due to the plan change.  I think we should 
prevent such a trouble.



IIRC, we
had discussed this situation when implementing the cache invalidation
logic.


I didn't know that.  Sorry for speaking up late.


But there's no workaround for your solution.


As you said, this is a rare scenario; in many cases, people define user 
mappings properly beforehand.  So, just invalidating all relevant plans 
on the syscache invalidation events would be fine.  (I thought one 
possible improvement might be to track exactly the dependencies of plans 
on user mappings and invalidate just those plans that depend on the user 
mapping being modified the same way for user-defined functions, but I'm 
not sure it's worth complicating the code.)



One benefit of using the proposed approach is that we could make the
FDW's handling of user mappings in BeginForeignScan more efficient;
currently, there is additional overhead caused by catalog re-lookups
to obtain the user mapping information for the
simple-foreign-table-scan case where user mappings mean something to
the FDW as in postgres_fdw (probably, updates on the catalogs are
infrequent, though), but we could improve the efficiency by using
the validated user mapping information created at planning time for
that case as well as for the foreign-join case.



postgres_fdw to fetches user mapping in some cases but never remembers
it. If what you are describing is a better way, it should have been
implemented before join pushdown was implemented. Refetching a user
mapping is not that expensive given that there is a high chance that it
will be found in the syscache, because it was accessed at the time of
planning.


That assumption is reasonably valid if execution is done immediately 
after planning, but that doesn't necessarily follow.



Effect of plan cache invalidation is probably worse than
fetching the value from a sys cache again.


As I said above, we could expect updates on pg_user_mapping to be 
infrequent, so the effect of the plan cache invalidation would be more 
limited than that of re-fetching user mappings during BeginForeignScan.



I don't think the above change is sufficient to fix the second.  The
root reason for that is that since currently, we allow the user
mapping OID (rel->umid) to be InvalidOid in two cases: (1) user
mappings mean something to the FDW but it can't get any user mapping
at planning time and (2) user mappings are meaningless to the FDW,
we cannot distinguish these two cases.



The way to differentiate between these two is to look at the serverid.
If server id is invalid it's the case 1,


Really?  Maybe my explanation was not good, but consider a foreign join 
plan created through GetForeignJoinPaths, by an FDW to which user 
mappings are meaningless, like file_fdw.  In that plan, the 
corresponding server id would be valid, not invalid.  No?



So, I'd like to introduce a new callback routine to specify that
user mappings mean something to the FDW as proposed by Tom [2], and
use that to reject the former case, which allows us to set the above
uses_user_mapping flag appropriately, ie, set the flag to true only
if user mapping changes require forcing a replan.



This routine is meaningless unless the core (or FDW) does not allow a
user mapping to be created for such FDWs. Without that, core code would
get confused as to what it should do when it sees a user mapping for an
FDW which says user mappings are meaningless.


The core wouldn't care about such a user mapping for the FDW; the core 
would just ignore the user mapping.  No?


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers 

Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-13 Thread Ashutosh Bapat
> Seems odd to me.  Both relations use the same user mapping as before, so
> the join should be pushed down.
>
> Another thing I noticed is that the code in plancache.c would invalidate a
> cached plan with a foreign join due to user mapping changes even in the
> case where user mappings are meaningless to the FDW.
>
> To fix the first, I'd like to propose (1) replacing the existing
> has_foreign_join flag in the CachedPlan data structure with a new flag, say
> uses_user_mapping, that indicates whether a cached plan uses any user
> mapping regardless of whether the cached plan has foreign joins or not
> (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2)
> invalidating the cached plan if the uses_user_mapping flag is true.  I
> thought that we invalidate the cached plan if the flag is true and there is
> a possibility of performing foreign joins, but it seems to me that updates
> on the corresponding catalog are infrequent enough that such a detailed
> tracking is not worth the effort.


That way we will have plan cache invalidations even when simple foreign
tables scans (not join) are involved, which means all the plans involving
any reference to a foreign table with valid user mapping associated with
it. That can be a huge cost as compared to the current solution where
sub-optimal plan will be used only when a user mapping is changed while a
statement has been prepared. That's a rare scenario and somebody can work
around that by preparing the statement again. IIRC, we had discussed this
situation when implementing the cache invalidation logic. But there's no
workaround for your solution.


> One benefit of using the proposed approach is that we could make the FDW's
> handling of user mappings in BeginForeignScan more efficient; currently,
> there is additional overhead caused by catalog re-lookups to obtain the
> user mapping information for the simple-foreign-table-scan case where user
> mappings mean something to the FDW as in postgres_fdw (probably, updates on
> the catalogs are infrequent, though), but we could improve the efficiency
> by using the validated user mapping information created at planning time
> for that case as well as for the foreign-join case.  For that, I'd like to
> propose storing the validated user mapping information into ForeignScan,
> not fdw_private.


postgres_fdw to fetches user mapping in some cases but never remembers it.
If what you are describing is a better way, it should have been implemented
before join pushdown was implemented. Refetching a user mapping is not that
expensive given that there is a high chance that it will be found in the
syscache, because it was accessed at the time of planning. Effect of plan
cache invalidation is probably worse than fetching the value from a sys
cache again.


> There is a discussion about using an ExtensibleNode [1] for that, but the
> proposed approach would make the FDW's job much simpler.
>
> I don't think the above change is sufficient to fix the second.  The root
> reason for that is that since currently, we allow the user mapping OID
> (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something
> to the FDW but it can't get any user mapping at planning time and (2) user
> mappings are meaningless to the FDW, we cannot distinguish these two cases.


The way to differentiate between these two is to look at the serverid. If
server id is invalid it's the case 1, otherwise 2. For a simple table,
there will always be a serverid associated with it. A user mapping will
always be associated with a simple table if there is one i.e. FDW requires
it. Albeit, there will be a case when FDW requires a user mapping but it's
not created, in which case the table will be useless for fetching remote
data anyway. I don't think we should be programming for that case.


> So, I'd like to introduce a new callback routine to specify that user
> mappings mean something to the FDW as proposed by Tom [2], and use that to
> reject the former case, which allows us to set the above uses_user_mapping
> flag appropriately, ie, set the flag to true only if user mapping changes
> require forcing a replan.  This would change the postgres_fdw's behavior
> that it allows to prepare statements involving foreign tables without any
> user mappings as long as those aren't required at planning time, but I'm
> not sure that it's a good idea to keep that behavior because ISTM that such
> a behavior would make people sloppy about creating user mappings, which
> could lead to latent security problems [2].
>
> Attached is a proposed patch for that.
>
>
This routine is meaningless unless the core (or FDW) does not allow a user
mapping to be created for such FDWs. Without that, core code would get
confused as to what it should do when it sees a user mapping for an FDW
which says user mappings are meaningless. If we do that, we might not set
hasForeignJoin flag in create_foreignscan_plan() when the user mapping for
pushed down join is 

[HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-12 Thread Etsuro Fujita

Hi,

I noticed odd behavior in invalidating a cached plan with a foreign join  
due to user mapping changes.  Consider:


postgres=# create extension postgres_fdw;
postgres=# create server loopback foreign data wrapper postgres_fdw  
options (dbname 'postgres');

postgres=# create user mapping for public server loopback;
postgres=# create table t1 (a int, b int);
postgres=# insert into t1 values (1, 1);
postgres=# create foreign table ft1 (a int, b int) server loopback  
options (table_name 't1');

postgres=# analyze ft1;
postgres=# create view v1 as select * from ft1;
postgres=# create user v1_owner;
postgres=# alter view v1 owner to v1_owner;
postgres=# grant select on ft1 to v1_owner;
postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a;
postgres=# explain verbose execute join_stmt;
  QUERY PLAN
--
 Foreign Scan  (cost=100.00..102.04 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Relations: (public.ft1) INNER JOIN (public.ft1)
   Remote SQL: SELECT r1.a, r1.b, r5.a, r5.b FROM (public.t1 r1 INNER  
JOIN public.t1 r5 ON (((r1.a = r5.a

(4 rows)

That's great.

postgres=# create user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
  QUERY PLAN
--
 Nested Loop  (cost=200.00..202.07 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Join Filter: (ft1.a = ft1_1.a)
   ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
 Output: ft1.a, ft1.b
 Remote SQL: SELECT a, b FROM public.t1
   ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1  
width=8)

 Output: ft1_1.a, ft1_1.b
 Remote SQL: SELECT a, b FROM public.t1
(9 rows)

That's also great.  Since ft1 and v1 use different user mappings, the  
join should not be pushed down.


postgres=# drop user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
  QUERY PLAN
--
 Nested Loop  (cost=200.00..202.07 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Join Filter: (ft1.a = ft1_1.a)
   ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
 Output: ft1.a, ft1.b
 Remote SQL: SELECT a, b FROM public.t1
   ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1  
width=8)

 Output: ft1_1.a, ft1_1.b
 Remote SQL: SELECT a, b FROM public.t1
(9 rows)

Seems odd to me.  Both relations use the same user mapping as before, so  
the join should be pushed down.


Another thing I noticed is that the code in plancache.c would invalidate  
a cached plan with a foreign join due to user mapping changes even in  
the case where user mappings are meaningless to the FDW.


To fix the first, I'd like to propose (1) replacing the existing  
has_foreign_join flag in the CachedPlan data structure with a new flag,  
say uses_user_mapping, that indicates whether a cached plan uses any  
user mapping regardless of whether the cached plan has foreign joins or  
not (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2)  
invalidating the cached plan if the uses_user_mapping flag is true.  I  
thought that we invalidate the cached plan if the flag is true and there  
is a possibility of performing foreign joins, but it seems to me that  
updates on the corresponding catalog are infrequent enough that such a  
detailed tracking is not worth the effort.  One benefit of using the  
proposed approach is that we could make the FDW's handling of user  
mappings in BeginForeignScan more efficient; currently, there is  
additional overhead caused by catalog re-lookups to obtain the user  
mapping information for the simple-foreign-table-scan case where user  
mappings mean something to the FDW as in postgres_fdw (probably, updates  
on the catalogs are infrequent, though), but we could improve the  
efficiency by using the validated user mapping information created at  
planning time for that case as well as for the foreign-join case.  For  
that, I'd like to propose storing the validated user mapping information  
into ForeignScan, not fdw_private.  There is a discussion about using an  
ExtensibleNode [1] for that, but the proposed approach would make the  
FDW's job much simpler.


I don't think the above change is sufficient to fix the second.  The  
root reason for that is that since currently, we allow the user mapping  
OID (rel->umid) to be InvalidOid in two cases: (1) user mappings mean  
something to the FDW but it can't get any user mapping at planning time  
and (2) user mappings are meaningless to the FDW, we cannot distinguish  
these two cases.  So, I'd like to