Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-25 Thread Robert Haas
On Tue, Jun 24, 2014 at 12:27 PM, Stephen Frost sfr...@snowman.net wrote:
 I feel like we are getting to the point of simply talking past each
 other and so I'll try anew, and I'll include my understanding of how the
 different approaches would address the specific use-case you outlined
 up-thread.

Thanks, you're right, and this is a good write-up.

 Single policy
 -
 The current implementation approach only allows a single policy to be
 included.
 [...snip...]
 For the case where a sales rep isn't also a partner, you could simplify
 this to:

 WHERE
   sales_rep_id = 16384

 but I'm not sure that really buys you much?  With the bitmap heap
 scan, if one side of the OR ends up not returning anything then it
 doesn't contribute to the blocks which have to be scanned.  The index
 might still need to be scanned, although I think you could avoid even
 that with an EXISTS check to see if the user is a partner at all.
 That's not to say that a bitmap scan is equivilant to an index scan, but
 it's certainly likely to be far better than a sequential scan.

True, but the wins could be much larger if one policy is WHERE
sales_rep_id = (SELECT oid FROM pg_roles WHERE rolname = current_user)
and the other policy is WHERE complexfn().  I'll also throw out a +1
for Dean's comments on this topic.

 Multiple, Non-overlapping policies
 --
 Preventing the overlap of policies ends up being very complicated if
 many dimensions are allowed.  For the simple case, perhaps only the
 'current role' dimension is useful.  I expect that going down that
 route would very quickly lead to requests for other dimensions (client
 IP, etc) which is why I'm not a big fan of it, but if that's the
 concensus then let's work out the syntax and update the patch and move
 on.

I think role is good enough.  That's the primary identifier for all
access-control related decisions, so it should be good enough here,
too.

I don't really understand your concerns about overlapping policies
being complex.  If you've got a couple of WHERE clauses, combining
them with OR is not hard.  Now, the query optimizer may have trouble
with it, but on the whole I expect to win more than we lose, by
entirely excluding some branches of an OR for users for whom entirely
policies can be excluded.

 Overall, while I'm interested in defining where this is going in a way
 which allows us implement an initial RLS capability while avoiding
 future upgrade issues, I am perfectly happy to say that the 9.5 RLS
 implementation may not be exactly syntax-compatible with 9.6 or 10.0.

Again, I think that's completely non-viable.  Are you going to tell
people they can't pg_upgrade, and they can't dump-and-reload either,
without manual fiddling?  There's no way that's going to be accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Robert Haas
On Mon, Jun 23, 2014 at 2:29 PM, Stephen Frost sfr...@snowman.net wrote:
 What are these policies going to depend on?  Will they be allowed to
 overlap?  I don't see multi-policy support as being very easily added.

We discussed the point about overlap upthread, and I gave specific
examples.  If there's something else you want me to provide here,
please be more clear about it.

 If there are specific ways to design the syntax which would make it
 easier to support multiple policies in the future, I'm all for it.  Have
 any specific thoughts regarding that?

I did propose something already upthread, and then Dean said this:

# Note that the syntax proposed elsewhere --- GRANT SELECT (polname) ON
# TABLE tab TO role --- doesn't work because it conflicts with the
# syntax for granting column privileges, so there needs to be a distinct
# syntax for this, and I think it ought to ultimately allow things like
#
# GRANT SELECT (col1, col2), UPDATE (col1) ON t1 TO bob USING policy1;

He's got a good point there.  I don't know whether the policy should
be given inline (e.g. GRANT ... WHERE stuff()) or out-of-line (GRANT
... USING policy1) but it seems like specifying it as some sort of
GRANT modifier might make sense.  I'm sure there are other ways also,
of course.

  - Require the user to specify in some way which of the available
  policies they want applied, and then apply only that one.
 
  I'd want to at least see a way to apply an ordering to the policies
  being applied, or have PG work out which one is cheapest and try that
  one first.

 Cost-based comparison of policies that return different results
 doesn't seem sensible to me.

 I keep coming back to the thought that, really, having multiple
 overlapping policies just adds unnecessary complication to the system
 for not much gain in real functionality.  Being able to specify a policy
 per-role might be useful, but that's only one dimension and I can
 imagine a lot of other dimensions that one might want to use to control
 which policy is used.

Well, I don't agree, and I've given examples upthread showing the
kinds of scenarios that I'm concerned about, which are drawn from real
experiences I've had.  It may be that I'm the only one who has had
such experiences, of course; or that there aren't enough people who
have to justify catering to such use cases.  But I'm not sure there's
much point in trying to have a conversation about how such a thing
could be made to work if you're just going to revert back to well, we
don't really need this anyway each time I make or refute a technical
point.

 I think it would be a VERY bad idea to design the system around the
 assumption that the RLS quals will be much more or less selective than
 the user-supplied quals.  That's going to be different in different
 environments.

 Fine- but do you really see the query planner having a problem pushing
 down whichever is the more selective qual, if the user-provided qual is
 marked as leakproof?

I'm not quite sure I understand the scenario you're describing here.
Can you provide a tangible example?  I expect that most of the things
the RLS-limited user might write in the WHERE clause will NOT get
pushed down because most functions are not leakproof.  However, the
issue I'm actually concerned about is whether the *security* qual is
simple enough to permit an index-scan.  Anything with an OR clause in
it probably won't be, and any function call definitely won't be.

 I realize that you want multiple policies because you'd like a way for
 the RLS qual to be made simpler for certain cases while also having more
 complex quals for other cases.  What I keep waiting to hear is exactly
 how you want to specify which policy is used because that's where it
 gets ugly and complicated.  I still really don't like the idea of trying
 to apply multiple policies inside of a single query execution.

See above comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Alvaro Herrera
Robert Haas wrote:

  Right, if we were to support multiple policies on a given table then we
  would have to support adding and removing them individually, as well as
  specify when they are to be applied- and what if that when overlaps?
  Do we apply both and only a row which passed them all gets sent to the
  user?  Essentially we'd be defining the RLS policies to be AND'd
  together, right?  Would we want to support both AND-based and OR-based,
  and allow users to pick what set of conditionals they want applied to
  their various overlapping RLS policies?
 
 AND is not a sensible policy; it would need to be OR.  If you grant
 someone access to two different subsets of the rows in a table, it
 stands to reason that they will expect to have access to all of the
 rows that are in at least one of those subsets.

I haven't been following this thread, but this bit caught my attention.
I'm not sure I agree that OR is always the right policy either.
There is a case for a policy that says forbid these rows to these guys,
even if they have read permissions from elsewhere.  If OR is the only
way to mix multiple policies there might not be a way to implement this.
So ISTM each policy must be able to indicate what to do -- sort of how
PAM config files allow you to specify required, optional and so
forth for each module.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Robert Haas
On Tue, Jun 24, 2014 at 10:30 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
  Right, if we were to support multiple policies on a given table then we
  would have to support adding and removing them individually, as well as
  specify when they are to be applied- and what if that when overlaps?
  Do we apply both and only a row which passed them all gets sent to the
  user?  Essentially we'd be defining the RLS policies to be AND'd
  together, right?  Would we want to support both AND-based and OR-based,
  and allow users to pick what set of conditionals they want applied to
  their various overlapping RLS policies?

 AND is not a sensible policy; it would need to be OR.  If you grant
 someone access to two different subsets of the rows in a table, it
 stands to reason that they will expect to have access to all of the
 rows that are in at least one of those subsets.

 I haven't been following this thread, but this bit caught my attention.
 I'm not sure I agree that OR is always the right policy either.
 There is a case for a policy that says forbid these rows to these guys,
 even if they have read permissions from elsewhere.  If OR is the only
 way to mix multiple policies there might not be a way to implement this.
 So ISTM each policy must be able to indicate what to do -- sort of how
 PAM config files allow you to specify required, optional and so
 forth for each module.

Hmm.  Well, that could be useful, but I'm not sure I'd view it as
something we absolutely have to have...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Stephen Frost
Robert,

I feel like we are getting to the point of simply talking past each
other and so I'll try anew, and I'll include my understanding of how the
different approaches would address the specific use-case you outlined
up-thread.

Single policy
-
The current implementation approach only allows a single policy to be
included.  The concern raised with this approach is that it won't be
very performant due to the qual complexity, which you outlined
(reformatted a bit) as:

WHERE
  sales_rep_id = (SELECT oid FROM pg_roles
  WHERE rolname = current_user
  AND
  oid IN (SELECT id FROM person WHERE is_sales_rep))
  OR
  partner_id = (SELECT p.org_id
FROM pg_roles a, person p
WHERE a.rolname = current_user
  AND a.oid = p.id)

Which I take to mean there is a 'person' table which looks like:

id, is_sales_rep, org_id

and a table which has the RLS qual which looks like:

pk_id, sales_rep_id, partner_id

Then, if the individual is_sales_rep and it's their account by
sales_rep_id, or if the individual's org_id matches the partner_id, they
can see the record.

Using this example with security barrier views and indexes on person.id,
data.pk_id, data.sales_rep_id, and data.partner_id, we'll get a bitmap
heap scan across the 'data' table by having the two OR's run as
InitPlan 1 and InitPlan 2.

Does that address the concern you had around multi-branch OR policies?
This works with more than two OR branches also, though of course we need
appropriate indexes to make use of a Bitmap Heap Scan.

Even with per-user policies, we would define a policy along these lines,
for the sfrost role:

WHERE
  sales_rep_id = 16384
  OR partner_id = 1

Which also ends up doing a Bitmap Heap Scan across the data table.

For the case where a sales rep isn't also a partner, you could simplify
this to:

WHERE
  sales_rep_id = 16384

but I'm not sure that really buys you much?  With the bitmap heap
scan, if one side of the OR ends up not returning anything then it
doesn't contribute to the blocks which have to be scanned.  The index
might still need to be scanned, although I think you could avoid even
that with an EXISTS check to see if the user is a partner at all.
That's not to say that a bitmap scan is equivilant to an index scan, but
it's certainly likely to be far better than a sequential scan.

Now, if the query is select * from data_view with pk_id = 1002;, then
we get an indexed lookup on the data table based on the PK.  That's what
I was trying to point out previously regarding leakproof functions
(which comprise about half of the boolean functions we provide, if I
recall my previous analysis correctly).  We also get indexed lookups
with pk_id  10 or similar as those are also leakproof.

Multiple, Overlapping policies
--
Per discussion, these would generally be OR'd together.

Building up the overall qual which has to include an OR branch for each
individual policy qual(s) looks like a complicated bit of work and one
which might be better left to the user (and, as just pointed out, the
user may actually want AND instead of OR in some cases..).

Managing the plan cache in a sensible way is certainly made more
complicated by this and might mean that it can't be used at all, which
has already been raised as a show-stopper issue.

In the example which you provided, while we could represent that the two
policies exist (sales representatives vs partners) and that they are to
be OR'd together in the catalog, but I don't immediately see how that
would change the qual which ends up being added to the query in this
case or really improving the overall query plan; at least, not without
eliminating one of the OR branches somehow- which I discuss below.

Multiple, Non-overlapping policies
--
Preventing the overlap of policies ends up being very complicated if
many dimensions are allowed.  For the simple case, perhaps only the
'current role' dimension is useful.  I expect that going down that
route would very quickly lead to requests for other dimensions (client
IP, etc) which is why I'm not a big fan of it, but if that's the
concensus then let's work out the syntax and update the patch and move
on.

Another option might be to have a qual for each policy which
the user can define that indicates if that policy is to be applied or
not and then simply pick the first policy for which that qual which
returns 'true'.  We would require an ordering to be defined in this
case, which I believe was an issue up-thread.  If we allow all policies
matching the quals then we run into the complications mentioned under
Overlapping policies above.

If we decide that per-role policies need to be supported, I very
quickly see the need to have groups of roles to which a policy is to
be applied.  This would differ from roles today as they would not be
allowed to overlap (otherwise we are into overlapping 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Dean Rasheed
On 24 June 2014 17:27, Stephen Frost sfr...@snowman.net wrote:
 Single policy vs Multiple, Overlapping policies vs Multiple, Non-overlapping 
 policies


What I was describing upthread was multiple non-overlapping policies.

I disagree that this will be more complicated to use. It's a strict
superset of the single policy functionality, so if you want to do it
all using a single policy then you can. But I think that once the ACLs
reach a certain level of complexity, you probably will want to break
it up into multiple policies, and I think doing so will make things
simpler, not more complicated.

Taking a specific, simplistic example, suppose you had 2 groups of
users - some are normal users who should only be able to access their
own records. For these users, you might have a policy like

  WHERE person_id = current_user

which would be highly selective, and probably use an index scan. Then
there might be another group of users who are managers with access to
the records of, say, everyone in their department. This might then be
a more complex qual along the lines of

  WHERE person_id IN (SELECT ... FROM person_department
   WHERE mgr_id = current_user AND ...)

which might end up being a hash or merge join, depending on any
user-supplied quals.

You _could_ combine those into a single policy, but I think it would
be much better to have 2 distinct policies, since they're 2 very
different queries, for different use cases. Normal users would only be
granted permission to use the normal_user_policy. Managers might be
granted permission to use either the normal_user_policy or the
manager_policy (but not both at the same time).

That's a very simplified example. In more realistic situations there
are likely to be many more classes of users, and trying to enforce all
the logic in a single WHERE clause is likely to get unmanageable, or
inefficient if it involves lots of logic hidden away in functions.
Allowing multiple, non-overlapping policies allows the problem to be
broken up into more manageable pieces, which also makes the planner's
job easier, since only a single, simpler policy is in effect in any
given query.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Dean Rasheed
Thinking about the examples upthread, a separate issue occurs to me
--- when defining a RLS qual, I think that there has to be a syntax to
specify an alias for the main table, so that correlated subqueries can
refer to it. I'm not sure if that's been mentioned in any of the
discussions so far, but it might be quite hard to define certain quals
without it.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 Thinking about the examples upthread, a separate issue occurs to me
 --- when defining a RLS qual, I think that there has to be a syntax to
 specify an alias for the main table, so that correlated subqueries can
 refer to it. I'm not sure if that's been mentioned in any of the
 discussions so far, but it might be quite hard to define certain quals
 without it.

Yeah, that thought had occured to me also.  Have any suggestions about
how to approach that issue?  The way triggers have OLD/NEW comes to mind
but I'm not sure how easily that'd work.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Craig Ringer
On 06/24/2014 10:30 PM, Alvaro Herrera wrote:
 I haven't been following this thread, but this bit caught my attention.
 I'm not sure I agree that OR is always the right policy either.
 There is a case for a policy that says forbid these rows to these guys,
 even if they have read permissions from elsewhere.

That's generally considered a DENY policy, a concept borrowed from ACLs.

You have access to a resource if:

- You have at least one policy that gives you access AND
- You have no policies that deny you access

 If OR is the only
 way to mix multiple policies there might not be a way to implement this.

I think that's a later myself, but we shouldn't design ourselves into
a corner where we can't support deny rules either.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-24 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
 On 06/24/2014 10:30 PM, Alvaro Herrera wrote:
  I haven't been following this thread, but this bit caught my attention.
  I'm not sure I agree that OR is always the right policy either.
  There is a case for a policy that says forbid these rows to these guys,
  even if they have read permissions from elsewhere.
 
 That's generally considered a DENY policy, a concept borrowed from ACLs.

Right.

  If OR is the only
  way to mix multiple policies there might not be a way to implement this.
 
 I think that's a later myself, but we shouldn't design ourselves into
 a corner where we can't support deny rules either.

Agreed, but I don't want to get so wrapped up in all of this that we end
up with a set of requirements so long that we'll never be able to
accomplish them all in a single release...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-23 Thread Robert Haas
On Wed, Jun 18, 2014 at 2:18 PM, Stephen Frost sfr...@snowman.net wrote:
 I'm also of the opinion that this isn't strictly necessary for the
 initial RLS offering in PG- there's a clear way we could migrate
 existing users to a multi-policy system from a single-policy system.
 Sure, to get the performance and optimization benefits that we'd
 presumably have in the multi-policy case they'd need to re-work their
 RLS configuration, but for users who care, they'll likely be very happy
 to do so to gain those benefits.

I think a lot depends on the syntax we choose.  If we choose a syntax
that only makes sense in a single-policy framework, then I think
allowing upgrades to a multi-policy syntax is going to be really
difficult.  On the other hand, if we choose a syntax that allows
multiple policies, I suspect we can support multiple policies from the
beginning without much extra effort.

 - Require the user to specify in some way which of the available
 policies they want applied, and then apply only that one.

 I'd want to at least see a way to apply an ordering to the policies
 being applied, or have PG work out which one is cheapest and try that
 one first.

Cost-based comparison of policies that return different results
doesn't seem sensible to me.

 I think exactly the opposite, for the query planning reasons
 previously stated.  I think the policies will quickly get so
 complicated that they're no longer optimizable.  Here's a simple
 example:

 - Policy 1 allows the user to access rows for which complexfunc() returns 
 true.
 - Policy 2 allows the user to access rows for which a = 1.

 Most users have access only through policy 2, but some have access
 through policy 1.  Users who have access through policy 1 will always
 get a sequential scan,

 This is the thing which I most object to- if the quals being provided at
 any level are leakproof and would be able to reduce the returned set
 sufficiently that an index scan is the best bet, we should be doing
 that.  I don't anticipate the RLS quals to be as selective as the
 quals which the user is adding.

I think it would be a VERY bad idea to design the system around the
assumption that the RLS quals will be much more or less selective than
the user-supplied quals.  That's going to be different in different
environments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-23 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jun 18, 2014 at 2:18 PM, Stephen Frost sfr...@snowman.net wrote:
  I'm also of the opinion that this isn't strictly necessary for the
  initial RLS offering in PG- there's a clear way we could migrate
  existing users to a multi-policy system from a single-policy system.
  Sure, to get the performance and optimization benefits that we'd
  presumably have in the multi-policy case they'd need to re-work their
  RLS configuration, but for users who care, they'll likely be very happy
  to do so to gain those benefits.
 
 I think a lot depends on the syntax we choose.  If we choose a syntax
 that only makes sense in a single-policy framework, then I think
 allowing upgrades to a multi-policy syntax is going to be really
 difficult.  On the other hand, if we choose a syntax that allows
 multiple policies, I suspect we can support multiple policies from the
 beginning without much extra effort.

What are these policies going to depend on?  Will they be allowed to
overlap?  I don't see multi-policy support as being very easily added.

If there are specific ways to design the syntax which would make it
easier to support multiple policies in the future, I'm all for it.  Have
any specific thoughts regarding that?

  - Require the user to specify in some way which of the available
  policies they want applied, and then apply only that one.
 
  I'd want to at least see a way to apply an ordering to the policies
  being applied, or have PG work out which one is cheapest and try that
  one first.
 
 Cost-based comparison of policies that return different results
 doesn't seem sensible to me.

I keep coming back to the thought that, really, having multiple
overlapping policies just adds unnecessary complication to the system
for not much gain in real functionality.  Being able to specify a policy
per-role might be useful, but that's only one dimension and I can
imagine a lot of other dimensions that one might want to use to control
which policy is used.

  I think exactly the opposite, for the query planning reasons
  previously stated.  I think the policies will quickly get so
  complicated that they're no longer optimizable.  Here's a simple
  example:
 
  - Policy 1 allows the user to access rows for which complexfunc() returns 
  true.
  - Policy 2 allows the user to access rows for which a = 1.
 
  Most users have access only through policy 2, but some have access
  through policy 1.  Users who have access through policy 1 will always
  get a sequential scan,
 
  This is the thing which I most object to- if the quals being provided at
  any level are leakproof and would be able to reduce the returned set
  sufficiently that an index scan is the best bet, we should be doing
  that.  I don't anticipate the RLS quals to be as selective as the
  quals which the user is adding.
 
 I think it would be a VERY bad idea to design the system around the
 assumption that the RLS quals will be much more or less selective than
 the user-supplied quals.  That's going to be different in different
 environments.

Fine- but do you really see the query planner having a problem pushing
down whichever is the more selective qual, if the user-provided qual is
marked as leakproof?

I realize that you want multiple policies because you'd like a way for
the RLS qual to be made simpler for certain cases while also having more
complex quals for other cases.  What I keep waiting to hear is exactly
how you want to specify which policy is used because that's where it
gets ugly and complicated.  I still really don't like the idea of trying
to apply multiple policies inside of a single query execution.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-22 Thread Dean Rasheed
On 17 June 2014 20:19, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
 Yeah, I was thinking something like this could work, but I would go
 further. Suppose you had separate GRANTable privileges for direct
 access to individual tables, bypassing RLS, e.g.

   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

 So, is this one new privilege (DIRECT) or four separate new privileges
 that are variants of the existing privileges (DIRECT SELECT, DIRECT
 INSERT, DIRECT UPDATE, DIRECT DELETE)?


I was thinking it would be 4 new privileges, so that a user could for
example be granted DIRECT SELECT permission on a table, but not DIRECT
UPDATE.

On reflection though, I think I prefer the approach of allowing
multiple named security policies per table, because it gives the
planner more opportunity to optimize queries against specific RLS
quals, which won't work if the ACL logic is embedded in functions.
That seems like something that would have to be designed in now,
because it's difficult to see how you could add it later.

Managing policy names becomes an issue though, because if you have 2
tables each with 1 policy, but you give them different names, how can
the user querying the data specify that they want policy1 for table1
and policy2 for table2, possibly in the same query? I think that can
be made more manageable by making policies top-level objects that
exist independently of any particular tables. So you might do
something like:

\c - alice
CREATE POLICY policy1;
CREATE POLICY policy2;
ALTER TABLE t1 SET POLICY policy1 TO t1_quals;
ALTER TABLE t2 SET POLICY policy1 TO t2_quals;
...
GRANT SELECT ON TABLE t1, t2 TO bob USING policy1;
GRANT SELECT ON TABLE t1, t2 TO manager; -- Can use any policy, or
bypass all policies

Then a particular user would typically only have to set their policy
once per session, for accessing multiple tables:

\c - bob
SET rls_policy = policy1;
SELECT * FROM t1 JOIN t2; -- OK
SET rls_policy = policy2;
SELECT * FROM t1; -- ERROR: no permission to access t1 using policy2

or you'd be able to set a default policy for users, so that they
wouldn't need to explicitly choose one:

ALTER ROLE bob SET rls_policy = policy1;


Note that the syntax proposed elsewhere --- GRANT SELECT (polname) ON
TABLE tab TO role --- doesn't work because it conflicts with the
syntax for granting column privileges, so there needs to be a distinct
syntax for this, and I think it ought to ultimately allow things like

GRANT SELECT (col1, col2), UPDATE (col1) ON t1 TO bob USING policy1;

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-22 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 17 June 2014 20:19, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com 
  wrote:
  Yeah, I was thinking something like this could work, but I would go
  further. Suppose you had separate GRANTable privileges for direct
  access to individual tables, bypassing RLS, e.g.
 
GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name
 
  So, is this one new privilege (DIRECT) or four separate new privileges
  that are variants of the existing privileges (DIRECT SELECT, DIRECT
  INSERT, DIRECT UPDATE, DIRECT DELETE)?
 
 I was thinking it would be 4 new privileges, so that a user could for
 example be granted DIRECT SELECT permission on a table, but not DIRECT
 UPDATE.

Ok.

 On reflection though, I think I prefer the approach of allowing
 multiple named security policies per table, because it gives the
 planner more opportunity to optimize queries against specific RLS
 quals, which won't work if the ACL logic is embedded in functions.

Having more than one policy for the purpose of performance really
doesn't make a huge amount of sense to me.  Perhaps someone could
explain the use-case with specific example applications where they would
benefit from this?  Based on the discussion, they would have to be OR'd
together in the query as built with any result being marked as success.
One could build an SQL function which could be in-lined potentially
which does the same if their case is that simple.

Being able to define the policy based on some criteria may allow it to
be simpler (eg: policy 'a' applies for certain roles, while policy 'b'
applies for other roles), but I'm not enthusiastic about that approach
because there could be a huge number of permutations to allow.

How about another approach- what about having a function which is called
(as the table owner, I'm thinking..) that then returns the qual to be
included, instead of having to define a specific qual which is included
in the catalog?  That function could take into consideration the user,
table, etc, and return a qual which includes constants to compare rows
against for planning purposes.  This would have to be done early enough,
of course, which might be difficult.  For my part, having that
capability would be neat, but nothing we're trying to do here would
preclude us from adding it later either.

 That seems like something that would have to be designed in now,
 because it's difficult to see how you could add it later.

I don't follow this at all.  Going from supporting one qual to
supporting multiple seems like it'd be quite straight-forward to add
in later?  Going the other way would be difficult.

 Managing policy names becomes an issue though, because if you have 2
 tables each with 1 policy, but you give them different names, how can
 the user querying the data specify that they want policy1 for table1
 and policy2 for table2, possibly in the same query? 

From my experience, users don't pick the policy any more than they get
to pick which set of permissions get applied to them when querying
tables (modulo roles, of course, but that's a mechanism for changing
users, not for saying which set of permissions you get).  All that you
describe could be done for regular permissions also, but we don't, and
I don't think we get complaints about that because we have roles-
which would work just the same for RLS (assuming the RLS policy defined
has a role component).

Having a function be able to be called to return a qual to be used would
be a way to have per-role RLS also, along with providing the flexibility
to have per-source-IP, per-connection-type, etc, RLS policies also.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-20 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jun 18, 2014 at 10:40 AM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  Technically, there are 4 bits left, and that's what we need for
  separate privileges.
 
  I'd really hate to chew them all up..
 
 Usually it's the patch author who WANTS to chew up all the available
 bit space and OTHER people who say no.  :-)

Ah, well, technically I'm not the patch author here, though I would like
to see it happen. :)  Still, have to balance these features and
capabilities against the future unknown options we might want to add and
it certainly doesn't seem terribly nice to chew up all that remain
rather than addressing the need to support more.

Still, perhaps we can put together a patch for this and then review the
implementation and, if we like it and that functionality, we can make
the decision about if it should be on this patch to make more bits
available.

  Perhaps, or we might come up with some new whiz-bang permission to add
  next year. :/
 
 Well, people proposed separate permissions for things like VACUUM and
 ANALYZE around the time TRUNCATE was added, and those were rejected on
 the grounds that they didn't add enough value to justify wasting bits
 on them.  I think we see whether there's a workable system that such
 that marginal permissions (like TRUNCATE) that won't be checked often
 don't have to consume bits.

That's an interesting approach but I'm not sure that we need to go a
system where we segregate often-used bits from less-used ones.

  My thoughts on how to address this were to segregate the ACL bits by
  object type.  That is to say, the AclMode stored for databases might
  only use bits 0-2 (create/connect/temporary), while tables would use
  bits 0-7 (insert/select/update/delete/references/trigger).  This would
  allow us to more easily add more rights at the database and/or
  tablespace level too.
 
 Yeah, that's another idea.  But it really deserves its own thread.
 I'm still not convinced we have to do this at all to meet this need,
 but that should be argued back and forth on that other thread.

Fair enough.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-20 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 [ counting bits in ACLs ]

Wouldn't it be fairly painless to widen AclMode from 32 bits to 64,
and thereby double the number of available bits?

That code was all written before we required platforms to have an int64
primitive type, but of course now we expect that.

In any case, I concur with the position that this feature patch should
be separate from a patch to make additional bitspace available.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-20 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Robert Haas (robertmh...@gmail.com) wrote:
  [ counting bits in ACLs ]
 
 Wouldn't it be fairly painless to widen AclMode from 32 bits to 64,
 and thereby double the number of available bits?

Thanks for commenting on this.  I hadn't considered that but I don't see
any particular problem with it either..

 In any case, I concur with the position that this feature patch should
 be separate from a patch to make additional bitspace available.

Certainly.  Thanks for your thoughts.

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 9:45 PM, Brightwell, Adam
adam.brightw...@crunchydatasolutions.com wrote:
 I absolutely appreciate all of the feedback that has been provided.  It has
 been educational.  To your point above, I started putting together a wiki
 page, as Stephen has spoken to, that is meant to capture these concerns and
 considerations as well as to capture ideas around solutions.

 https://wiki.postgresql.org/wiki/Row_Security_Considerations

 This page is obviously not complete, but I think it is a good start.
 Hopefully this document will help to continue the conversation and assist in
 addressing all the concerns that have been brought to the table.  As well, I
 hope that this document serves to demonstrate our intent and that we *are*
 taking these concerns seriously.  I assure you that as one of the
 individuals who is working towards the acceptance of this feature/patch, I
 am very much concerned about meeting the expected standards of quality and
 security.

Cool, thanks for weighing in.  I think that page is a good start.  An
item that I think should be added there is the potential overlap
between security_barrier views and row-level security.  How can we
reuse code (and SQL syntax?) for existing features like WITH CHECK
OPTION instead of writing new code (and inventing new syntax) for very
similar concepts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
  Yeah, I was thinking something like this could work, but I would go
  further. Suppose you had separate GRANTable privileges for direct
  access to individual tables, bypassing RLS, e.g.
 
GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

 So, is this one new privilege (DIRECT) or four separate new privileges
 that are variants of the existing privileges (DIRECT SELECT, DIRECT
 INSERT, DIRECT UPDATE, DIRECT DELETE)?

 I had taken it to be a single privilege, but you're right, it could be
 done for each of those..  I really don't think we have the bits for more
 than one case here though (if that) without a fair bit of additional
 rework.  I'm not against that rework (and called for it wayyy back when
 I proposed the TRUNCATE privilege, as I recall) but that's a whole
 different challenge and no small bit of work..

Technically, there are 4 bits left, and that's what we need for
separate privileges.  We last consumed bits in 2008 (for TRUNCATE) and
2006 (for GRANT ON DATABASE), so even if we used all of the remaining
bits it might be another 5 years before anyone has to do that
refactoring.  But even if the refactoring needs to be done now for
some reason, it's only June, and the last CommitFest doesn't start
until February 15th.  I think we're being way too quick to jump to
talking about what can and can't be done in time for 9.5.  Let's start
by figuring out how we'd really like it to work and then, if it's too
ambitious, we can scale it back.

My main concern about using only one bit is that someone might want to
allow a user to bypass RLS on SELECT while still enforcing it for
data-modifying operations.  That seems like a plausible use case to
me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
  I had taken it to be a single privilege, but you're right, it could be
  done for each of those..  I really don't think we have the bits for more
  than one case here though (if that) without a fair bit of additional
  rework.  I'm not against that rework (and called for it wayyy back when
  I proposed the TRUNCATE privilege, as I recall) but that's a whole
  different challenge and no small bit of work..
 
 Technically, there are 4 bits left, and that's what we need for
 separate privileges.  

I'd really hate to chew them all up..

 We last consumed bits in 2008 (for TRUNCATE) and
 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
 bits it might be another 5 years before anyone has to do that
 refactoring.  

Perhaps, or we might come up with some new whiz-bang permission to add
next year. :/

 But even if the refactoring needs to be done now for
 some reason, it's only June, and the last CommitFest doesn't start
 until February 15th.  I think we're being way too quick to jump to
 talking about what can and can't be done in time for 9.5.  Let's start
 by figuring out how we'd really like it to work and then, if it's too
 ambitious, we can scale it back.

Alright- perhaps we can discuss what kind of refactoring would be needed
for such a change then, to get a better idea as to the scope of the
change and the level of effort required.

My thoughts on how to address this were to segregate the ACL bits by
object type.  That is to say, the AclMode stored for databases might
only use bits 0-2 (create/connect/temporary), while tables would use
bits 0-7 (insert/select/update/delete/references/trigger).  This would
allow us to more easily add more rights at the database and/or
tablespace level too.

 My main concern about using only one bit is that someone might want to
 allow a user to bypass RLS on SELECT while still enforcing it for
 data-modifying operations.  That seems like a plausible use case to
 me.

I absolutely agree that it's a real use-case and one which we should
support, just trying to avoid biting off more than can be done between
now and February.  Still, if we get things hammered out and more-or-less
agreement on the way forward, getting the code written may move quickly.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote:
 Yeah, if we have to ask an external security module a question for
 each row, there's little hope of any real optimization.  However, I
 think there will be a significant number of cases where people will
 want filtering clauses that can be realized by doing an index scan
 instead of a sequential scan, and if we end up forcing a sequential
 scan anyway, the feature will be useless to those people.

 I agree that we want to support that, if we can do so reasonably.  What
 I was trying to get at is simply this- don't we provide that already
 with the leakproof attribute and functions?  If we don't have enough
 there to allow index scans then we should be looking to add more, I'm
 thinking.

So the reason why we got onto this particular topic was because of the
issue of multiple security policies for a single table.  Of course,
multiple security policies can always be merged into a single
more-complex policy, but the resulting policy may be so complex that
the query-planner is no longer capable of doing a good job optimizing
it.  I won't mention here exactly what a certain large commercial
database vendor has implemented here; suffice it to say, however, that
their design avoids this pitfall, and ours currently does not.

 I agree on this point, but I'm still hopeful that we'll be able to get a
 good feature into 9.5.  There are quite a few resources available for
 the 'just programming' part, so the long pole in the tent here is
 absolutely hashing out what we want and how it should function.

Agreed.

 I'd be happy to host or participate in a conference call or similar if
 that would be useful to move this along- or we can continue to
 communicate via email.  There's a bit of a lull in conferences to which
 I'm going to right now, so in person is unlikely, unless folks want to
 get together somewhere on the east coast (I'd be happy to travel to
 Philly, Pittsburgh, NYC, etc, if it'd help..).

For me, email is easiest; but there are other options, too.

  What solution did you come up with for this case, which performed well
  and was also secure..?

 I put the logic in the client.  :-(

 Well, that's not helpful here. ;)

Sure.  The reason I brought it up is to say - hey, look, I had this
come up in the real world.  What would it take to be able to do
actually do it in the database server?  And the answer is - something
that will handle multiple security policies cleanly.

 But I'm not sure; that
 feels like it's giving something up that might be important.  And I
 think that the kinds of syntax we're discussing won't support leaving
 that out of the initial version and adding it later, so if we commit
 to this syntax, we're stuck with that behavior.  To avoid that, we'd
 need something like this:

 ALTER TABLE tab ADD POLICY polname WHERE quals;
 GRANT SELECT (polname) ON TABLE tab TO role;

 Right, if we were to support multiple policies on a given table then we
 would have to support adding and removing them individually, as well as
 specify when they are to be applied- and what if that when overlaps?
 Do we apply both and only a row which passed them all gets sent to the
 user?  Essentially we'd be defining the RLS policies to be AND'd
 together, right?  Would we want to support both AND-based and OR-based,
 and allow users to pick what set of conditionals they want applied to
 their various overlapping RLS policies?

AND is not a sensible policy; it would need to be OR.  If you grant
someone access to two different subsets of the rows in a table, it
stands to reason that they will expect to have access to all of the
rows that are in at least one of those subsets.  If you give someone
your car key and your house key, that means they can operate your car
or enter your house; it does not mean that they can operate your car
but only when it's inside your garage.

Alternatively, we could:

- Require the user to specify in some way which of the available
policies they want applied, and then apply only that one.
or
- Decide that such scenarios constitute misconfiguration. Throw an
error and make the table owner or other relevant local authority fix
it.

 Sounds all rather painful and much better done programatically by the
 user in a language which is suited to that task- eg: pl/pgsql, perl, C,
 or something besides our ALTER syntax + catalog representation.

I think exactly the opposite, for the query planning reasons
previously stated.  I think the policies will quickly get so
complicated that they're no longer optimizable.  Here's a simple
example:

- Policy 1 allows the user to access rows for which complexfunc() returns true.
- Policy 2 allows the user to access rows for which a = 1.

Most users have access only through policy 2, but some have access
through policy 1.  Users who have access through policy 1 will always
get a sequential scan, but users who have access through policy 2 have
an excellent chance of getting an index 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 10:40 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote:
  I had taken it to be a single privilege, but you're right, it could be
  done for each of those..  I really don't think we have the bits for more
  than one case here though (if that) without a fair bit of additional
  rework.  I'm not against that rework (and called for it wayyy back when
  I proposed the TRUNCATE privilege, as I recall) but that's a whole
  different challenge and no small bit of work..

 Technically, there are 4 bits left, and that's what we need for
 separate privileges.

 I'd really hate to chew them all up..

Usually it's the patch author who WANTS to chew up all the available
bit space and OTHER people who say no.  :-)

 We last consumed bits in 2008 (for TRUNCATE) and
 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
 bits it might be another 5 years before anyone has to do that
 refactoring.

 Perhaps, or we might come up with some new whiz-bang permission to add
 next year. :/

Well, people proposed separate permissions for things like VACUUM and
ANALYZE around the time TRUNCATE was added, and those were rejected on
the grounds that they didn't add enough value to justify wasting bits
on them.  I think we see whether there's a workable system that such
that marginal permissions (like TRUNCATE) that won't be checked often
don't have to consume bits.

 But even if the refactoring needs to be done now for
 some reason, it's only June, and the last CommitFest doesn't start
 until February 15th.  I think we're being way too quick to jump to
 talking about what can and can't be done in time for 9.5.  Let's start
 by figuring out how we'd really like it to work and then, if it's too
 ambitious, we can scale it back.

 Alright- perhaps we can discuss what kind of refactoring would be needed
 for such a change then, to get a better idea as to the scope of the
 change and the level of effort required.

 My thoughts on how to address this were to segregate the ACL bits by
 object type.  That is to say, the AclMode stored for databases might
 only use bits 0-2 (create/connect/temporary), while tables would use
 bits 0-7 (insert/select/update/delete/references/trigger).  This would
 allow us to more easily add more rights at the database and/or
 tablespace level too.

Yeah, that's another idea.  But it really deserves its own thread.
I'm still not convinced we have to do this at all to meet this need,
but that should be argued back and forth on that other thread.

 My main concern about using only one bit is that someone might want to
 allow a user to bypass RLS on SELECT while still enforcing it for
 data-modifying operations.  That seems like a plausible use case to
 me.

 I absolutely agree that it's a real use-case and one which we should
 support, just trying to avoid biting off more than can be done between
 now and February.  Still, if we get things hammered out and more-or-less
 agreement on the way forward, getting the code written may move quickly.

Nifty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote:
  I agree that we want to support that, if we can do so reasonably.  What
  I was trying to get at is simply this- don't we provide that already
  with the leakproof attribute and functions?  If we don't have enough
  there to allow index scans then we should be looking to add more, I'm
  thinking.
 
 So the reason why we got onto this particular topic was because of the
 issue of multiple security policies for a single table.  Of course,
 multiple security policies can always be merged into a single
 more-complex policy, but the resulting policy may be so complex that
 the query-planner is no longer capable of doing a good job optimizing
 it.  

Yeah, I could see that happening with some use-cases.

  ALTER TABLE tab ADD POLICY polname WHERE quals;
  GRANT SELECT (polname) ON TABLE tab TO role;
 
  Right, if we were to support multiple policies on a given table then we
  would have to support adding and removing them individually, as well as
  specify when they are to be applied- and what if that when overlaps?
  Do we apply both and only a row which passed them all gets sent to the
  user?  Essentially we'd be defining the RLS policies to be AND'd
  together, right?  Would we want to support both AND-based and OR-based,
  and allow users to pick what set of conditionals they want applied to
  their various overlapping RLS policies?
 
 AND is not a sensible policy; it would need to be OR.  If you grant
 someone access to two different subsets of the rows in a table, it
 stands to reason that they will expect to have access to all of the
 rows that are in at least one of those subsets.  

I think I can buy off on this.  What that also means is that any
'short-circuiting' that we try to do here would be based on stop once
we get back a 'true'.  This could seriously change how we actually
implement RLS though as doing it all through query rewrites and making
this work with multiple security policies which are OR'd together and
yet keeping the optimization and qual push-down and index-based plans is
looking pretty daunting.  

I'm also of the opinion that this isn't strictly necessary for the
initial RLS offering in PG- there's a clear way we could migrate
existing users to a multi-policy system from a single-policy system.
Sure, to get the performance and optimization benefits that we'd
presumably have in the multi-policy case they'd need to re-work their
RLS configuration, but for users who care, they'll likely be very happy
to do so to gain those benefits.

Perhaps the question here is- if we implement RLS one way for the single
case and then change the implementation all around for the multi case,
will we end up breaking the single case?  Or destroying the performance
for it?  I can't see either of those cases being allowed- if and when we
support multi, we must still support single and the whole point of multi
would be to allow more performant implementations and that solution will
require the single case to be at least as performant as what we're
proposing to do today, I believe.

Or are you thinking that we would never support calling user-defined
functions in any RLS scheme because we want to be able to do that
optimization?  I don't see that being acceptable from a feature
standpoint.

 Alternatively, we could:
 
 - Require the user to specify in some way which of the available
 policies they want applied, and then apply only that one.

I'd want to at least see a way to apply an ordering to the policies
being applied, or have PG work out which one is cheapest and try that
one first.

 - Decide that such scenarios constitute misconfiguration. Throw an
 error and make the table owner or other relevant local authority fix
 it.

Having them all be OR'd together feels simpler and easier to work with
than trying to provide the user with all the knobs necessary to select
which subset of users they want the policy applied to when (user X from
IP range a.b.c.d/24 at time 1500).  We could probably make it work with
exclusion constraints, range types, etc, and perhaps it'd be a reason to
bring btree_gist into core (which I'm all for) and make it work with
catalog tables, but... just 'yuck' all around, for my part.

  Sounds all rather painful and much better done programatically by the
  user in a language which is suited to that task- eg: pl/pgsql, perl, C,
  or something besides our ALTER syntax + catalog representation.
 
 I think exactly the opposite, for the query planning reasons
 previously stated.  I think the policies will quickly get so
 complicated that they're no longer optimizable.  Here's a simple
 example:
 
 - Policy 1 allows the user to access rows for which complexfunc() returns 
 true.
 - Policy 2 allows the user to access rows for which a = 1.
 
 Most users have access only through policy 2, but some have access
 through policy 1.  Users who have access through policy 1 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas
On Thu, Jun 12, 2014 at 6:33 PM, Gregory Smith gregsmithpg...@gmail.com wrote:
 I'm kind of surprised to see this turn into a hot button all of the sudden
 though, because my thought on all that so far has been a giant so what?
 This is what PostgreSQL does.
[...]
 But let's not act like RLS is a scary bogeyman because it introduces a new
 way to hack the server or get surprising side-effects.  That's expected and
 possibly unavoidable behavior in a feature like this, and there are much
 worse instances of arbitrary function risk throughout the core code already.

I have some technical comments on later emails in this thread, but
first let me address this point.  In the past, people have sometimes
complained that reviewers waited until very late in the cycle to
complain about issues which they found problematic.  By the time the
issues were pointed out, insufficient time remained before feature
freeze to get those issues addressed, causing the patch to slip out of
the release and provoking developer frustration.  It has therefore
been requested numerous times by numerous people that potential issues
be raised as early as possible.

The concerns that I have raised in this thread are not new; I have
raised them before.  However, we are now at the beginning of a new
development cycle, and it seems fair to assume that the people who are
working on this patch are hoping very much that something will get
committed to 9.5.  Since it seems to me that there are unaddressed
issues with the design of this patch, I felt that it was a good idea
to make sure that those concerns were on the table right from the
beginning of the process, rather than waiting until the patch was on
the verge of commit or, indeed, already committed.  That is why, when
this thread was revived on June 10th, I decide that it was a good time
to again comment on the design points about which I was concerned.

After sending that one (1) email, I was promptly told that I'm very
disappointed to hear that the mechanical pieces around making RLS easy
for users to use ... is receiving such push-back.  The push-back, at
that point in time, consisted of one (1) email.  Several more emails
have been sent that time, including the above-quoted text, seeming to
me to imply that the people who are concerned about this feature are
being unreasonable.  I don't believe I am the only such person,
although I may be the main one right at the moment, and you may not be
entirely surprised to hear that I don't think I'm being unreasonable.

I will admit that my initial email may have contained just a touch of
hyperbole.  But I won't admit to more than a touch, and frankly, I
think it was warranted.  I perfectly well understand that people
really, really, really want this feature, and if I hadn't understood
that before, I certainly understand it now.  However, I believe that
there has been a lack of focus in the development of the patch thus
far in a couple of key areas - first in terms of articulating how it
is different from and better than a writeable security barrier view,
and second on how to manage the security and operational aspects of
having a feature like this.  I think that the discussion subsequent to
my June 10th email has let to some good discussion on both points,
which was my intent, but I still think much more time and thought
needs to be spent on those issues if we are to have a feature which is
up to our usual standards.  I do apologize to anyone who interpreted
that initial as a pure rant, because it really wasn't intended that
way.  Contrariwise, I hope that the people defending this patch will
admit that the issues I am raising are real and focus on whether and
how those concerns can be addressed.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas
On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 Yeah, I was thinking something like this could work, but I would go
 further. Suppose you had separate GRANTable privileges for direct
 access to individual tables, bypassing RLS, e.g.

   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

So, is this one new privilege (DIRECT) or four separate new privileges
that are variants of the existing privileges (DIRECT SELECT, DIRECT
INSERT, DIRECT UPDATE, DIRECT DELETE)?

 Actually, given the fact that the majority of users won't be using
 RLS, I would be tempted to invert the above logic and have the new
 privilege be for LIMITED access (via RLS quals). So a user granted the
 normal SELECT privilege would be able to bypass RLS, but a user only
 granted LIMITED SELECT wouldn't.

Well, for the people who are not using RLS, there's no difference
anyway.  I think it matters more what users of RLS will expect from a
command like GRANT SELECT ... and I'm guessing they'll prefer that RLS
always apply unless they very specifically grant the right for RLS to
not apply.  I might be wrong, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas
On Thu, Jun 12, 2014 at 8:13 PM, Stephen Frost sfr...@snowman.net wrote:
 I'm in full agreement we should clearly communicate the issues around
 pg_dump in particular, because they can't necessarily be eliminated
 altogether without some major work that's going to take a while to finish.
 And if the work-around is some sort of GUC for killing RLS altogether,
 that's ugly but not unacceptable to me as a short-term fix.

 A GUC which is enable / disable / error-instead may work quiet well, with
 error-instead for pg_dump default if people really want it (there would have
 to be a way to disable that though, imv).

 Note that enable is default in general, disable would be for superuser only
 (or on start-up) to disable everything, and error-instead anyone could use
 but it would error instead of implementing RLS when querying an RLS-enabled
 table.

 This approach was suggested by an existing user testing out this RLS
 approach, to be fair, but it looks pretty sane to me as a way to address
 some of these concerns. Certainly open to other ideas and thoughts though.

In general, I agree that this is a good approach.  I think it will be
difficult to have a GUC with three values, one of which is
superuser-only and the other two of which are not.  I don't think
there's any precedent for something like that in the existing
framework, and I think it's likely we'll run into unpleasant corner
cases if we try to graft it in.  Also, I think we need to separate
things: whether the system is willing to allow the user to access the
table without RLS, and whether the user is willing to accept RLS if
the system deems it necessary.

For the first one, two solutions have been proposed.  The initial
proposal was to insist on RLS except for the superuser (and maybe the
table owner?).  Having a separate grantable privilege, as Dean
suggests, may be better.  I'll reply separately to that email also, as
I have a question about what he's proposing.

For the second one, I think the two most useful behaviors are normal
mode - i.e. allow access to the table, applying RLS predicates if
required and not applying them if I am exempt - and error-instead
mode - i.e. if my access to this table would be mediated by an RLS
predicate, then throw an error instead.  There's a third mode which
might be useful as well, which is even though I have the *right* to
bypass the RLS predicate on this table, please apply the predicate
anyway.  This could be used by the table owner in testing, for
example.  Here again, the level of granularity we want to provide is
an interesting question.  Having a GUC (e.g. enable_row_level_security
= on, off, force) would be adequate for pg_dump, but allowing the
table name to be qualified in the query, as proposed downthread, would
be more granular, admittedly at some parser cost.  I'm personally of
the view that we *at least* need the GUC, because that seems like the
best way to secure pg_dump, and perhaps other applications.  We can
and should give pg_dump an--allow-row-level-security flag, I think,
but pg_dump's default behavior should be to configure the system in
such a way that the dump will fail rather than complete with a subset
of the data.  I'm less sure whether we should have something that can
be used to qualify table names in particular queries.  I think it
would be really useful, but I'm concerned that it will require
creating additional fully-reserved keywords, which are somewhat
painful for users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost
Robert,

On Tuesday, June 17, 2014, Robert Haas robertmh...@gmail.com wrote:

 After sending that one (1) email, I was promptly told that I'm very
 disappointed to hear that the mechanical pieces around making RLS easy
 for users to use ... is receiving such push-back.  The push-back, at
 that point in time, consisted of one (1) email.  Several more emails
 have been sent that time, including the above-quoted text, seeming to
 me to imply that the people who are concerned about this feature are
 being unreasonable.  I don't believe I am the only such person,
 although I may be the main one right at the moment, and you may not be
 entirely surprised to hear that I don't think I'm being unreasonable.


I'm on my phone at the moment but that looks like a quote from me. My email
and concern there was regarding the specific suggestion that we could check
off the RLS capability which users have been asking us to provide nearly
since I started with PG by saying that they could use Updatable SB views. I
did not intend it as a comment regarding the specific technical concerns
raised and have been responding to and trying to address those
independently and openly.

I've expressed elsewhere on this thread my gratitude that the technical
concerns are being brought up now, near the beginning of the cycle, so we
can address them. I've been working with others who are interested in RLS
on a wiki page to outline and understand the options and identify
dependencies and priorities. Hopefully the link will be posted shortly
(again, not at a computer right now) and we can get comments back. There
are some very specific questions which really need to be addressed and
which I've mentioned before (in particular the question of what user the
functions in a view definition should run as, both for normal views, for
SB views, and for when an RLS qual is included and run through that
framework, and if doing so would address some of the concerns which have
been raised regarding selects running code).

Thanks,

Stephen


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas
On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost sfr...@snowman.net wrote:
 I'm not referring to the proposed implementation particularly; or at
 least not that aspect of it.  I don't think trying to run the view
 quals as the defining user is likely to be very appealing, because I
 think it's going to hurt performance, for example by preventing
 function inlining and requiring lots of user-ID switches.

 I understand that there are performance implications.  As mentioned to
 Tom, realistically, there's zero way to optimized at least some of these
 use-cases because they require a completely external module (eg:
 SELlinux) to be involved in the decision about who can view what
 records.  If we can optimize that, it'd be by a completely different
 approach whereby we pull up the qual higher because we know the whole
 query only involves leakproof functions or similar, allowing us to only
 apply the filter to the final set of records prior to them being
 returned to the user.  The point being that such optimizations would
 happen independently and regardless of the quals or user-defined
 functions involved.  At the end of the day, I can't think of a better
 optimization for such a case (where we have to ask an external security
 module if a row is acceptable to return to the user) than that.  Is
 there something specific you're thinking about that we'd be missing out
 on?

Yeah, if we have to ask an external security module a question for
each row, there's little hope of any real optimization.  However, I
think there will be a significant number of cases where people will
want filtering clauses that can be realized by doing an index scan
instead of a sequential scan, and if we end up forcing a sequential
scan anyway, the feature will be useless to those people.

 But I'm not
 gonna complain if someone wants to mull it over and make a proposal
 for how to make it work.  Rather, my concern is that all we've got is
 what might be called the core of the feature; the actual guts of it.
 There are a lot of ancillary details that seem to me to be not worked
 out at all yet, or only half-baked.

 Perhaps it's just my experience, but I've been focused on the main core
 feature for quite some time and it feels like we're really close to
 having it there.  I agree that a few additional bits would be nice to
 have but these strike me as relatively straight-forward to implement
 overtop of this general construct.  I do see value in documenting these
 concerns and will see about making that happen, along with what the
 general viewpoints and thoughts are about how to address the concern.

I feel like there's quite a bit of work left to do around these
issues.  The technical bits may not be too hard, but deciding what we
want will take some thought and discussion.

  The current approach allows a nearly unlimited level of flexibility,
  should the user wish it, by being able to run user-defined code.
  Perhaps that would be considered 'one policy', but it could certainly
  take under consideration the calling user, the object being queried
  (if a function is defined per table, or if we provide a way to get
  that information in the function), etc.

 In theory, that's true.  But in practice, performance will suck unless
 the security qual is easily optimizable.  If your security qual is
 WHERE somecomplexfunction() you're going to have to implement that by
 sequential-scanning the table and evaluating the function for each
 row.

 That's not actualy true today, is it?  Given our leak-proof attribute,
 if the qual is WHERE somecomplexfunction() AND leakprooffunctionx()
 then we would be able to push down the leak-proof function and not
 necessairly run a straight sequential scan, no?  Even so, though, we've
 had users who have tested exactly what this patch implements and they've
 been happy with their real-world use-cases.  I'm certainly all for
 optimization and would love to see us make this better for everyone, but
 I don't view that as a reason to delay this particular feature which is
 really just bringing us up to parity with other RDMBS's.

I'm a bit confused here, because your example seems to be totally
different from my example.  In my example, somecomplexfunction() will
get pushed down because it's the security qual; that needs to be
inside the security_barrier view, or a malicious user can subvert the
system by getting some other qual evaluated first.  In your example,
you seem to be imagining WHERE somecomplexfunction() AND
leakprooffunctionx() as queries sent by the untrusted user, in which
case, yet, the leak-proof one will get pushed down and the other one
will not.

 But that's probably not going to perform very well, because to match
 an index on sales_rep_id, or an index on partner_id, that's going to
 have to get simplified a whole lot, and that's probably not going to
 happen.  If we've only got one branch of the OR, I think we'll realize
 we can evaluate the subquery as an InitPlan and then use an 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Brightwell, Adam
Robert,

However, I believe that
 there has been a lack of focus in the development of the patch thus
 far in a couple of key areas - first in terms of articulating how it
 is different from and better than a writeable security barrier view,
 and second on how to manage the security and operational aspects of
 having a feature like this.  I think that the discussion subsequent to
 my June 10th email has let to some good discussion on both points,
 which was my intent, but I still think much more time and thought
 needs to be spent on those issues if we are to have a feature which is
 up to our usual standards.  I do apologize to anyone who interpreted
 that initial as a pure rant, because it really wasn't intended that
 way.  Contrariwise, I hope that the people defending this patch will
 admit that the issues I am raising are real and focus on whether and
 how those concerns can be addressed.


I absolutely appreciate all of the feedback that has been provided.  It has
been educational.  To your point above, I started putting together a wiki
page, as Stephen has spoken to, that is meant to capture these concerns and
considerations as well as to capture ideas around solutions.

https://wiki.postgresql.org/wiki/Row_Security_Considerations

This page is obviously not complete, but I think it is a good start.
Hopefully this document will help to continue the conversation and assist
in addressing all the concerns that have been brought to the table.  As
well, I hope that this document serves to demonstrate our intent and that
we *are* taking these concerns seriously.  I assure you that as one of the
individuals who is working towards the acceptance of this feature/patch, I
am very much concerned about meeting the expected standards of quality and
security.

Thanks,
Adam


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Jun 12, 2014 at 8:13 PM, Stephen Frost sfr...@snowman.net wrote:
  This approach was suggested by an existing user testing out this RLS
  approach, to be fair, but it looks pretty sane to me as a way to address
  some of these concerns. Certainly open to other ideas and thoughts though.
 
 In general, I agree that this is a good approach.  I think it will be
 difficult to have a GUC with three values, one of which is
 superuser-only and the other two of which are not.  I don't think
 there's any precedent for something like that in the existing
 framework, and I think it's likely we'll run into unpleasant corner
 cases if we try to graft it in.  Also, I think we need to separate
 things: whether the system is willing to allow the user to access the
 table without RLS, and whether the user is willing to accept RLS if
 the system deems it necessary.

Good point- I agree that it's best to avoid having to support individual
superuser-only only options on a GUC.  Also, addressing the issues
independently also makes sense to me.

 For the first one, two solutions have been proposed.  The initial
 proposal was to insist on RLS except for the superuser (and maybe the
 table owner?).  Having a separate grantable privilege, as Dean
 suggests, may be better.  I'll reply separately to that email also, as
 I have a question about what he's proposing.

I like the idea of a grantable privilege as it allows the granularity
that some users may require (or be frustrated that we don't have it).

 For the second one, I think the two most useful behaviors are normal
 mode - i.e. allow access to the table, applying RLS predicates if
 required and not applying them if I am exempt - and error-instead
 mode - i.e. if my access to this table would be mediated by an RLS
 predicate, then throw an error instead.  

Right, makes sense.

 There's a third mode which
 might be useful as well, which is even though I have the *right* to
 bypass the RLS predicate on this table, please apply the predicate
 anyway.  This could be used by the table owner in testing, for
 example.  

Agreed, this sounds very useful too.

 Here again, the level of granularity we want to provide is
 an interesting question.  Having a GUC (e.g. enable_row_level_security
 = on, off, force) would be adequate for pg_dump, but allowing the
 table name to be qualified in the query, as proposed downthread, would
 be more granular, admittedly at some parser cost.  I'm personally of
 the view that we *at least* need the GUC, because that seems like the
 best way to secure pg_dump, and perhaps other applications.  We can
 and should give pg_dump an--allow-row-level-security flag, I think,
 but pg_dump's default behavior should be to configure the system in
 such a way that the dump will fail rather than complete with a subset
 of the data.  

This sounds good to me.

 I'm less sure whether we should have something that can
 be used to qualify table names in particular queries.  I think it
 would be really useful, but I'm concerned that it will require
 creating additional fully-reserved keywords, which are somewhat
 painful for users.

I've been trying to think of the use-case for this.  It certainly
*sounds* nice, but on reflection, the use-case for this seems to me to
be that you're trying to develop some application which will be
constrained by RLS totally and therefore want to flip back-and-forth
between RLS on and RLS off (for the tables involved).  When would
you really need, in the same query, to have RLS enabled for table X but
disabled for table Y?  I do like the idea of an *independent* option to
(just) COPY which says give me all the data or error, independent of
the GUC for the same purpose.  Would be curious to hear what others
think of that proposal.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
  Yeah, I was thinking something like this could work, but I would go
  further. Suppose you had separate GRANTable privileges for direct
  access to individual tables, bypassing RLS, e.g.
 
GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name
 
 So, is this one new privilege (DIRECT) or four separate new privileges
 that are variants of the existing privileges (DIRECT SELECT, DIRECT
 INSERT, DIRECT UPDATE, DIRECT DELETE)?

I had taken it to be a single privilege, but you're right, it could be
done for each of those..  I really don't think we have the bits for more
than one case here though (if that) without a fair bit of additional
rework.  I'm not against that rework (and called for it wayyy back when
I proposed the TRUNCATE privilege, as I recall) but that's a whole
different challenge and no small bit of work..

  Actually, given the fact that the majority of users won't be using
  RLS, I would be tempted to invert the above logic and have the new
  privilege be for LIMITED access (via RLS quals). So a user granted the
  normal SELECT privilege would be able to bypass RLS, but a user only
  granted LIMITED SELECT wouldn't.
 
 Well, for the people who are not using RLS, there's no difference
 anyway.  I think it matters more what users of RLS will expect from a
 command like GRANT SELECT ... and I'm guessing they'll prefer that RLS
 always apply unless they very specifically grant the right for RLS to
 not apply.  I might be wrong, though.

The preference from the folks using RLS that I've talked to is
absolutely that it be applied by default for all 'normal' (eg:
non-pg_dump) sessions.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost sfr...@snowman.net wrote:
  I understand that there are performance implications.  As mentioned to
  Tom, realistically, there's zero way to optimized at least some of these
  use-cases because they require a completely external module (eg:
  SELlinux) to be involved in the decision about who can view what
  records.  If we can optimize that, it'd be by a completely different
  approach whereby we pull up the qual higher because we know the whole
  query only involves leakproof functions or similar, allowing us to only
  apply the filter to the final set of records prior to them being
  returned to the user.  The point being that such optimizations would
  happen independently and regardless of the quals or user-defined
  functions involved.  At the end of the day, I can't think of a better
  optimization for such a case (where we have to ask an external security
  module if a row is acceptable to return to the user) than that.  Is
  there something specific you're thinking about that we'd be missing out
  on?
 
 Yeah, if we have to ask an external security module a question for
 each row, there's little hope of any real optimization.  However, I
 think there will be a significant number of cases where people will
 want filtering clauses that can be realized by doing an index scan
 instead of a sequential scan, and if we end up forcing a sequential
 scan anyway, the feature will be useless to those people.

I agree that we want to support that, if we can do so reasonably.  What
I was trying to get at is simply this- don't we provide that already
with the leakproof attribute and functions?  If we don't have enough
there to allow index scans then we should be looking to add more, I'm
thinking.

  Perhaps it's just my experience, but I've been focused on the main core
  feature for quite some time and it feels like we're really close to
  having it there.  I agree that a few additional bits would be nice to
  have but these strike me as relatively straight-forward to implement
  overtop of this general construct.  I do see value in documenting these
  concerns and will see about making that happen, along with what the
  general viewpoints and thoughts are about how to address the concern.
 
 I feel like there's quite a bit of work left to do around these
 issues.  The technical bits may not be too hard, but deciding what we
 want will take some thought and discussion.

I agree on this point, but I'm still hopeful that we'll be able to get a
good feature into 9.5.  There are quite a few resources available for
the 'just programming' part, so the long pole in the tent here is
absolutely hashing out what we want and how it should function.

I'd be happy to host or participate in a conference call or similar if
that would be useful to move this along- or we can continue to
communicate via email.  There's a bit of a lull in conferences to which
I'm going to right now, so in person is unlikely, unless folks want to
get together somewhere on the east coast (I'd be happy to travel to
Philly, Pittsburgh, NYC, etc, if it'd help..).

  That's not actualy true today, is it?  Given our leak-proof attribute,
  if the qual is WHERE somecomplexfunction() AND leakprooffunctionx()
  then we would be able to push down the leak-proof function and not
  necessairly run a straight sequential scan, no?  Even so, though, we've
  had users who have tested exactly what this patch implements and they've
  been happy with their real-world use-cases.  I'm certainly all for
  optimization and would love to see us make this better for everyone, but
  I don't view that as a reason to delay this particular feature which is
  really just bringing us up to parity with other RDMBS's.
 
 I'm a bit confused here, because your example seems to be totally
 different from my example.  In my example, somecomplexfunction() will
 get pushed down because it's the security qual; that needs to be
 inside the security_barrier view, or a malicious user can subvert the
 system by getting some other qual evaluated first.  In your example,
 you seem to be imagining WHERE somecomplexfunction() AND
 leakprooffunctionx() as queries sent by the untrusted user, in which
 case, yet, the leak-proof one will get pushed down and the other one
 will not.

Right- my point there was that the leakproof one might allow an index
scan to be run.  This is all pretty hand-wavey, I admit, so I'll see if
I can get more details about how the currently-proposed patch is
performing for the users who are testing it and what kind of plans
they're seeing.  If that falls through, I'll try and build up my own set
of realistic-looking (to myself and the users who are testing) example.

  What solution did you come up with for this case, which performed well
  and was also secure..?
 
 I put the logic in the client.  :-(

Well, that's not helpful here. ;)

  I don't want to overstate the 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 13 June 2014 01:13, Stephen Frost sfr...@snowman.net wrote:
  This approach was suggested by an existing user testing out this RLS
  approach, to be fair, but it looks pretty sane to me as a way to address
  some of these concerns. Certainly open to other ideas and thoughts though.
 
 Yeah, I was thinking something like this could work, but I would go
 further. Suppose you had separate GRANTable privileges for direct
 access to individual tables, bypassing RLS, e.g.
 
   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

This is certainly an interesting idea and I'm glad we're getting this
level of discussion early on in the 9.5 cycle as I'd really like to see
a good solution implemented for 9.5.

I've been going back-and-forth about this and what's really swaying me
right now is that it'd be nearly impossible to determine if a given RLS
qual actually allows full access to a table for a given user without
going through the entire table and testing the qual against each row.
With this GRANT ability, we'd be able to completely avoid calling the
RLS quals when the user is granted this right.

Not sure offhand how many bits we've got left at the per-table level
though; we added TRUNCATE rights not that long ago and this looks like
another good right to add, but there are only so many bits available..
At the same time, I do think this is something we could also add later,
perhaps after figuring out a good way to extend the set of bits
available for privileges on tables.

 Combined with the GUC (direct_table_access, say) to request direct
 access to all tables. Then with direct_table_access = true/required, a
 SELECT from a table would error if the user hadn't been granted the
 DIRECT SELECT privilege on all the tables referenced in the query.

I can see this working.  One thing I'm curious about is if we would want
to support this inside of the SELECT statement (or perhaps COPY?)
directly, rather than making a user have to flip a GUC back and forth
while they're doing something.  I can imagine, during testing, a session
looking like this:

select * from table;
@#@!$!
set direct_table_access = true;
select * from table;
select * from table where blah = x;
alter table set row level security blah = x;
select * from table;
select * from table;
select * from table;
@!#$!@#!
set direct_table_access = false;
select * from table;
...

Would 'select direct' or 'select * from DIRECT table' (or maybe 'ONLY'?)
be workable?  There's certainly SQL standard concerns to be thought of
here which might precldue anything we do with SELECT, but we could
support something with COPY.

 Tools like pg_dump would require direct_table_access, but there might
 be other levels of access that didn't error out.

pg_dump would need an option to set direct_table_access or not.  Having
it ask by default is acceptable to me, but I do think we need to be able
to tell it to *not* set that.

 I think if I were using RLS, I would definitely want/expect this level
 of fine-grained control over permissions on a per-table basis, rather
 than the superuser/non-superuser level of control, or having
 RLS-exempt users.

I agree that it'd be great to have- and we need to make sure we don't
paint ourselves into a corner with the initial versions.  What I'm
worried about is that we're going to end up feature-creeping this to
death and ending up with nothing in 9.5.  I'll try to get a wiki page
going to discuss these items (as mentioned up-thread) and we can look at
prioritizing them and looking at what dependencies exist on other parts
of the system and seeing what's required for the initial version.

 Actually, given the fact that the majority of users won't be using
 RLS, I would be tempted to invert the above logic and have the new
 privilege be for LIMITED access (via RLS quals). So a user granted the
 normal SELECT privilege would be able to bypass RLS, but a user only
 granted LIMITED SELECT wouldn't.

This I don't agree with- it goes against what is done on existing
systems afaik and part of the idea is that you can minimize changes to
the applications or users but still be able to curtail what they can
see.  Making regular SELECTs start erroring if they haven't set some GUC
because RLS has been implemented on a given table would be quite
annoying, imv.

Now, that said, wouldn't the end user be able to control this for their
particular environment by setting the GUC accordingly in
postgresql.conf?  I'd still argue that it should be defaulted to what I
view as the 'normal' case, where RLS is applied unless you asked for
your queries to error instead, but if a user wants to have it flipped
around the other way, they could update their postgresql.conf to make it
so.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 Kevin Grittner (kgri...@ymail.com) wrote:

 The proposed approach would leave the validity of any dump which
 was not run as a superuser in doubt.  The last thing we need, in
 terms of improving security, is another thing you can't do
 without connecting as a superuser.

 Any dump not run by a superuser is already in doubt, imv.  That
 is a problem we already have which really needs to be addressed,
 but I view that as an independent issue.

I'm not seeing that.  If the user can't dump, you get an error and
pg_dump returns something other than SUCCESS.

test=# create user bob;
CREATE ROLE
test=# create user tom;
CREATE ROLE
test=# set role bob;
SET
test= create table person(person_id int primary key, name text not null, ssn 
text);
CREATE TABLE
test= insert into person values (1, 'Stephen Frost', '123-45-6789');
INSERT 0 1
test= insert into person values (2, 'Kevin Grittner');
INSERT 0 1
test= grant select (person_id, name) on person to tom;
GRANT
test= \q
kgrittn@Kevin-Desktop:~/pg/master$ pg_dump -U bob test bob-backup.sql
kgrittn@Kevin-Desktop:~/pg/master$ pg_dump -U tom test tom-backup.sql
pg_dump: [archiver (db)] query failed: ERROR:  permission denied for relation 
person
pg_dump: [archiver (db)] query was: LOCK TABLE public.person IN ACCESS SHARE 
MODE
kgrittn@Kevin-Desktop:~/pg/master$ echo $?
1

 I agree with avoiding adding another superuser-only capability;
 see the other sub-thread about making this a per-user capability.

It should be possible to design something which does not have this
risk.  What I was saying was that what was being described at that
point wasn't it, and IMV was not acceptable.  I think that there
should never by any doubt that a pg_dump run which completes
without error copied all requested tables in their entirety, not a
subset of the rows in the tables.

A GUC which only caused an error on the attempt to actually read
specific rows which the user does not have permission to see would
leak too much information.  A GUC which caused a SELECT or COPY
from a table to throw an error if the user was not entitled to see
all rows in the table could work.  Another thing which could work,
if it can be coded, would be a GUC which would throw an error if
the there were not quals on the query to prohibit seeing rows which
the security conditions would prohibit, whether or not any matching
rows actually existed.  The latter would match the behavior of
column level security -- you get an error when trying to select a
prohibited column even if there are no rows in the table.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  Any dump not run by a superuser is already in doubt, imv.  That
  is a problem we already have which really needs to be addressed,
  but I view that as an independent issue.
 
 I'm not seeing that.  If the user can't dump, you get an error and
 pg_dump returns something other than SUCCESS.

We've outlined an approach with RLS which would do the same.

I'm still of the opinion that, today, we have a problem that only a
superuser-run dump has any chance of success (and even if you get it
working today it'll probably break tomorrow, and you had better be
paying attention).  I'd like to fix that situation, but it's an
independent effort from this.  We've had issues in the past with pg_dump
creating things that can't be restored and they're certainly bugs but
trying to make that work with a regular user as a whole system backup
strategy, today, is just asking for trouble.

  I agree with avoiding adding another superuser-only capability;
  see the other sub-thread about making this a per-user capability.
 
 It should be possible to design something which does not have this
 risk.  

The risk that pg_dump might create a dump which can't be restored?
Agreed, and I'd love to hear your thoughts on the proposal.

 What I was saying was that what was being described at that
 point wasn't it, and IMV was not acceptable.  I think that there
 should never by any doubt that a pg_dump run which completes
 without error copied all requested tables in their entirety, not a
 subset of the rows in the tables.

pg_dump needs to be able to have an option to go either way on this
case, as I can see value in running pg_dump in RLS-enforcing mode, but
it could default to error-if-RLS.

 A GUC which only caused an error on the attempt to actually read
 specific rows which the user does not have permission to see would
 leak too much information.  A GUC which caused a SELECT or COPY
 from a table to throw an error if the user was not entitled to see
 all rows in the table could work.

Right- this would be the 'DIRECT SELECT' which would allow bypassing all
RLS and therefore mean that the user is allowed to see ALL rows of a
table.  That's one of the reasons why I agree with Dean's approach,
because we really need to know at the outset if the calling user is
allowed to extract all rows from a table or not- we can't go looking
through the entire table testing each row before we start running the
query.

   Another thing which could work,
 if it can be coded, would be a GUC which would throw an error if
 the there were not quals on the query to prohibit seeing rows which
 the security conditions would prohibit, whether or not any matching
 rows actually existed.  

If I'm following you correctly, this would be an optimization that
allows avoiding RLS in the case where some information about the user
causes the overall qual to always return 'true', correct?  I'd certainly
like to see what happens in that case today and agree that it'd be great
to optimize for and perhaps even allow a user for which that is true to
not need the 'DIRECT SELECT' privilege, but in practice, I don't think
it'll be possible in most cases (certainly not in the case where an
external security module is deciding the access) and the optimization
may not be worth it.

 The latter would match the behavior of
 column level security -- you get an error when trying to select a
 prohibited column even if there are no rows in the table.

Agreed, but that would be a relaxation of the proposed approach and
therefore something which could be added later, if it's deemed
worthwhile.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-15 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Adam Brightwell adam.brightw...@crunchydatasolutions.com writes:
  Through this effort, we have concluded that for RLS the case of
  invalidating a plan is only necessary when switching between a superuser
  and a non-superuser.  Obviously, re-planning on every role change would be
  too costly, but this approach should help minimize that cost.  As well,
  there were not any cases outside of this one that were immediately apparent
  with respect to RLS that would require re-planning on a per userid basis.
 
 Hm ... I'm not following why we'd need a special case for superusers and
 not anyone else?  Seems like any useful RLS scheme is going to require
 more privilege levels than just superuser and not-superuser.

Just to clarify this- the proposal allows RLS to be implemented
essentially by any user-defined qual, where that qual can include the
current user, the IP the user is connecting from, or more-or-less
anything else, possibly even via a user-defined function or security
module.  It is not superuser-or-not.  This discussion is about how to
support users for which RLS should not be applied.  I can see that being
useful at a more granular level than superuser-or-not, but even at that
level, RLS is still extremely useful.

 Could we put the if superuser then ok test into the RLS condition test
 and thereby not need more than one plan at all?

As discussed, that unfortunately doesn't quite work.

This discussion, in general, has been quite useful and I'll work on
adding documentation to the wiki pages which discusses the consideration
and suggestions for a GUC to disable-or-error when RLS is encountered,
along with a per-role capability to bypass RLS; that is in line with the
goal of avoiding adding superuser-specific capabilities.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-15 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
  I agree, and now that the urgency of trying to deliver this for 9.4 is
  over it's worth seeing if we can just run as table owner.
 
  Failing that, we could take the approach a certain other RDBMS does and
  make the ability to define row security quals a GRANTable right
  initially held only by the superuser.
 
 Hmm ... that might be a workable compromise.  I think the main issue here
 is whether we expect that RLS quals will be something that the planner
 could optimize to any meaningful extent.  If they're always (in effect)
 wrapped in SECURITY DEFINER functions, I think that largely blocks any
 optimizations; but maybe that wouldn't matter in practice.

From what I've heard from actual users with other RDBMS's who are coming
to PostgreSQL- the reality is that they're going to be using a security
module (eg: SELinux) whose responsibility it is to manage this whole
question of can this user see this row, meaning there's zero chance of
optimization.

I'd certainly like to see the ability to optimize remain in cases where
the qual itself gives us a way to filter (eg: a table partitioned based
on some security level, where another table maps users to levels), but
that is, from a practical standpoint, not an immediate concern from real
users and I don't believe our approach paints us into a corner which
would prevent that.  What that would require is better support for true
partitioning rather than constraint exclusions.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-15 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jun 11, 2014 at 8:59 PM, Stephen Frost sfr...@snowman.net wrote:
  In this case the user-defined code needs to return a boolean.  We don't
  currently do anything to prevent it from having side-effects, no, but
  the same is true with views which incorporate functions.  I agree that
  it makes a difference when compared to column-level privileges, but my
  point was that we have provided easier ways to do things which were
  possible using more complicated methods before.  Perhaps the risk with
  RLS is higher but these issues look managable to me and the level of
  doubt about our ability to provide this feature in a reasonable and
  principled way that our users will understand surprises me.
 
 I'm glad the issues look manageable to you, but you haven't really
 explained how to manage them.  

There's been a number of suggestions made and it'd be great to get more
feedback on them- running the quals as the table owner, having a GUC
which can be set to either run 'as normal' or either ignore RLS (if the
user has that right) or error out if RLS would happen, and undoubtably
there are other ideas along those same lines to address the pg_dump and
other concerns.

 For my part, I'm mildly surprised that anyone thinks it's a good idea
 to have SELECT * FROM tab to mean different things depending on who is
 typing it.  

Realistically, in the RDBMS realm in which we're in and that we're
working to break into- this is absolutely a given and expected.  It's
new to PostgreSQL, certainly, but it's not uncommon or surprising at all
in our industry.

 To me, that seems very confusing; how does an unprivileged
 user with no ability to assume some other role validate that the row
 security policy they've configured works at all and exposes precisely
 the intended set of rows?  

While I see what you're getting at, I'm not convinced it's really all
that different from being set up without access to some schema or table
which the administrator setting up accounts didn't include for you.
Sure, in the case of a schema or table, you can get an error back
instead of just not seeing the data, but if you're looking for specific
data, chances are pretty good you'll realize the lack of data quickly
and ask the same question regarding access.

To wit, I've certainly had users ask exactly that question of- do I
have access to all the data in this table? even when using PG where
it's a bit tricky to limit such access.  Clearly, the same risk applies
when using views and so the question is understandable.  Perhaps these
were users with more experience in other RDBMS's where it's more common
to have RLS, but there are at least a couple cases which I can think of
where that wouldn't apply.

 Even aside from security exposures, how
 does a non-superuser who runs pg_dump know whether they've got a
 complete backup or a filtered dump that's missing some rows?  

This would be addressed with the GUC that's been proposed.  As would the
previous paragraph, though I wanted to apply to that independently.

 I'm not referring to the proposed implementation particularly; or at
 least not that aspect of it.  I don't think trying to run the view
 quals as the defining user is likely to be very appealing, because I
 think it's going to hurt performance, for example by preventing
 function inlining and requiring lots of user-ID switches.  

I understand that there are performance implications.  As mentioned to
Tom, realistically, there's zero way to optimized at least some of these
use-cases because they require a completely external module (eg:
SELlinux) to be involved in the decision about who can view what
records.  If we can optimize that, it'd be by a completely different
approach whereby we pull up the qual higher because we know the whole
query only involves leakproof functions or similar, allowing us to only
apply the filter to the final set of records prior to them being
returned to the user.  The point being that such optimizations would
happen independently and regardless of the quals or user-defined
functions involved.  At the end of the day, I can't think of a better
optimization for such a case (where we have to ask an external security
module if a row is acceptable to return to the user) than that.  Is
there something specific you're thinking about that we'd be missing out
on?

 But I'm not
 gonna complain if someone wants to mull it over and make a proposal
 for how to make it work.  Rather, my concern is that all we've got is
 what might be called the core of the feature; the actual guts of it.
 There are a lot of ancillary details that seem to me to be not worked
 out at all yet, or only half-baked.

Perhaps it's just my experience, but I've been focused on the main core
feature for quite some time and it feels like we're really close to
having it there.  I agree that a few additional bits would be nice to
have but these strike me as relatively straight-forward to 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-15 Thread Stephen Frost
Kevin,

* Kevin Grittner (kgri...@ymail.com) wrote:
 Robert Haas robertmh...@gmail.com wrote:
  Even aside from security exposures, how
  does a non-superuser who runs pg_dump know whether they've got a
  complete backup or a filtered dump that's missing some rows?
 
 This seems to me to be a killer objection to the feature as
 proposed, and points out a huge difference between column level
 security and the proposed implementation of row level security. 

I really hate this notion of killer objection.  It's been discussed
(perhaps not seen by all) at least one suggestion for how to address
this specific issue and there are other ways in which to address it
(having COPY have the same behavior as the GUC being discussed, instead
of having a GUC, though I feel like the GUC is a better approach..).

 (In fact it is a difference between just about any GRANTed
 permission and row level security.)  If you try to SELECT * FROM
 sometable and you don't have rights to all the columns, you get an
 error.  A dump would always either work as expected or generate an
 error.

Provided you know all of the tables and other objects which need to be
included in such a partial dump (as a full dump, today, must be run by a
superuser to be sure you're actually getting everything anyway...).

 The proposed approach would leave the validity of any dump which
 was not run as a superuser in doubt.  The last thing we need, in
 terms of improving security, is another thing you can't do without
 connecting as a superuser.

Any dump not run by a superuser is already in doubt, imv.  That is a
problem we already have which really needs to be addressed, but I view
that as an independent issue.

I agree with avoiding adding another superuser-only capability; see the
other sub-thread about making this a per-user capability.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-13 Thread Dean Rasheed
On 13 June 2014 01:13, Stephen Frost sfr...@snowman.net wrote:
 Greg, all,

 I will reply to the emails in detail when I get a chance but am out of town
 at a funeral, so it'll likely be delayed. I did want to echo my agreement
 for the most part with Greg and in particular...

 On Thursday, June 12, 2014, Gregory Smith gregsmithpg...@gmail.com wrote:

 On 6/11/14, 10:26 AM, Robert Haas wrote:

 Now, as soon as we introduce the concept that selecting from a table
 might not really mean read from the table but read from the table after
 applying this owner-specified qual, we're opening up a whole new set of
 attack surfaces. Every pg_dump is an opportunity to hack somebody else's
 account, or at least audit their activity.


 I'm in full agreement we should clearly communicate the issues around
 pg_dump in particular, because they can't necessarily be eliminated
 altogether without some major work that's going to take a while to finish.
 And if the work-around is some sort of GUC for killing RLS altogether,
 that's ugly but not unacceptable to me as a short-term fix.


 A GUC which is enable / disable / error-instead may work quiet well, with
 error-instead for pg_dump default if people really want it (there would have
 to be a way to disable that though, imv).

 Note that enable is default in general, disable would be for superuser only
 (or on start-up) to disable everything, and error-instead anyone could use
 but it would error instead of implementing RLS when querying an RLS-enabled
 table.

 This approach was suggested by an existing user testing out this RLS
 approach, to be fair, but it looks pretty sane to me as a way to address
 some of these concerns. Certainly open to other ideas and thoughts though.


Yeah, I was thinking something like this could work, but I would go
further. Suppose you had separate GRANTable privileges for direct
access to individual tables, bypassing RLS, e.g.

  GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

Combined with the GUC (direct_table_access, say) to request direct
access to all tables. Then with direct_table_access = true/required, a
SELECT from a table would error if the user hadn't been granted the
DIRECT SELECT privilege on all the tables referenced in the query.
Tools like pg_dump would require direct_table_access, but there might
be other levels of access that didn't error out.

I think if I were using RLS, I would definitely want/expect this level
of fine-grained control over permissions on a per-table basis, rather
than the superuser/non-superuser level of control, or having
RLS-exempt users.

Actually, given the fact that the majority of users won't be using
RLS, I would be tempted to invert the above logic and have the new
privilege be for LIMITED access (via RLS quals). So a user granted the
normal SELECT privilege would be able to bypass RLS, but a user only
granted LIMITED SELECT wouldn't.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-12 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 Even aside from security exposures, how
 does a non-superuser who runs pg_dump know whether they've got a
 complete backup or a filtered dump that's missing some rows?

This seems to me to be a killer objection to the feature as
proposed, and points out a huge difference between column level
security and the proposed implementation of row level security. 
(In fact it is a difference between just about any GRANTed
permission and row level security.)  If you try to SELECT * FROM
sometable and you don't have rights to all the columns, you get an
error.  A dump would always either work as expected or generate an
error.

test=# create user bob;
CREATE ROLE
test=# create user bill;
CREATE ROLE
test=# set role bob;
SET
test= create table person (person_id int not null primary key,
name text not null, ssn text);
CREATE TABLE
test= grant select (person_id, name) on table person to bill;
GRANT
test= reset role;
RESET
test=# set role bill;
SET
test= select person_id, name from person;
 person_id | name 
---+--
(0 rows)

test= select * from person;
ERROR:  permission denied for relation person

The proposed approach would leave the validity of any dump which
was not run as a superuser in doubt.  The last thing we need, in
terms of improving security, is another thing you can't do without
connecting as a superuser.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-12 Thread Gregory Smith

On 6/11/14, 10:26 AM, Robert Haas wrote:
Now, as soon as we introduce the concept that selecting from a table 
might not really mean read from the table but read from the table 
after applying this owner-specified qual, we're opening up a whole 
new set of attack surfaces. Every pg_dump is an opportunity to hack 
somebody else's account, or at least audit their activity. 


I'm in full agreement we should clearly communicate the issues around 
pg_dump in particular, because they can't necessarily be eliminated 
altogether without some major work that's going to take a while to 
finish.  And if the work-around is some sort of GUC for killing RLS 
altogether, that's ugly but not unacceptable to me as a short-term fix.


One of the difficult design requests in my inbox right now asks how 
pg_dump might be changed both to reduce its overlap with superuser 
permissions and to allow auditing of its activity.  Those requests 
aren't going away; their incoming frequency is actually rising quite 
fast right now.  They're both things people expect from serious SQL 
oriented commercial database products, and I'd like to see PostgreSQL 
continue to displace those as we reach feature parity in those areas.


Any way you implement finer grained user permissions and auditing 
features will be considered a new attack vector when you use those 
features.  The way the proposed RLS feature inserts an arbitrary 
function for reads has a similar new attack vector when you use that 
feature.


I'm kind of surprised to see this turn into a hot button all of the 
sudden though, because my thought on all that so far has been a giant so 
what?  This is what PostgreSQL does.


You wanna write your own C code and then link the thing right into the 
server, so that bugs can expose data and crash the whole server?  Not 
only can you shoot yourself in the foot that way, we supply a sample gun 
and bullets in contrib.  How about writing arbitrary code in any one of 
a dozen server-side languages of wildly varying quality, then hooking 
that code so it runs as a trigger function whenever you change a row?  
PostgreSQL is *on it*; we love letting people write some random thing, 
and then running that random thing against your data as a side-effect of 
doing an operation.  And if you like that...just wait until you learn 
about this half-assed rules feature we have too!


And when the database breaks because the functions people inserted were 
garbage, that's their fault, not a cause for a CVE.  And when someone 
blindly installs adminpack because it sounded like a pgAdmin 
requirement, lets a monitoring system run as root so it can watch 
pg_stat_activity, and then discovers that pair of reasonable decisions 
suddenly means any fool with monitoring access can call 
pg_file_unlink...that's their fault too. These are powerful tools with 
serious implications, and they're expected to be used by equally serious 
users.


We as a development community do need to put a major amount of work into 
refactoring all of these security mechanisms.  There should be less of 
these embarrassing incidents where bad software design really forced the 
insecure thing to happen, which I'd argue is the case for that 
pg_stat_activity example.  And luckily so far development resources are 
appearing for organizations I know of working in that direction 
recently, as fast as the requirements are rising.  I think there's a 
good outcome at the end of that road.


But let's not act like RLS is a scary bogeyman because it introduces a 
new way to hack the server or get surprising side-effects.  That's 
expected and possibly unavoidable behavior in a feature like this, and 
there are much worse instances of arbitrary function risk throughout the 
core code already.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-12 Thread Stephen Frost
Greg, all,

I will reply to the emails in detail when I get a chance but am out of town
at a funeral, so it'll likely be delayed. I did want to echo my agreement
for the most part with Greg and in particular...

On Thursday, June 12, 2014, Gregory Smith gregsmithpg...@gmail.com wrote:

 On 6/11/14, 10:26 AM, Robert Haas wrote:

 Now, as soon as we introduce the concept that selecting from a table
 might not really mean read from the table but read from the table after
 applying this owner-specified qual, we're opening up a whole new set of
 attack surfaces. Every pg_dump is an opportunity to hack somebody else's
 account, or at least audit their activity.


 I'm in full agreement we should clearly communicate the issues around
 pg_dump in particular, because they can't necessarily be eliminated
 altogether without some major work that's going to take a while to finish.
  And if the work-around is some sort of GUC for killing RLS altogether,
 that's ugly but not unacceptable to me as a short-term fix.


A GUC which is enable / disable / error-instead may work quiet well, with
error-instead for pg_dump default if people really want it (there would
have to be a way to disable that though, imv).

Note that enable is default in general, disable would be for superuser only
(or on start-up) to disable everything, and error-instead anyone could use
but it would error instead of implementing RLS when querying an RLS-enabled
table.

This approach was suggested by an existing user testing out this RLS
approach, to be fair, but it looks pretty sane to me as a way to address
some of these concerns. Certainly open to other ideas and thoughts though.

Thanks,

Stephen


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Robert Haas
On Tue, Jun 10, 2014 at 7:18 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 06/11/2014 02:19 AM, Tom Lane wrote:
 Hm ... I'm not following why we'd need a special case for superusers and
 not anyone else?  Seems like any useful RLS scheme is going to require
 more privilege levels than just superuser and not-superuser.

 What it really needs is to invalidate plans when switching between
 RLS-enabled and RLS-exempt users, yes. I'm sure we'll want an RLS
 exempt right or mode sooner rather than later, so I'm against tying
 this explicitly to superuser as such.

 I wouldn't be surprised to see

 SET ROW SECURITY ON|OFF

 down the track, with a right controlling whether you can or not. Or at
 least, a right that directly exempts a user from row security.

I'm really concerned about the security implications of this patch.  I
think we're setting ourselves up for a whole lot of hurt for somewhat
unclear gain.

In my view, commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5 basically
*is* row-level security: instead of applying a row-level security
policy to a table, just create a security-barrier view over the table
and grant access to the view.  Forget that the table ever existed.
Done.

With this approach, there's a lot of stuff that we don't have to
reinvent.  We've talked a lot about whether row-level security should
only be concerned with the rows it scans, or whether it should also
restrict the new rows that can be created.  You can get either
behavior by choosing whether or not to use WITH CHECK OPTION.  And
then there's this question of who should be RLS-exempt; that's
basically a question of to whom you grant privileges on the underlying
table.  Note that this can be very fine-grained: for example, you can
allow someone to exempt themselves for selects but not for updates by
granting them SELECT privileges but not UPDATE privileges on the
underlying table.  And potentially-exempt users can choose whether
they want a particular access to actually be exempt by targeting the
view when they don't want to be exempt and the table when they do.
That's mighty useful for debugging, at least IMHO.  And, if you want
to have several row-level security policies for different classes of
users, just create more than one view and grant different privileges
on each.

By contrast, it seems to me that every design so far proposed for
something that is actually called row-level security - as opposed to
commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5, which *really is*
row-level security, is extremely limited.  Look back at all the things
listed in the previous paragraph; can you do those things easily with
the designs that have been proposed?  As far as I can see, not really.
Your (Craig's) rls-9.4-upd-sb-views patch seems to have a rough
equivalent of WITH CHECK OPTION, probably because we've talked a lot
about that specific issue, but it doesn't line up exactly to what WITH
CHECK OPTION actually does.  There's no independently-grantable
RLS-exemption privilege - and even when we talk about that, it's
usually some kind of global bit that applies to all tables and all
operations equally - whereas with the above approach it can be
per-table and per-operation and doesn't require superuser intervention
to flip the bit.  There's no way for users who are RLS exempt to turn
off their exemption for testing purposes, let alone on a per-table
basis.  There's no way to have multiple RLS policies on a single
table.  All of those are things that we get for free in the
view-over-table model, and implementing formal RLS basically requires
us to either invent a new RLS-specific way of doing each of those
things, or suffer along with a subset of the functionality.  Yuck.

But what's really awful about this whole design is that it breaks the
invariant that reading from a table doesn't run anybody else's code.
It's already the case that users need to be awfully careful about
modifying tables, because that might fire triggers that do bad things.
But at least you can SELECT from a table and it will either work, or
it will fail with a permission denied error.  What it will not do is
unexpectedly run some code that you weren't expecting it to run.  You
can't be so blithe about selecting from views, but reading a plain
table is always OK.  Now, as soon as we introduce the concept that
selecting from a table might not really mean read from the table but
read from the table after applying this owner-specified qual, we're
opening up a whole new set of attack surfaces.  Every pg_dump is an
opportunity to hack somebody else's account, or at least audit their
activity.  Protecting the superuser against everybody else is nice,
but I think it's just as important to protect non-superusers against
each other, and I think that's going to be hard -- because in the RLS
world, SELECT * FROM tab is now *fundamentally* ambiguous.  Maybe it's
reading from the table, and maybe it's really clandestinely reading
from a view over the table, and the user has no way 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
 On 06/11/2014 02:19 AM, Tom Lane wrote:
  Hm ... I'm not following why we'd need a special case for superusers and
  not anyone else?  Seems like any useful RLS scheme is going to require
  more privilege levels than just superuser and not-superuser.
 
 What it really needs is to invalidate plans when switching between
 RLS-enabled and RLS-exempt users, yes. I'm sure we'll want an RLS
 exempt right or mode sooner rather than later, so I'm against tying
 this explicitly to superuser as such.

That certainly sounds reasonable to me, but the point is we're just
looking to see if the current role executing the plan should or should
not have RLS applied and, if it's changing, we need to re-plan.  We
don't need to actually track an independent plan for each and every user
executing the plan, which means that the plan cache can be largely left
alone.

 I wouldn't be surprised to see
 
 SET ROW SECURITY ON|OFF
 
 down the track, with a right controlling whether you can or not. Or at
 least, a right that directly exempts a user from row security.

Agreed, but doing a re-planning in that case seems reasonable to me.  I
find it pretty unlikely that there will be a lot of critical path cases
of the same plan flipping back and forth between a role for which RLS is
applied and a role where it shouldn't be.

  Could we put the if superuser then ok test into the RLS condition test
  and thereby not need more than one plan at all?
 
 Only if we put it in another level of security barrier subquery, because
 otherwise the planner might execute the other quals (including possible
 user defined functions) before the superuser test. Which was the whole
 reason for the superuser test in the first place.

Yeah, I'm not a big fan of this and it certainly seems a simpler
approach to just force a re-plan.  We're talking about a query which
has been prepared and then is being executed by different roles, some
of which are RLS enabled and some which are RLS exempt.  That just
strikes me as pretty unlikely to happen and if it does become an issue,
a user could work around it by having two different plans prepared and
making sure that they are called from the appropriate roles to avoid the
replanning.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
 On 06/11/2014 07:24 AM, Tom Lane wrote:
  Is the point of that that the table owner might have put trojan-horse
  functions into the RLS qual?  If so, why are we only concerned about
  defending the superuser and not other users?  Seems like the right fix
  would be to insist that functions in the RLS qual run as the table owner.
  Granted, that might be painful to do.  But it still seems like we only
  need to do this for superusers is designing with blinkers on.
 
 I agree, and now that the urgency of trying to deliver this for 9.4 is
 over it's worth seeing if we can just run as table owner.

We'll need to work out how to ensure that things like current_user()
still returns the calling user in that case, otherwise it won't make any
sense.  In general, I agree that having the RLS quals run as the table
owner is a good approach and would love to hear suggestions about how we
can make that happen.

 Failing that, we could take the approach a certain other RDBMS does and
 make the ability to define row security quals a GRANTable right
 initially held only by the superuser.

I don't particularly like this idea- it's akin, to me anyway, to making
the ability to control other permissions on a table (SELECT, INSERT,
etc) something which a user would have to be granted- and it doesn't
really address the issue.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 I'm really concerned about the security implications of this patch.  I
 think we're setting ourselves up for a whole lot of hurt for somewhat
 unclear gain.

I'm certainly of a different opinion and, for the most part, I feel that
if there are security concerns then they need to be addressed- and
better by us than by asking users to use some other mechanism to
implement RLS.

 In my view, commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5 basically
 *is* row-level security: instead of applying a row-level security
 policy to a table, just create a security-barrier view over the table
 and grant access to the view.  Forget that the table ever existed.
 Done.

This argument could have been made for column-level privileges also, no?
Yet I don't hear any calls for that to be ripped out now that you could
implement it through updatable security-barrier views.  That commit was
the ground-work to allow us to finally get proper RLS and I'm very
disappointed to hear that the mechanical pieces around making RLS easy
for users to use (and getting that check-box taken care of in a wide
variety of fields that we are being exposed to now, see the PGConf.NYC
keynote speakers...) is receiving such push-back.

 With this approach, there's a lot of stuff that we don't have to
 reinvent.  We've talked a lot about whether row-level security should
 only be concerned with the rows it scans, or whether it should also
 restrict the new rows that can be created.  You can get either
 behavior by choosing whether or not to use WITH CHECK OPTION.  And
 then there's this question of who should be RLS-exempt; that's
 basically a question of to whom you grant privileges on the underlying
 table.  Note that this can be very fine-grained: for example, you can
 allow someone to exempt themselves for selects but not for updates by
 granting them SELECT privileges but not UPDATE privileges on the
 underlying table.  And potentially-exempt users can choose whether
 they want a particular access to actually be exempt by targeting the
 view when they don't want to be exempt and the table when they do.

I agree that views, or even security-definer functions, offer a great
deal of flexibility, and that may be necessary in some use-cases, but I
fail to see why that means we should avoid providing the mechanics to
achieve simple and usable RLS akin to what other major RDBMS's have.

 That's mighty useful for debugging, at least IMHO.  And, if you want
 to have several row-level security policies for different classes of
 users, just create more than one view and grant different privileges
 on each.

I'm really not impressed with the idea that RLS should be done with
multiple different views of the same underlying table.

 By contrast, it seems to me that every design so far proposed for
 something that is actually called row-level security - as opposed to
 commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5, which *really is*
 row-level security, is extremely limited.  Look back at all the things
 listed in the previous paragraph; can you do those things easily with
 the designs that have been proposed?  As far as I can see, not really.

I don't feel that RLS will, or even *should*, have the same level of
flexibility that you can achieve with views and/or security definer
functions.  I expect that, over time, we will add more capabilities to
it, but it's never going to be able to redefine the contents of a column
as a view can, nor will it be able to add columns to a table as views
can.  I don't see those as reasons against having support for RLS.

 Your (Craig's) rls-9.4-upd-sb-views patch seems to have a rough
 equivalent of WITH CHECK OPTION, probably because we've talked a lot
 about that specific issue, but it doesn't line up exactly to what WITH
 CHECK OPTION actually does.  There's no independently-grantable
 RLS-exemption privilege - and even when we talk about that, it's
 usually some kind of global bit that applies to all tables and all
 operations equally - whereas with the above approach it can be
 per-table and per-operation and doesn't require superuser intervention
 to flip the bit.  

I'm glad to hear your thoughts on the level of granularity which might
be nice to have with RLS.  What would be great is to spend a bit more
time reviewing what other systems provide in this area and considering
what makes sense for us.  This will also be a feature and an area which
we will be improving for a long time to come, but we do need this
capability and we have to start somewhere.

 There's no way for users who are RLS exempt to turn
 off their exemption for testing purposes, let alone on a per-table
 basis.  

I don't follow this argument entirely- users can't turn off the existing
permissions system for testing either, unless an authorized user with
the correct permissions makes the change to allow it- or the user bumps
themselves up to superuser, or to a role which has broader permissions,
both of 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 I'm really concerned about the security implications of this patch.  I
 think we're setting ourselves up for a whole lot of hurt for somewhat
 unclear gain.

 I'm certainly of a different opinion and, for the most part, I feel that
 if there are security concerns then they need to be addressed- and
 better by us than by asking users to use some other mechanism to
 implement RLS.

TBH, I found Robert's argument pretty persuasive.  The idea that
SELECT * FROM table might invoke arbitrary processing ought to scare
anyone who's concerned about security, because that's going to completely
break any assumptions about pg_dump being safe for instance, as well as
force top-to-bottom rethinking of many other security assumptions.

 ...  That commit was
 the ground-work to allow us to finally get proper RLS and I'm very
 disappointed to hear that the mechanical pieces around making RLS easy
 for users to use (and getting that check-box taken care of in a wide
 variety of fields that we are being exposed to now, see the PGConf.NYC
 keynote speakers...) is receiving such push-back.

If this is being sold as merely ease of use, then it is probably going
to get rejected.  In order to get some extra ease of use for the minority
of users who need RLS, you are going to significantly complicate the lives
of all Postgres users.  That's not a net win in any sane calculation of
ease of use.

Maybe the right thing to think about is how we can make it easier to set
up table + view combinations according to the pattern Robert described.
I wouldn't have a problem with some more-or-less-automated support for
doing that.  (Consider SERIAL as a possible precedent here: it's basically
a table creation macro.)

 You're suggesting that we use views instead, which clearly could run
 someone else's code.  Perhaps the user will notice that they're
 selecting from a view instead of a table, but I've never seen a security
 design around making sure that what is being select'd from is a table
 vs. a view.

pg_dump is a sufficient counterexample to that statement.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Robert Haas
On Wed, Jun 11, 2014 at 12:23 PM, Stephen Frost sfr...@snowman.net wrote:
 In my view, commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5 basically
 *is* row-level security: instead of applying a row-level security
 policy to a table, just create a security-barrier view over the table
 and grant access to the view.  Forget that the table ever existed.
 Done.

 This argument could have been made for column-level privileges also, no?

Not really.  First of all, we didn't have security_barrier views at
that time, let alone security barrier views that are auto-updateable.
That's a really important piece of technology which makes filtering
access via views feasible in ways that really were not feasible in the
past.  Secondly, column-level permissions - like every other
currently-existing type of permissions - are declarative.  They are an
additional opportunity for the system to say no to something it
otherwise would have allowed, but no user-defined code is executed.
Row-level security is not a chance for the system to deny access; it's
a chance for user-defined code to take control and perform arbitrary
operations.  So the scope of what we're contemplating for row-level
security is really far, far more invasive than what we did for
column-level privileges.

 I agree that views, or even security-definer functions, offer a great
 deal of flexibility, and that may be necessary in some use-cases, but I
 fail to see why that means we should avoid providing the mechanics to
 achieve simple and usable RLS akin to what other major RDBMS's have.

Because we don't have a good design.

I'm not categorically opposed to adding more RLS features to
PostgreSQL and never have been; in fact, I was deeply involved in the
original design of security barrier views and committed the original
patch to add that functionality to PostgreSQL, without which none of
what we're talking about here would be possible.  But the
currently-proposed design is very unappealing to me, for the reasons
that I've explained.  The right answer to this feature doesn't
provide anything that we don't already have and will introduce major
new security exposures that haven't been adequate thought is
debatable, but well other people have this so we should too is
definitely not it.  Craig's patch really hasn't grappled with any of
these thorny definition and security issues; it's just about making
the basic functionality work.  That's fine for a POC, but it's not
enough for a feature that the project would be committing to maintain
for the indefinite future.

 That's mighty useful for debugging, at least IMHO.  And, if you want
 to have several row-level security policies for different classes of
 users, just create more than one view and grant different privileges
 on each.

 I'm really not impressed with the idea that RLS should be done with
 multiple different views of the same underlying table.

Are you equally unimpressed with the idea that RLS as proposed can't
support more than one security policy right now *at all*?  Because it
seems to me that either you think multiple RLS policies on a single
table is important (in which case the current patch is inadequate) or
you think it's not important (in which case we need not argue about
whether doing it with multiple views over the same underlying table is
awkward).

 By contrast, it seems to me that every design so far proposed for
 something that is actually called row-level security - as opposed to
 commit 842faa714c0454d67e523f5a0b6df6500e9bc1a5, which *really is*
 row-level security, is extremely limited.  Look back at all the things
 listed in the previous paragraph; can you do those things easily with
 the designs that have been proposed?  As far as I can see, not really.

 I don't feel that RLS will, or even *should*, have the same level of
 flexibility that you can achieve with views and/or security definer
 functions.  I expect that, over time, we will add more capabilities to
 it, but it's never going to be able to redefine the contents of a column
 as a view can, nor will it be able to add columns to a table as views
 can.  I don't see those as reasons against having support for RLS.

What this patch is doing is basically allowing a table to really be a
view over itself.  If we choose to support that, I think it is
absolutely inevitable that people are going to want all the same
options that they would have if they really made a separate view -
separate permissions, WITH CHECK OPTION, all of it.  I find the
contrary argument - that people will only want X amount and no more -
simply not plausible.  If it's valuable to have some of those
capabilities in an RLS framework, somebody's going to want all of
them.  There's no bright line to divide the things that are valuable
in that context from those that aren't.

 I'm glad to hear your thoughts on the level of granularity which might
 be nice to have with RLS.  What would be great is to spend a bit more
 time reviewing what other systems provide in 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Robert Haas (robertmh...@gmail.com) wrote:
  I'm really concerned about the security implications of this patch.  I
  think we're setting ourselves up for a whole lot of hurt for somewhat
  unclear gain.
 
  I'm certainly of a different opinion and, for the most part, I feel that
  if there are security concerns then they need to be addressed- and
  better by us than by asking users to use some other mechanism to
  implement RLS.
 
 TBH, I found Robert's argument pretty persuasive.  The idea that
 SELECT * FROM table might invoke arbitrary processing ought to scare
 anyone who's concerned about security, because that's going to completely
 break any assumptions about pg_dump being safe for instance, as well as
 force top-to-bottom rethinking of many other security assumptions.

SELECT triggers for a wide variety of use-cases are pretty commonly
asked for here and are something I'd like to see us support also.  There
are also quite a few ways in which a select can end up executing code.
Today it requires more than 'select * from table;', but not very
much..  I agree that it'd be good if we had a way to address that but I
continue to view that as an independent issue.

What I haven't heard any comments on, yet found interesting, was the
idea of having the RLS quals run as the owner of the table.  Would that
address these concerns?  I seem to recall wondering why we didn't do
that for views in the first place, though I doubt we could change it
now even if we wanted to (and I'm guessing the spec has something to say
about this, though I haven't gone and looked and don't remember
offhand).  It's certainly rather curious that functions called under a
view are run as the calling user while permissions checks on relations
referred to by the view are as the view owner.

Hopefully that will make the rest of this discussion less relevant, but
I'll respond with my feelings anyway.

  ...  That commit was
  the ground-work to allow us to finally get proper RLS and I'm very
  disappointed to hear that the mechanical pieces around making RLS easy
  for users to use (and getting that check-box taken care of in a wide
  variety of fields that we are being exposed to now, see the PGConf.NYC
  keynote speakers...) is receiving such push-back.
 
 If this is being sold as merely ease of use, then it is probably going
 to get rejected.  In order to get some extra ease of use for the minority
 of users who need RLS, you are going to significantly complicate the lives
 of all Postgres users.  That's not a net win in any sane calculation of
 ease of use.

I don't view this as being at all accurate- how is this complicating the
lives of all Postgres users?  If they are worried about running user
defined code then they *already* have a lot to worry about.

While the users of RLS might be less than 50% and therefore the
minority, I expect it will have quite a bit of up-take in certain
industries and I know that our lack of any RLS is currently preventing
use of Postgres in some rather important cases.

As for it being ease-of-use, again, there are ways in which column level
privileges could have been dealt with using views, rules, security
definer functions, etc, but that doesn't mean we don't want that
feature.  I certainly view RLS (and have for quite some time..) as a much
needed capability, even if it can be done today using a bunch of user
written code that must be security audited.

 Maybe the right thing to think about is how we can make it easier to set
 up table + view combinations according to the pattern Robert described.

While this sounds interesting, I don't see adding columns or redefining
them as being in the perview of RLS.  The current approach of
allowing a boolean expression to be defined is both extremely flexible
while also being simple when the requirement is simple.  Having to
create, manage, update, etc, an independent object would add unnecessary
complexity.

Perhaps having it be a boolean expression is too much flexibility but
the alternatives that I can think of aren't terribly attractive to me
and the boolean expression approach is what folks coming from other
RDBMS's will be familiar with and understand how to build their
applications around.  We may need to provide some additional pieces
around this (perhaps a trigger-like function type which also gets
information about the object being queried, etc) but the point is to
have a straight-forward and simply reasoned about way of limiting what
data is returned.

 I wouldn't have a problem with some more-or-less-automated support for
 doing that.  (Consider SERIAL as a possible precedent here: it's basically
 a table creation macro.)

Perhaps there's a way to make that work, but personally it looks like a
whole bunch more work and I don't see the gain.  How would adding RLS to
an existing table work?  It's worse than the SERIAL case as at least
a default clause can be 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jun 11, 2014 at 12:23 PM, Stephen Frost sfr...@snowman.net wrote:
  This argument could have been made for column-level privileges also, no?
 
 Not really.  First of all, we didn't have security_barrier views at
 that time, let alone security barrier views that are auto-updateable.

We had security definer functions, even set-returning ones, along with
rules and triggers.

 That's a really important piece of technology which makes filtering
 access via views feasible in ways that really were not feasible in the
 past.  Secondly, column-level permissions - like every other
 currently-existing type of permissions - are declarative.  They are an
 additional opportunity for the system to say no to something it
 otherwise would have allowed, but no user-defined code is executed.

We could try to avoid calling user-defined code for RLS, but it'd add a
whole lot of complexity and as far as I can see and your proposed
solution isn't avoiding the user-defined code anyway, so I'm not sure
why this solution should be required to meet that.

 Row-level security is not a chance for the system to deny access; it's
 a chance for user-defined code to take control and perform arbitrary
 operations.  So the scope of what we're contemplating for row-level
 security is really far, far more invasive than what we did for
 column-level privileges.

In this case the user-defined code needs to return a boolean.  We don't
currently do anything to prevent it from having side-effects, no, but
the same is true with views which incorporate functions.  I agree that
it makes a difference when compared to column-level privileges, but my
point was that we have provided easier ways to do things which were
possible using more complicated methods before.  Perhaps the risk with
RLS is higher but these issues look managable to me and the level of
doubt about our ability to provide this feature in a reasonable and
principled way that our users will understand surprises me.

  I agree that views, or even security-definer functions, offer a great
  deal of flexibility, and that may be necessary in some use-cases, but I
  fail to see why that means we should avoid providing the mechanics to
  achieve simple and usable RLS akin to what other major RDBMS's have.
 
 Because we don't have a good design.

We're using a design that's found in multiple other RDBMS's and used
extensively by certain industries which use those RDBMS's today.  I'm
certainly open to improving what is found in other systems for PG but I
have a hard time seeing this approach as a bad design.  Perhaps you're
referring to our implementation, in which case I might agree and things
like running the quals as the table owner is something which should be
considered (I don't know how the other RDBMS's operate in this regard
offhand- it'd be good to find out).

 I'm not categorically opposed to adding more RLS features to
 PostgreSQL and never have been; in fact, I was deeply involved in the
 original design of security barrier views and committed the original
 patch to add that functionality to PostgreSQL, without which none of
 what we're talking about here would be possible.  But the
 currently-proposed design is very unappealing to me, for the reasons
 that I've explained.  The right answer to this feature doesn't
 provide anything that we don't already have and will introduce major
 new security exposures that haven't been adequate thought is
 debatable, but well other people have this so we should too is
 definitely not it.

How about it's in high demand by our user base?  In particular, it's
being asked for by a *highly* technical section of our user base who
uses these capabilities today, with this design, in those other
databases.

 Craig's patch really hasn't grappled with any of
 these thorny definition and security issues; it's just about making
 the basic functionality work.  That's fine for a POC, but it's not
 enough for a feature that the project would be committing to maintain
 for the indefinite future.

Improving the patch is exactly what I'd like to do, but throwing out the
notion that RLS can't be allowed to execute user-defined code is cutting
the legs out of the feature completely- particularly with our system
where users can create all manner of objects in the system with their
own code being run.

  That's mighty useful for debugging, at least IMHO.  And, if you want
  to have several row-level security policies for different classes of
  users, just create more than one view and grant different privileges
  on each.
 
  I'm really not impressed with the idea that RLS should be done with
  multiple different views of the same underlying table.
 
 Are you equally unimpressed with the idea that RLS as proposed can't
 support more than one security policy right now *at all*?  Because it
 seems to me that either you think multiple RLS policies on a single
 table is important (in which case the current patch is 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-11 Thread Robert Haas
On Wed, Jun 11, 2014 at 8:59 PM, Stephen Frost sfr...@snowman.net wrote:
 Row-level security is not a chance for the system to deny access; it's
 a chance for user-defined code to take control and perform arbitrary
 operations.  So the scope of what we're contemplating for row-level
 security is really far, far more invasive than what we did for
 column-level privileges.

 In this case the user-defined code needs to return a boolean.  We don't
 currently do anything to prevent it from having side-effects, no, but
 the same is true with views which incorporate functions.  I agree that
 it makes a difference when compared to column-level privileges, but my
 point was that we have provided easier ways to do things which were
 possible using more complicated methods before.  Perhaps the risk with
 RLS is higher but these issues look managable to me and the level of
 doubt about our ability to provide this feature in a reasonable and
 principled way that our users will understand surprises me.

I'm glad the issues look manageable to you, but you haven't really
explained how to manage them.  The way to dispel doubt is to come up
with specific technical proposals that address the technical issues
that have been raised.  I accept that you are surprised that someone
might not think we are on the right course here, but it's entirely
appropriate for me to express my doubts about this or any other patch,
much as many people do in regards to many patches that are posted here
- generally for good and valid reasons.

For my part, I'm mildly surprised that anyone thinks it's a good idea
to have SELECT * FROM tab to mean different things depending on who is
typing it.  To me, that seems very confusing; how does an unprivileged
user with no ability to assume some other role validate that the row
security policy they've configured works at all and exposes precisely
the intended set of rows?  Even aside from security exposures, how
does a non-superuser who runs pg_dump know whether they've got a
complete backup or a filtered dump that's missing some rows?  A
filtered dump might not even be restorable if foreign keys are
involved.  I think those are serious issues that deserve serious
thought and consideration, not just a vague assurance that the issues
are probably manageable.

 Because we don't have a good design.

 We're using a design that's found in multiple other RDBMS's and used
 extensively by certain industries which use those RDBMS's today.  I'm
 certainly open to improving what is found in other systems for PG but I
 have a hard time seeing this approach as a bad design.  Perhaps you're
 referring to our implementation, in which case I might agree and things
 like running the quals as the table owner is something which should be
 considered (I don't know how the other RDBMS's operate in this regard
 offhand- it'd be good to find out).

I'm not referring to the proposed implementation particularly; or at
least not that aspect of it.  I don't think trying to run the view
quals as the defining user is likely to be very appealing, because I
think it's going to hurt performance, for example by preventing
function inlining and requiring lots of user-ID switches.  But I'm not
gonna complain if someone wants to mull it over and make a proposal
for how to make it work.  Rather, my concern is that all we've got is
what might be called the core of the feature; the actual guts of it.
There are a lot of ancillary details that seem to me to be not worked
out at all yet, or only half-baked.

 I'm not categorically opposed to adding more RLS features to
 PostgreSQL and never have been; in fact, I was deeply involved in the
 original design of security barrier views and committed the original
 patch to add that functionality to PostgreSQL, without which none of
 what we're talking about here would be possible.  But the
 currently-proposed design is very unappealing to me, for the reasons
 that I've explained.  The right answer to this feature doesn't
 provide anything that we don't already have and will introduce major
 new security exposures that haven't been adequate thought is
 debatable, but well other people have this so we should too is
 definitely not it.

 How about it's in high demand by our user base?  In particular, it's
 being asked for by a *highly* technical section of our user base who
 uses these capabilities today, with this design, in those other
 databases.

Sure, that's a valid reason for considering any feature.  But it's not
an excuse to overlook whatever design problems may exist.

 Are you equally unimpressed with the idea that RLS as proposed can't
 support more than one security policy right now *at all*?  Because it
 seems to me that either you think multiple RLS policies on a single
 table is important (in which case the current patch is inadequate) or
 you think it's not important (in which case we need not argue about
 whether doing it with multiple views over the same underlying table is
 awkward).

 

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Adam Brightwell
Hi all,

This is my first post to the mailing list and I am looking forward to
working with everyone in the community.

With that said...

I'll take a look at changing the cache key to include user ID and
 ripping out the plan invalidation logic from the current patch tomorrow
 but I seriously doubt I'll be able to get all of that done in the next
 day or two.  If anyone else is able to help out, it'd certainly be
 appreciated; I really think that's the main hurdle to address at this
 point with this patch- without the plan invalidation complexity, the
 the patch is really just building out the catalog, the SQL-level
 statements for managing it, and the bit of code required to add the
 conditional to statements involving RLS-enabled tables.


I have been collaborating with Stephen on addressing this particular item
with RLS.

As a basis, I have been working with Craig's 'rls-9.4-upd-sb-views' branch
rebased against master around 9.4beta1.

Through this effort, we have concluded that for RLS the case of
invalidating a plan is only necessary when switching between a superuser
and a non-superuser.  Obviously, re-planning on every role change would be
too costly, but this approach should help minimize that cost.  As well,
there were not any cases outside of this one that were immediately apparent
with respect to RLS that would require re-planning on a per userid basis.

I have tested this approach with the following patch.

https://github.com/abrightwell/postgres/commit/4c959e63f7a89b24ebbd46575a31a629d24efa75

Does this sound like a sane approach?  Thoughts?  Recommendations?

Thanks,
Adam


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Tom Lane
Adam Brightwell adam.brightw...@crunchydatasolutions.com writes:
 Through this effort, we have concluded that for RLS the case of
 invalidating a plan is only necessary when switching between a superuser
 and a non-superuser.  Obviously, re-planning on every role change would be
 too costly, but this approach should help minimize that cost.  As well,
 there were not any cases outside of this one that were immediately apparent
 with respect to RLS that would require re-planning on a per userid basis.

Hm ... I'm not following why we'd need a special case for superusers and
not anyone else?  Seems like any useful RLS scheme is going to require
more privilege levels than just superuser and not-superuser.

Could we put the if superuser then ok test into the RLS condition test
and thereby not need more than one plan at all?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Brightwell, Adam
Hey Tom,


 Hm ... I'm not following why we'd need a special case for superusers and
 not anyone else?  Seems like any useful RLS scheme is going to require
 more privilege levels than just superuser and not-superuser.


As it stands right now, superuser is the only case where RLS policies
should not be applied/completely ignored.  I suppose it is possible to
create RLS policies that are related to other privilege levels, but those
would still need to be applied despite user id, excepting superuser.  I'll
defer to Stephen or Craig on the usefulness of this scheme.

Could we put the if superuser then ok test into the RLS condition test
 and thereby not need more than one plan at all?


As I understand it, the application of RLS policies occurs in the rewriter.
 Therefore, when switching back and forth between superuser and
not-superuser the query must be rewritten, which would ultimately result in
the need for a new plan correct?  If that is the case, then I am not sure
how one plan is possible.  However, again, I'll have to defer to Stephen or
Craig on this one.

Thanks,
Adam


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Craig Ringer
On 06/11/2014 02:19 AM, Tom Lane wrote:
 Hm ... I'm not following why we'd need a special case for superusers and
 not anyone else?  Seems like any useful RLS scheme is going to require
 more privilege levels than just superuser and not-superuser.

What it really needs is to invalidate plans when switching between
RLS-enabled and RLS-exempt users, yes. I'm sure we'll want an RLS
exempt right or mode sooner rather than later, so I'm against tying
this explicitly to superuser as such.

I wouldn't be surprised to see

SET ROW SECURITY ON|OFF

down the track, with a right controlling whether you can or not. Or at
least, a right that directly exempts a user from row security.

 Could we put the if superuser then ok test into the RLS condition test
 and thereby not need more than one plan at all?

Only if we put it in another level of security barrier subquery, because
otherwise the planner might execute the other quals (including possible
user defined functions) before the superuser test. Which was the whole
reason for the superuser test in the first place.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 06/11/2014 02:19 AM, Tom Lane wrote:
 Could we put the if superuser then ok test into the RLS condition test
 and thereby not need more than one plan at all?

 Only if we put it in another level of security barrier subquery, because
 otherwise the planner might execute the other quals (including possible
 user defined functions) before the superuser test. Which was the whole
 reason for the superuser test in the first place.

Is the point of that that the table owner might have put trojan-horse
functions into the RLS qual?  If so, why are we only concerned about
defending the superuser and not other users?  Seems like the right fix
would be to insist that functions in the RLS qual run as the table owner.
Granted, that might be painful to do.  But it still seems like we only
need to do this for superusers is designing with blinkers on.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Craig Ringer
On 06/11/2014 07:24 AM, Tom Lane wrote:
 Is the point of that that the table owner might have put trojan-horse
 functions into the RLS qual?  If so, why are we only concerned about
 defending the superuser and not other users?  Seems like the right fix
 would be to insist that functions in the RLS qual run as the table owner.
 Granted, that might be painful to do.  But it still seems like we only
 need to do this for superusers is designing with blinkers on.

I agree, and now that the urgency of trying to deliver this for 9.4 is
over it's worth seeing if we can just run as table owner.

Failing that, we could take the approach a certain other RDBMS does and
make the ability to define row security quals a GRANTable right
initially held only by the superuser.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-10 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 06/11/2014 07:24 AM, Tom Lane wrote:
 Is the point of that that the table owner might have put trojan-horse
 functions into the RLS qual?  If so, why are we only concerned about
 defending the superuser and not other users?  Seems like the right fix
 would be to insist that functions in the RLS qual run as the table owner.
 Granted, that might be painful to do.  But it still seems like we only
 need to do this for superusers is designing with blinkers on.

 I agree, and now that the urgency of trying to deliver this for 9.4 is
 over it's worth seeing if we can just run as table owner.

 Failing that, we could take the approach a certain other RDBMS does and
 make the ability to define row security quals a GRANTable right
 initially held only by the superuser.

Hmm ... that might be a workable compromise.  I think the main issue here
is whether we expect that RLS quals will be something that the planner
could optimize to any meaningful extent.  If they're always (in effect)
wrapped in SECURITY DEFINER functions, I think that largely blocks any
optimizations; but maybe that wouldn't matter in practice.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-24 Thread Craig Ringer
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/15/2014 10:06 AM, Stephen Frost wrote:
 I've uploaded the latest patch, rebased against master, with my
 changes to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz
 as I don't believe it'd clear the mailing list (it's 29k).

Does this exist in the form of an accessible git branch, too?

I was trying to maintain the patch as a series of distinct changes to
make it easier to see what each part is doing, and it'd be nice to
preserve that if possible. It also makes seeing what's changed a lot
easier.

- -- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTWbGNAAoJELBXNkqjr+S28W4H/R49CJfz4Y3TMbvwxhrwkjL2
WEv80qY4GDCzG5CGKROn3kT9H5xePvL9eadSjr+CPsilerHrPkHmXnU5w+K2LnKV
MCL/A2969b4ng1cUK9eHEFVx9BLLQmiVI6DbJ2OA2oWUs/Y7Zne5h6q0fNnnnTSq
XEU6r3tVkUp5ipbhHi+aJ+mfckirdcMR0U5X+2fgGpLZ3D+8j9azvuXvQjSOekVB
3+EVVI0UXhhvw4It4/1CjieHvScdxnsz9bOpKGiEeePUB3CGC0iPtBgIGtE0n2OK
cqKryuwZ3++LZih74M8z+Rn6yao5f4ElJrO3gz5q8axKzH/bHkEYElwEUhVfbSE=
=AKzL
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-24 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
 On 04/15/2014 10:06 AM, Stephen Frost wrote:
  I've uploaded the latest patch, rebased against master, with my
  changes to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz
  as I don't believe it'd clear the mailing list (it's 29k).
 
 Does this exist in the form of an accessible git branch, too?

Eh, no.

 I was trying to maintain the patch as a series of distinct changes to
 make it easier to see what each part is doing, and it'd be nice to
 preserve that if possible. It also makes seeing what's changed a lot
 easier.

Yeah, I almost just posted a patch against your tree.  I'll look at
doing that tomorrow.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Stephen Frost
Craig, Tom, all,

I've been through the RLS code over the past couple of days which I
pulled from Craig's repo and have a bunch of minor updates.  In general,
the patch seems pretty reasonable- except for the issues discussed
below.  Quite a bit of this patch is tied up in plan invalidation and
tracking if the security quals depend on the current user, all of which
seems pretty grotty and the wrong way around to me.

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
  It's only that the plan depends on the user ID. There's no point keeping
  the plan around if the user no longer exists.
 
 [ shrug... ]  Leaving such a plan cached would be harmless, though.

Agreed.

 Furthermore, the user ID we'd be talking about is either the owner
 of the current session, or the owner of some view or security-definer
 function that the plan is already dependent on, so it's fairly hard
 to credit that the plan would survive long enough for the issue to
 arise.

I don't entirely follow which 'issue' is being referred to here, but we
need to consider that 'set role' changes should also cause a new plan.

 Even if there is a scenario where invalidating by user ID is actually
 useful, I think adding infrastructure to cause invalidation in such a case
 is optimizing for the wrong thing.  You're adding cycles to every query to
 benefit a case that is going to be quite infrequent in practice.

Yeah, I have a hard time seeing that there's an issue w/ keeping the
cached plans around even if the session never goes back to being under
the user ID for which those older plans were built.  Also, wouldn't a
'RESET ALL' clear any of them anyway?

  Yes, that would be good, but is IMO more of a separate optimization. I'm
  currently using KaiGai's code to invalidate and re-plan when a user ID
  change is detected.
 
 I'm unlikely to accept a patch that does that; wouldn't it be catastrophic
 for performance in the presence of security-definer functions?  You can't
 just trash the whole plan cache when a user ID switch occurs.

Yeah, this doesn't seem like the right approach.  Adding the user ID to
the cache key definitely strikes me as the right way to fix this.

I've uploaded the latest patch, rebased against master, with my changes
to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
believe it'd clear the mailing list (it's 29k).

I'll take a look at changing the cache key to include user ID and
ripping out the plan invalidation logic from the current patch tomorrow
but I seriously doubt I'll be able to get all of that done in the next
day or two.  If anyone else is able to help out, it'd certainly be
appreciated; I really think that's the main hurdle to address at this
point with this patch- without the plan invalidation complexity, the
the patch is really just building out the catalog, the SQL-level
statements for managing it, and the bit of code required to add the
conditional to statements involving RLS-enabled tables.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 I've uploaded the latest patch, rebased against master, with my changes
 to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
 believe it'd clear the mailing list (it's 29k).

Please actually post it, for the archives' sake.  29k is far below the
list limit.  (Which I don't know exactly what it is ... but certainly
in the hundreds of KB.)

 I'll take a look at changing the cache key to include user ID and
 ripping out the plan invalidation logic from the current patch tomorrow
 but I seriously doubt I'll be able to get all of that done in the next
 day or two.

TBH I think we are up against the deadline.  April 15 was the agreed-to
drop dead date for pushing new features into 9.4.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  I've uploaded the latest patch, rebased against master, with my changes
  to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
  believe it'd clear the mailing list (it's 29k).
 
 Please actually post it, for the archives' sake.  29k is far below the
 list limit.  (Which I don't know exactly what it is ... but certainly
 in the hundreds of KB.)

Huh, thought it was more like 25k.  Well, here goes then...

  I'll take a look at changing the cache key to include user ID and
  ripping out the plan invalidation logic from the current patch tomorrow
  but I seriously doubt I'll be able to get all of that done in the next
  day or two.
 
 TBH I think we are up against the deadline.  April 15 was the agreed-to
 drop dead date for pushing new features into 9.4.

Yeah. :/  May be for the best anyway, this should be able to go in early
in the 9.5 cycle and get more testing and refinement.  Still stinks
though as I feel like this patch didn't get the attention it should have
due to a simple misunderstanding, but we do need to stop at some point
to get a release together.

Thanks,

Stephen


rls_ringerc_sf.patch.gz
Description: Binary data


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-03-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 03/06/2014 02:58 AM, Tom Lane wrote:
 The PlanInvalItem could perfectly well be generated by the planner,
 no, if it has the user OID?  But I'm not real sure why you need it.
 I don't see the reason for an invalidation triggered by user ID.
 What exactly about the *user*, and not something else, would trigger
 plan invalidation?

 It's only that the plan depends on the user ID. There's no point keeping
 the plan around if the user no longer exists.

[ shrug... ]  Leaving such a plan cached would be harmless, though.
Furthermore, the user ID we'd be talking about is either the owner
of the current session, or the owner of some view or security-definer
function that the plan is already dependent on, so it's fairly hard
to credit that the plan would survive long enough for the issue to
arise.

Even if there is a scenario where invalidating by user ID is actually
useful, I think adding infrastructure to cause invalidation in such a case
is optimizing for the wrong thing.  You're adding cycles to every query to
benefit a case that is going to be quite infrequent in practice.

 What we do need is a notion that a plan cache entry might only be
 valid for a specific calling user ID; but that's a matter for cache
 entry lookup not for subsequent invalidation.

 Yes, that would be good, but is IMO more of a separate optimization. I'm
 currently using KaiGai's code to invalidate and re-plan when a user ID
 change is detected.

I'm unlikely to accept a patch that does that; wouldn't it be catastrophic
for performance in the presence of security-definer functions?  You can't
just trash the whole plan cache when a user ID switch occurs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-03-06 Thread Craig Ringer
On 03/06/2014 02:58 AM, Tom Lane wrote:
 Craig Ringer cr...@hobby.2ndquadrant.com writes:
 One of the remaining issues with row security is how to pass plan
 invalidation information generated in the rewriter back into the planner.
 
 With row security, it's necessary to set a field in PlannerGlobal,
 tracking the user ID of the user the query was planned for if row
 security was applied. It is also necessary to add a PlanInvalItem for
 the user ID.
 
 TBH I'd just add a user OID field in struct Query and not hack up a bunch
 of existing function APIs.  It's not much worse than the existing
 constraintDeps field.

If you're happy with that, I certainly won't complain. It's much simpler
and less intrusive.

I should be able to post an update using this later today.

 The PlanInvalItem could perfectly well be generated by the planner,
 no, if it has the user OID?  But I'm not real sure why you need it.
 I don't see the reason for an invalidation triggered by user ID.
 What exactly about the *user*, and not something else, would trigger
 plan invalidation?

It's only that the plan depends on the user ID. There's no point keeping
the plan around if the user no longer exists.

You're quite right that this can be done in the planner when a
dependency on the user ID is found, though. So there's no need to pass a
PlanInvalItem down, which is a lot nicer.

 What we do need is a notion that a plan cache entry might only be
 valid for a specific calling user ID; but that's a matter for cache
 entry lookup not for subsequent invalidation.

Yes, that would be good, but is IMO more of a separate optimization. I'm
currently using KaiGai's code to invalidate and re-plan when a user ID
change is detected.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-03-05 Thread Tom Lane
Craig Ringer cr...@hobby.2ndquadrant.com writes:
 One of the remaining issues with row security is how to pass plan
 invalidation information generated in the rewriter back into the planner.

 With row security, it's necessary to set a field in PlannerGlobal,
 tracking the user ID of the user the query was planned for if row
 security was applied. It is also necessary to add a PlanInvalItem for
 the user ID.

TBH I'd just add a user OID field in struct Query and not hack up a bunch
of existing function APIs.  It's not much worse than the existing
constraintDeps field.

The PlanInvalItem could perfectly well be generated by the planner,
no, if it has the user OID?  But I'm not real sure why you need it.
I don't see the reason for an invalidation triggered by user ID.
What exactly about the *user*, and not something else, would trigger
plan invalidation?

What we do need is a notion that a plan cache entry might only be
valid for a specific calling user ID; but that's a matter for cache
entry lookup not for subsequent invalidation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers