* Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost <sfr...@snowman.net> wrote: > > I agree that we want to support that, if we can do so reasonably. What > > I was trying to get at is simply this- don't we provide that already > > with the leakproof attribute and functions? If we don't have enough > > there to allow index scans then we should be looking to add more, I'm > > thinking. > > So the reason why we got onto this particular topic was because of the > issue of multiple security policies for a single table. Of course, > multiple security policies can always be merged into a single > more-complex policy, but the resulting policy may be so complex that > the query-planner is no longer capable of doing a good job optimizing > it.
Yeah, I could see that happening with some use-cases. > >> ALTER TABLE tab ADD POLICY polname WHERE quals; > >> GRANT SELECT (polname) ON TABLE tab TO role; > > > > Right, if we were to support multiple policies on a given table then we > > would have to support adding and removing them individually, as well as > > specify when they are to be applied- and what if that "when" overlaps? > > Do we apply both and only a row which passed them all gets sent to the > > user? Essentially we'd be defining the RLS policies to be AND'd > > together, right? Would we want to support both AND-based and OR-based, > > and allow users to pick what set of conditionals they want applied to > > their various overlapping RLS policies? > > AND is not a sensible policy; it would need to be OR. If you grant > someone access to two different subsets of the rows in a table, it > stands to reason that they will expect to have access to all of the > rows that are in at least one of those subsets. I think I can buy off on this. What that also means is that any 'short-circuiting' that we try to do here would be based on "stop once we get back a 'true'". This could seriously change how we actually implement RLS though as doing it all through query rewrites and making this work with multiple security policies which are OR'd together and yet keeping the optimization and qual push-down and index-based plans is looking pretty daunting. I'm also of the opinion that this isn't strictly necessary for the initial RLS offering in PG- there's a clear way we could migrate existing users to a multi-policy system from a single-policy system. Sure, to get the performance and optimization benefits that we'd presumably have in the multi-policy case they'd need to re-work their RLS configuration, but for users who care, they'll likely be very happy to do so to gain those benefits. Perhaps the question here is- if we implement RLS one way for the single case and then change the implementation all around for the multi case, will we end up breaking the single case? Or destroying the performance for it? I can't see either of those cases being allowed- if and when we support multi, we must still support single and the whole point of multi would be to allow more performant implementations and that solution will require the single case to be at least as performant as what we're proposing to do today, I believe. Or are you thinking that we would never support calling user-defined functions in any RLS scheme because we want to be able to do that optimization? I don't see that being acceptable from a feature standpoint. > Alternatively, we could: > > - Require the user to specify in some way which of the available > policies they want applied, and then apply only that one. I'd want to at least see a way to apply an ordering to the policies being applied, or have PG work out which one is "cheapest" and try that one first. > - Decide that such scenarios constitute misconfiguration. Throw an > error and make the table owner or other relevant local authority fix > it. Having them all be OR'd together feels simpler and easier to work with than trying to provide the user with all the knobs necessary to select which subset of users they want the policy applied to when (user X from IP range a.b.c.d/24 at time 1500). We could probably make it work with exclusion constraints, range types, etc, and perhaps it'd be a reason to bring btree_gist into core (which I'm all for) and make it work with catalog tables, but... just 'yuck' all around, for my part. > > Sounds all rather painful and much better done programatically by the > > user in a language which is suited to that task- eg: pl/pgsql, perl, C, > > or something besides our ALTER syntax + catalog representation. > > I think exactly the opposite, for the query planning reasons > previously stated. I think the policies will quickly get so > complicated that they're no longer optimizable. Here's a simple > example: > > - Policy 1 allows the user to access rows for which complexfunc() returns > true. > - Policy 2 allows the user to access rows for which a = 1. > > Most users have access only through policy 2, but some have access > through policy 1. Users who have access through policy 1 will always > get a sequential scan, This is the thing which I most object to- if the quals being provided at any level are leakproof and would be able to reduce the returned set sufficiently that an index scan is the best bet, we should be doing that. I don't anticipate the RLS quals to be as selective as the quals which the user is adding. I agree that in cases where the user isn't using a leakproof function in their quals and the policy is complex, a sequential scan would have to be done over the table, but looking at the set of leakproof vs not leakproof functions used by operators which return boolean, certainly the most common of the index using cases are covered and we may be able to add more leakproof functions, should we get user complaints that the function they're using works fine with an index but isn't leakproof. > but users who have access through policy 2 have > an excellent chance of getting an index scan if the selectivity of a = > 1 is high. When you merge those two things into a single policy, no > matter how you do it, everyone gets sequential scans all the time. > That sucks. It just strikes me as unlikely that in such a simple policy the selectivity of the RLS qual used will be high and this feels like a lot of mechanism and complication to be adding for that use-case. If the selectivity is actually high in terms of what the RLS qual will allow, then it seems likely, to me at least, that it's going to need to depend on another table or function, eg: exists(select 1 from security_table where (current_user(),a) = (sec_user,sec_label)) Still thinking about this approach in general. Having a good answer to the question about granularity and how this multiple RLS-policy would actually work would certainly help. Being able to pick a single policy (rather than deal with overlapping policies that all have to be tested) would definitely make this simpler, but I suppose we could build up "X OR Y OR Z..." inside the query.. Thanks, Stephen
signature.asc
Description: Digital signature