Re: GIST/GIN index not used with Row Level Security

2019-08-14 Thread Derek Hans
>
>
> > I've updated word_similarity_op(text,text) to be leakproof, and
> > pg_proc agrees it is. I'm assuming word_similarity_op() is equivalent to
> > <%, though I haven't found explicit confirmation. However, using
> > word_similarity() instead of <% on a 100k row table, without any RLS
> > involved, doesn't make use of the index, while using <% does. Obviously,
> > adding the RLS doesn't make that any better. Any idea what might be the
> > cause?
>
> Just to be clear, you should be looking at pg_operator (oprcode) to
> determine the function that is under the operator that you wish to
> change to being leakproof.
>
>
Thanks for that pointer.


> Note that the selectivity functions are associated with the operator,
> not the function itself.
>

That was the missing piece, thanks. How come operators get optimized but
functions don't?
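
A minimal sketch of the difference, assuming the `search` table and trigram index described elsewhere in this thread, with 'yo' as a placeholder search term. Index operator classes are defined in terms of operators, so the planner can match the first form against the index, while the second form is just an expression to evaluate per row:

```sql
-- Operator form: <% belongs to the pg_trgm operator classes, so a
-- GiST/GIN trigram index on search.search can be considered.
EXPLAIN SELECT * FROM search WHERE 'yo' <% search;

-- Function form: roughly the same test written as a function call.
-- 0.6 is the default pg_trgm.word_similarity_threshold. No index
-- operator matches this expression, so it runs as a plain filter.
EXPLAIN SELECT * FROM search
WHERE word_similarity('yo', search) >= 0.6;
```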

Quick summary:
The text similarity/full text search/like operators are not marked as
leakproof, which stops them from having access to table statistics. When
combined with row level security, operators that aren't leakproof can't get
pushed down and therefore happen after the RLS check, preventing use of
GIN/GIST indexes. A workaround is marking the underlying function as
leakproof but that is only reasonable because our particular setup makes it
acceptable if information leaks via database error messages.

To resolve:
- Look up the function associated with the operator being used in the
pg_operator table (oprcode column)
- Check whether that function is leakproof based on the info in the pg_proc
table
- ALTER FUNCTION func LEAKPROOF
- Use the original operator in code - calling the underlying function
directly doesn't get optimized and bypasses the index
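
The steps above can be sketched as follows for the pg_trgm `<%` operator on text. This assumes pg_trgm is installed in a schema on the search_path and that the session is a superuser, which ALTER FUNCTION ... LEAKPROOF requires:

```sql
-- 1. Find the function behind the operator.
SELECT o.oid::regoperator AS operator,
       o.oprcode          AS function
FROM pg_operator o
WHERE o.oprname = '<%'
  AND o.oprleft = 'text'::regtype
  AND o.oprright = 'text'::regtype;

-- 2. Check whether that function is currently marked leakproof.
SELECT p.oid::regprocedure AS function, p.proleakproof
FROM pg_proc p
WHERE p.oid = 'word_similarity_op(text, text)'::regprocedure;

-- 3. Mark it leakproof. Only do this if leaking data through error
--    messages is acceptable in your setup.
ALTER FUNCTION word_similarity_op(text, text) LEAKPROOF;
```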

While those steps work on my local machine, unfortunately we're deployed on
AWS Aurora which doesn't allow marking functions as leakproof. Functions
are owned by the rdsadmin user and controlled by AWS. In practice, that
appears to mean that fuzzy search/full text search with reasonable
performance isn't compatible with RLS on Amazon Aurora. We may end up
setting up Elasticsearch to support text search. In any case, we need to
separate search from checking who is allowed to see the results.

Thanks for the help from everyone!


Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Derek Hans
Thanks for the detailed response, super helpful in understanding what's
happening, in particular understanding the risk of not marking functions as
leakproof. I'll take a look at the underlying code to understand what's
involved in getting a function to be leakproof.

That said, it does seem like it should be possible and reasonable to
specify that a user should have access to the table stats so that the query
planner works as expected. Maybe it comes down to the fact that RLS is
still a work in progress, and I shouldn't be relying on it unless I'm
really certain it supports the functionality I need.

I've updated word_similarity_op(text,text) to be leakproof, and
pg_proc agrees it is. I'm assuming word_similarity_op() is equivalent to
<%, though I haven't found explicit confirmation. However, using
word_similarity() instead of <% on a 100k row table, without any RLS
involved, doesn't make use of the index, while using <% does. Obviously,
adding the RLS doesn't make that any better. Any idea what might be the
cause?


On Tue, Aug 13, 2019 at 5:39 PM Stephen Frost wrote:

> Greetings,
>
> * Derek Hans (derek.h...@gmail.com) wrote:
> > Unfortunately only "alter function" supports "leakproof" - "alter operator"
> > does not. Is there a function-equivalent for marking operators as
> > leakproof? Is there any documentation for which operators/functions are
> > leakproof?
>
> Tom's query downthread provides the complete list.
>
> Note that the list is not completely static- it's entirely possible that
> additional functions can be made leak-proof, what's needed is a careful
> review of the function code to ensure that it can't leak information
> about the data (or, if it does today, a patch which removes that).  If
> you have an interest in that then I'd encourage you to dig into the code
> and look for possible leaks (Tom's already hinted in the direction you'd
> want to go in) and then propose a patch to address those cases and to
> mark the function(s) as leakproof.
>
> > In my particular case, RLS is still useful even if operators are leaky as I
> > control the application code and therefore can ensure leaky errors are
> > handled. If it's possible to disable all checking for "leakproof", that
> > would work for me.
>
> There isn't a way to disable the leakproof-checking system.  Certainly
> in the general case that wouldn't be acceptable and I'm not entirely
> convinced by your argument that such an option should exist, though you
> could go through and set all of the functions to be leakproof if you
> really wish to.
>
> > > > If that's not possible, it sounds like it
> > > > effectively blocks the use of GIN/GIST indexes when RLS is in use.
> > >
> > > There's a whole lot of daylight between "it doesn't pick an indexscan in
> > > this one example" and "it effectively blocks the use of GIN/GIST".
> >
> > True indeed :). Would you have a working example of using a GIN/GIST index
> > with RLS? All the attempts I've made have ended in seq scans. In practice,
> > I'm looking to implement fuzzy search using trigrams, so % and %> operators
> > are what matter to me. ~~ also happens to fail. Should I expect to be able
> > to use any of these with RLS, large amounts of data and reasonable
> > performance?
>
> Functions that aren't marked leakproof aren't going to be able to be
> pushed down.
>
> > Your description of leakproof (and the documentation I've found) makes it
> > sound like I'm not just hitting an isolated problem, but a general problem
> > with RLS that represents a substantial limitation and is likely worth
> > documenting.
>
> There's some documentation regarding leakproof functions here:
>
> https://www.postgresql.org/docs/current/ddl-rowsecurity.html
>
> and here:
>
> https://www.postgresql.org/docs/11/sql-createfunction.html
>
> Of course, patches are welcome to improve on our documentation.
>
> One thing that it sounds like you're not quite appreciating is that in
> the general case, verifying that a function is leakproof isn't optional.
> Without such a check, any user could create a function and then get PG
> to push that function down below the RLS checks and therefore gain
> access to the data that they aren't supposed to be able to see.
>
> All that said, there's quite a few functions that *are* marked as
> leakproof already and they're quite handy and work well with RLS
> already, as I expect you'll see when you go querying pg_proc.
>
> Thanks,
>
> Stephen
>


-- 
*Derek*
+1 (415) 754-0519 | derek.h...@gmail.com | Skype: derek.hans


Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Derek Hans
Thanks for the pointer for marking functions as leakproof, I was unaware of
that whole concept.

Unfortunately only "alter function" supports "leakproof" - "alter operator"
does not. Is there a function-equivalent for marking operators as
leakproof? Is there any documentation for which operators/functions are
leakproof?
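
As a rough sketch of how to answer this from the catalogs: leakproof is a property of the function, not the operator, so joining pg_operator to pg_proc shows the status for every operator. The column aliases are only illustrative:

```sql
-- Every operator with its underlying function and whether that
-- function is marked leakproof.
SELECT o.oid::regoperator  AS operator,
       p.oid::regprocedure AS function,
       p.proleakproof
FROM pg_operator o
JOIN pg_proc p ON p.oid = o.oprcode
ORDER BY p.proleakproof DESC, o.oprname;
```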

In my particular case, RLS is still useful even if operators are leaky as I
control the application code and therefore can ensure leaky errors are
handled. If it's possible to disable all checking for "leakproof", that
would work for me.

> > If that's not possible, it sounds like it
> > effectively blocks the use of GIN/GIST indexes when RLS is in use.
>
> There's a whole lot of daylight between "it doesn't pick an indexscan in
> this one example" and "it effectively blocks the use of GIN/GIST".
>

True indeed :). Would you have a working example of using a GIN/GIST index
with RLS? All the attempts I've made have ended in seq scans. In practice,
I'm looking to implement fuzzy search using trigrams, so % and %> operators
are what matter to me. ~~ also happens to fail. Should I expect to be able
to use any of these with RLS, large amounts of data and reasonable
performance?

Your description of leakproof (and the documentation I've found) makes it
sound like I'm not just hitting an isolated problem, but a general problem
with RLS that represents a substantial limitation and is likely worth
documenting.



Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Derek Hans
>
>
> Your example is obscuring the issue by incorporating a tenant_name
> condition (where did that come from, anyway?) in one case and not
> the other.  Without knowing how selective that is, it's hard to
> compare the EXPLAIN results.
>
>
That's RLS kicking in - RLS condition is defined as
((tenant_name)::name = CURRENT_USER)
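
For reference, a policy with that condition could be set up roughly like this. The policy name `tenant_isolation` is hypothetical; only the USING expression comes from the actual setup:

```sql
-- Turn on RLS for the table and restrict rows to the current user.
ALTER TABLE search ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON search
    USING (tenant_name::name = CURRENT_USER);

-- Inspect what is actually defined on the table.
SELECT policyname, cmd, qual
FROM pg_policies
WHERE tablename = 'search';
```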


> However, wild-guess time: it might be that without access to the
> table statistics, the "search like '%yo'" condition is estimated
> to be too unselective to make an indexscan profitable.  And putting
> RLS in the way would disable that access if the ~~ operator is not
> marked leakproof, which it isn't.
>

I didn't realize you could set access to table statistics. How do I enable
this access for this user? If that's not possible, it sounds like it
effectively blocks the use of GIN/GIST indexes when RLS is in use.


> I'm not sure that you should get too excited about this, however.
> You're evidently testing on a toy-size table, else the seqscan
> cost estimate would be a lot higher.  With a table large enough
> to make it really important to guess right, even the default
> selectivity estimate might be enough to get an indexscan.
>
>
I've tried this with larger data sets, with the same results. I discovered
this problem because the select was taking 10-30 seconds instead of the
expected sub-second, when using larger data sets and more fields getting
searched. The example is the simplest repro case I could create.



> regards, tom lane
>




Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Derek Hans
>
>
> What are the RLS policies on the table?
>

From select * from pg_policies:
"((tenant_name)::name = CURRENT_USER)"


> What is the definition of the GIN index?
>

CREATE INDEX search__gist
ON public.search USING gist
(search COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE pg_default;


> Best guess is the RLS is preventing access to the field needed by the
> index.
>

I didn't realize RLS can limit access to a specific field/index - my
understanding was that it only affects what rows get returned/can be
updated/inserted.


>
> >
> > select * from search where search like '%yo'
> >
> > Creates this query plan:
> > "Seq Scan on search  (cost=0.00..245.46 rows=1 width=163)"
> > "  Filter: (((tenant_name)::name = CURRENT_USER) AND (search ~~ '%yo'::text))"
> >
> > Running this same query with the owner of the table, thereby disabling
> > RLS, the index gets used as expected:
> > "Bitmap Heap Scan on search  (cost=4.49..96.33 rows=44 width=163)"
> > "  Recheck Cond: (search ~~ '%yo'::text)"
> > "  ->  Bitmap Index Scan on search__gist  (cost=0.00..4.48 rows=44 width=0)"
> > "        Index Cond: (search ~~ '%yo'::text)"
> >
> > I see the same behavior with more complex queries, switching to GIN
> > index, more complex RLS rules, using word_similarity instead of like,
> > using full text search and larger data sets (e.g. 100k rows). This is on
> > PostgreSQL v11.1 on Windows 10.
> >
> > --
> > *Derek*
> > +1 (415) 754-0519 |derek.h...@gmail.com  |
> > Skype: derek.hans
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>




GIST/GIN index not used with Row Level Security

2019-08-13 Thread Derek Hans
When using row level security, GIN and GIST indexes appear to get ignored.
Is this expected behavior? Can I change the query to get PostgreSQL to use
the index? For example, with RLS enabled, this query:

select * from search where search like '%yo'

Creates this query plan:
"Seq Scan on search  (cost=0.00..245.46 rows=1 width=163)"
"  Filter: (((tenant_name)::name = CURRENT_USER) AND (search ~~ '%yo'::text))"

Running this same query with the owner of the table, thereby disabling RLS,
the index gets used as expected:
"Bitmap Heap Scan on search  (cost=4.49..96.33 rows=44 width=163)"
"  Recheck Cond: (search ~~ '%yo'::text)"
"  ->  Bitmap Index Scan on search__gist  (cost=0.00..4.48 rows=44 width=0)"
"        Index Cond: (search ~~ '%yo'::text)"

I see the same behavior with more complex queries, switching to GIN index,
more complex RLS rules, using word_similarity instead of like, using full
text search and larger data sets (e.g. 100k rows). This is on PostgreSQL
v11.1 on Windows 10.
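
A sketch of how the two plans can be compared from one session, assuming the session is connected as the table owner and is allowed to SET ROLE to a role covered by the policy. The role name `tenant_a` is hypothetical:

```sql
-- As a role that is subject to the policy (needs SELECT on the table):
SET ROLE tenant_a;
EXPLAIN SELECT * FROM search WHERE search LIKE '%yo';
RESET ROLE;

-- As the table owner, who bypasses RLS unless FORCE ROW LEVEL SECURITY
-- has been set on the table:
EXPLAIN SELECT * FROM search WHERE search LIKE '%yo';
```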



Re: Update does not move row across foreign partitions in v11

2019-03-04 Thread Derek Hans
Based on a reply to my bug report, moving rows out of foreign partitions is
not yet implemented, so this is behaving as expected. The Notes section of
the UPDATE docs mentions this limitation.

On Wed, Feb 27, 2019 at 6:12 PM Alvaro Herrera wrote:

> On 2019-Feb-22, Derek Hans wrote:
>
> > I've set up 2 instances of PostgreSQL 11. On instance A, I created a table
> > with 2 local partitions and 2 partitions on instance B using foreign data
> > wrappers, following https://pgdash.io/blog/postgres-11-sharding.html.
> > Inserting rows into this table works as expected, with rows ending up in
> > the appropriate partition. However, updating those rows only moves them
> > across partitions in some of the situations:
> >
> >    - From local partition to local partition
> >    - From local partition to foreign partition
> >
> > Rows are not moved
> >
> >    - From foreign partition to local partition
> >    - From foreign partition to foreign partition
> >
> > Is this the expected behavior? Am I missing something or configured
> > something incorrectly?
>
> Sounds like a bug to me.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>




Re: Update does not move row across foreign partitions in v11

2019-02-27 Thread Derek Hans
Hi all,
This behavior makes the new data sharding functionality in v11 only
marginally useful as you can't shard across database instances.
Considering data sharding appeared to be one of the key improvements in
v11, I'm confused - am I misunderstanding the expected functionality?

Thanks!

On Fri, Feb 22, 2019 at 9:44 AM Derek Hans wrote:

> I've set up 2 instances of PostgreSQL 11. On instance A, I created a table
> with 2 local partitions and 2 partitions on instance B using foreign data
> wrappers, following https://pgdash.io/blog/postgres-11-sharding.html.
> Inserting rows into this table works as expected, with rows ending up in
> the appropriate partition. However, updating those rows only moves them
> across partitions in some of the situations:
>
>    - From local partition to local partition
>    - From local partition to foreign partition
>
> Rows are not moved
>
>    - From foreign partition to local partition
>    - From foreign partition to foreign partition
>
> Is this the expected behavior? Am I missing something or configured
> something incorrectly?
>
> Thanks,
> Derek
>




Update does not move row across foreign partitions in v11

2019-02-22 Thread Derek Hans
I've set up 2 instances of PostgreSQL 11. On instance A, I created a table
with 2 local partitions and 2 partitions on instance B using foreign data
wrappers, following https://pgdash.io/blog/postgres-11-sharding.html.
Inserting rows into this table works as expected, with rows ending up in
the appropriate partition. However, updating those rows only moves them
across partitions in some of the situations:

   - From local partition to local partition
   - From local partition to foreign partition

Rows are not moved

   - From foreign partition to local partition
   - From foreign partition to foreign partition

Is this the expected behavior? Am I missing something or configured
something incorrectly?

Thanks,
Derek
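
For anyone trying to reproduce this, a setup along these lines should show the behavior. It is only a sketch: the server name, host, credentials, and table names are placeholders rather than the actual configuration, and the `items_c` and `items_d` tables are assumed to already exist on instance B:

```sql
-- On instance A.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard_b FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'instance-b.example', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard_b
    OPTIONS (user 'postgres', password 'secret');

CREATE TABLE items (id int, region text) PARTITION BY LIST (region);

-- Local partitions.
CREATE TABLE items_a PARTITION OF items FOR VALUES IN ('a');
CREATE TABLE items_b PARTITION OF items FOR VALUES IN ('b');

-- Foreign partitions backed by tables on instance B.
CREATE FOREIGN TABLE items_c PARTITION OF items FOR VALUES IN ('c')
    SERVER shard_b OPTIONS (table_name 'items_c');
CREATE FOREIGN TABLE items_d PARTITION OF items FOR VALUES IN ('d')
    SERVER shard_b OPTIONS (table_name 'items_d');

INSERT INTO items VALUES (1, 'a'), (2, 'c');

-- Local -> foreign: the case where the row is moved.
UPDATE items SET region = 'd' WHERE id = 1;

-- Foreign -> local: the case where the row is not moved.
UPDATE items SET region = 'b' WHERE id = 2;
```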