[GENERAL] Re: Are new connection/security features in order, given connection pooling?

2017-01-13 Thread Tomas Vondra

On 01/12/2017 03:12 AM, Karl Czajkowski wrote:

I can relate to the original plea from my own exploration of this
topic.

Before I get into that, I will mention as an aside that to date we
have found RLS to be really slow for web client authorization: we
have to use session parameters to store web client context, and there
doesn't seem to be a way to get the query planner to treat these
settings as constants, or to lift the static parts of an RLS policy
out of per-row loops. It's so slow that I wonder whether future
releases will improve it before we are forced to move authorization
back into the client-side application logic, making our generated SQL
much more complex but better tuned to the query planner's
capabilities.
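
(For concreteness, the pattern described above looks roughly like this;
a sketch only, with hypothetical names: a documents table with an
owner_id column, and a custom parameter app.client_id set once per web
request:)

    -- created once; assumes: ALTER TABLE documents ENABLE ROW LEVEL SECURITY
    CREATE POLICY client_select ON documents
        FOR SELECT
        USING (owner_id = current_setting('app.client_id')::int);

    -- once per web request, the service asserts the client context
    SET app.client_id = '42';

The policy expression is re-evaluated against every candidate row,
which is where the per-row cost described above comes from.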



I'm no expert in RLS, but I know others have been making the planner 
smarter when it comes to RLS. It would be helpful to compile this into a 
bunch of examples with queries/explain plans, and report it to 
pgsql-performance. It'd provide practical feedback from real-world RLS 
use, and maybe there already is a reasonable way to improve the plans.


I assume you use the current_setting() function to access the parameters? I
see that while it's marked as stable, it's not marked as leak-proof,
which might be why the planner can't apply some of the optimizations
when it is used in an RLS policy.
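
A quick way to verify that from SQL (current_setting() being the
accessor in question):

    -- provolatile 's' = stable; leak-proof functions have proleakproof = true
    SELECT proname, provolatile, proleakproof
    FROM pg_proc
    WHERE proname = 'current_setting';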



As background, our web applications essentially compile new SQL
queries on the fly and send them through to PostgreSQL. There is a
lot of metaprogramming involved in generating SQL statements to
embody the work we want done on behalf of our clients, and this
doesn't fit well into traditional prepared statements etc.



Sure, a lot of systems generate queries on the fly. Also, if the main 
problem is poor plan choice due to RLS, I'm not sure how prepared 
statements could help with that.



The web service needs to assert the client context, and the
client-level statements we generate and execute should not be able
to override it. The service logic that determines and asserts client
context (once per web request) is dramatically simpler than the
service logic producing all the client-level SQL statements, and it
would be nice to have a restricted execution context to isolate the
two kinds. We also like the idea that policy enforcement mechanisms
could be applied orthogonally to the generated client-level SQL
statements, as it feels like a better defense-in-depth architecture
and is also easier to reason about. To do this in our service logic
would mean replicating more and more of the query parser and engine
to do general query rewriting.

So, it would be nice to have a tunneling mechanism where I can
distinguish the control-plane operations I am performing from the
application operations I am translating and pushing down to
PostgreSQL on behalf of the web client. In my case, I might want to
perform a mixture of service-level and client-level statements within
the same transaction. The client-level statements cannot control
transactions.

I could imagine something like a two-level feature set. At the
top-level in the connection, we can statefully manipulate our
security contexts, set configurable privilege masks for the
second-level execution context, set effective roles (reversibly, as
today), manage transactions, etc. With some kind of nested execution
block, we could submit less trusted statements to run within the
second-level execution context:

   EXECUTE RESTRICTED $guard$ app_query... $guard$ ;

This would take the guarded query string, run a safe parser on it,
and only on a successful parse go forward with planning and executing
it in the restricted mode that has been configured for the
connection.
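
(Nothing like this exists today; the closest approximation with current
tools, sketched here under the assumption of a pre-created low-privilege
role app_restricted, is to demote the connection around the untrusted
statements:)

    BEGIN;
    -- trusted, service-level statements run as the privileged login role
    SET LOCAL ROLE app_restricted;  -- demote before client-level statements
    -- ... translated client-level statements run here, with fewer privileges ...
    RESET ROLE;                     -- promote again for more service-level work
    COMMIT;

    -- caveat: an injected RESET ROLE can undo the demotion, which is
    -- exactly the gap the proposed EXECUTE RESTRICTED would close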

Eventually, I could see wanting more resource management controls on
this restricted context too, i.e. setting limits on CPU/IO/RAM
consumption or execution time: both planning-time limits (reject
based on estimates) and runtime limits (abort the query if the limit
is reached).



That might be an interesting feature, but it's also significantly more
complex than the original topic of implementing a safe context for
secrets to make RLS less problematic with connection pools.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Streaming replication protocol

2017-01-13 Thread Christopher J. Bottaro
Hello,

I'm trying to write a program that speaks the streaming replication
protocol (for logical decoding).  I get to the part where I issue a query:

START_REPLICATION SLOT regression_slot LOGICAL 0/0;

And after that, I receive an empty copy_both_response, then a copy_data that
has a "Primary keepalive message" in it.  Then I'm stuck; I don't know how
to respond to that.

I'm going off the documentation here:

https://www.postgresql.org/docs/9.5/static/protocol-replication.html

which I find really confusing and hard to understand.  Any help would be
appreciated!  Thank you.
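
(Aside, for anyone debugging the same thing: a Primary keepalive message
only requires a reply, in the form of a Standby status update message,
when its final reply-requested byte is 1. While developing, the SQL-level
decoding interface can also inspect the slot without speaking the wire
protocol at all; a sketch, assuming the slot was created with the
test_decoding plugin:)

    -- peek at pending changes without consuming them
    SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);

    -- or consume them for real
    SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);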


Re: [GENERAL] efficiently migrating 'old' data from one table to another

2017-01-13 Thread Merlin Moncure
On Fri, Jan 13, 2017 at 12:03 PM,   wrote:
> On Jan 12, 2017, Jonathan Vanasco  wrote:
>>On Jan 12, 2017, at 5:52 PM, Merlin Moncure wrote:
>>
>>> On Thu, Jan 12, 2017 at 2:19 PM, bto...@computer.org
>>>  wrote:

 Review manual section 7.8.2. Data-Modifying Statements in WITH


 https://www.postgresql.org/docs/9.6/static/queries-with.html
>>>
>>> this.
>>>
>>> with data as (delete from foo where ... returning * ) insert into
>>> foo_backup select * from data;
>>
>>Thanks, btober and merlin.  that's exactly what i want.
>
> To help you a little more, I just did this for a set of tables within the
> last week. :) The heart of the program is this sql:
>
> my $Chunk_size = 1;
> my $Interval = 24;
> my $sql = "
> WITH
> keys AS (
> SELECT $pk_column
> FROM $table
> WHERE $time_column < NOW() - '$Interval MONTHS'::INTERVAL
> ORDER BY $pk_column
> LIMIT $Chunk_size ),
> data AS (
> DELETE FROM $table
> WHERE $pk_column <= (SELECT MAX($pk_column) FROM keys)
> RETURNING * )
> INSERT INTO archive_$table SELECT * FROM data;";
>
> That's from Perl, but I suspect you can guess as to what each var should be
> for your application. You can set $Chunk_size to whatever you want. There is
> obviously a loop around that which executes until we get 0 rows, then we move
> on to the next table.
>
> The point of the chunks was to limit the impact on the production tables
> as we move data out of them. If you don't have that concern and want to do all
> rows at once then remove the LIMIT and ORDER BY.

FYI, although it's likely ok in this instance, directly interpolating
table names without precaution is considered dubious and should be
avoided as a practice.  SQL injection is a risk, and your code will fail
in the presence of unusual but legal table names, such as ones containing
spaces.

For posterity: handling this kind of action inside the database (via
plpgsql/EXECUTE), in order to leverage the built-in quoting routines,
especially quote_ident() and format(), is generally a good idea.
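
For example, a server-side wrapper along these lines (a sketch only; the
archive_ naming convention and the record_timestamp column are taken from
this thread, not a real schema) keeps all identifier quoting inside the
database:

    CREATE OR REPLACE FUNCTION archive_old_rows(p_table text, p_cutoff timestamptz)
    RETURNS bigint
    LANGUAGE plpgsql
    AS $func$
    DECLARE
        n bigint;
    BEGIN
        -- format()'s %I applies quote_ident() semantics, so unusual but
        -- legal table names (spaces, mixed case) are handled correctly
        EXECUTE format(
            'WITH data AS (
                 DELETE FROM %I WHERE record_timestamp < $1 RETURNING * )
             INSERT INTO %I SELECT * FROM data',
            p_table, 'archive_' || p_table)
        USING p_cutoff;
        GET DIAGNOSTICS n = ROW_COUNT;
        RETURN n;
    END;
    $func$;

    -- usage: SELECT archive_old_rows('my events', now() - interval '24 months');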

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are new connection/security features in order, given connection pooling?

2017-01-13 Thread Peter J. Holzer
On 2017-01-11 00:49:19 -0800, Guyren Howe wrote:
> I’m not following. What I would like is just a lightweight way to switch the
> connections to use a different role, or some moral equivalent, that would
> prevent an SQL injection from wreaking havoc. I’m not proposing anything that
> will change anything else about how the application is using the database.
> 
> SET ROLE doesn’t work, because an SQL injection can just SET ROLE back to the
> privileged user.

But then you are no worse off than with the commonly used scheme of
executing all queries as the same (necessarily "privileged") user.

In both cases the attacker can execute queries as a privileged user IF
he succeeds at SQL injection.

But as others have already noted, this is relatively easy to prevent.
Just preparing all queries is sufficient, even if you don't actually
parametrize them. Perl DBI does this, so this dangerous-looking line of
code (assume that the string wasn't hardcoded but was the result of an
SQL injection):

    $r = $dbh->selectall_arrayref(
        "select * from twoqueries; insert into twoqueries(t) values('b')");

will fail with

    DBD::Pg::db selectall_arrayref failed: ERROR:  cannot insert
    multiple commands into a prepared statement at ./twoqueries line 21.

So I think just using

    set local role

at the beginning of each transaction should work well with session
pooling. It doesn't protect you against SQL injection, but you won't
have to reinvent the authorization system.
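
Concretely, per transaction (app_user and the query are placeholders):

    BEGIN;
    SET LOCAL ROLE app_user;            -- reverts automatically at COMMIT/ROLLBACK
    SELECT * FROM orders WHERE id = 1;  -- runs with app_user's privileges
    COMMIT;                             -- session is back to the pool's login role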

> I would like a regime where there is no permanent privileged relationship
> between the client application and the database; a user would need to supply
> validating information that we can use to switch a connection to something 
> with
> minimal privileges for just that role, for the duration of one session or
> transaction.

I haven't read the blog post referenced in this thread yet, so maybe
this is covered there, but I think "supplying validating information"
would be the hard part. In general you wouldn't want a web frontend to
cache plain-text passwords and resubmit them for each transaction, but to
use something more ephemeral, like session cookies or kerberos tickets.

hp

-- 
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | h...@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ | http://downlode.org/Etext/alicebob.html




Re: [GENERAL] efficiently migrating 'old' data from one table to another

2017-01-13 Thread
On Jan 12, 2017, Jonathan Vanasco  wrote:
>On Jan 12, 2017, at 5:52 PM, Merlin Moncure wrote:
>
>> On Thu, Jan 12, 2017 at 2:19 PM, bto...@computer.org 
>>  wrote:
>>> 
>>> Review manual section 7.8.2. Data-Modifying Statements in WITH
>>> 
>>> 
>>> https://www.postgresql.org/docs/9.6/static/queries-with.html
>> 
>> this.
>> 
>> with data as (delete from foo where ... returning * ) insert into 
>> foo_backup select * from data;
>
>Thanks, btober and merlin.  that's exactly what i want.

To help you a little more, I just did this for a set of tables within the
last week. :) The heart of the program is this sql:

my $Chunk_size = 1;
my $Interval = 24;
my $sql = "
WITH
keys AS (
SELECT $pk_column
FROM $table
WHERE $time_column < NOW() - '$Interval MONTHS'::INTERVAL
ORDER BY $pk_column
LIMIT $Chunk_size ),
data AS (
DELETE FROM $table
WHERE $pk_column <= (SELECT MAX($pk_column) FROM keys)
RETURNING * )
INSERT INTO archive_$table SELECT * FROM data;";

That's from Perl, but I suspect you can guess as to what each var should be for
your application. You can set $Chunk_size to whatever you want. There is
obviously a loop around that which executes until we get 0 rows, then we move
on to the next table.

The point of the chunks was to limit the impact on the production tables
as we move data out of them. If you don't have that concern and want to do all
rows at once then remove the LIMIT and ORDER BY.
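
For example, with hypothetical values filled in (table events with
primary key id and timestamp created_at, $Chunk_size = 10000,
$Interval = 24), each iteration would send:

    WITH
    keys AS (
        SELECT id
        FROM events
        WHERE created_at < NOW() - '24 MONTHS'::INTERVAL
        ORDER BY id
        LIMIT 10000 ),
    data AS (
        DELETE FROM events
        WHERE id <= (SELECT MAX(id) FROM keys)
        RETURNING * )
    INSERT INTO archive_events SELECT * FROM data;

Note that the DELETE condition assumes the primary key increases
monotonically with the timestamp; if it doesn't, rows newer than the
cutoff can get swept into the archive, and repeating the timestamp
predicate in the DELETE is the safer variant.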

HTH,
Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] temporarily disable autovacuum on a database or server ?

2017-01-13 Thread Melvin Davidson
On Thu, Jan 12, 2017 at 12:09 PM, Jonathan Vanasco wrote:

> On Jan 11, 2017, at 8:19 PM, Melvin Davidson wrote:
>
>> Yes, you're right about ALTER SYSTEM. Unfortunately, the op provided
>> neither PostgreSQL version nor O/S, so we can't even be sure that is
>> an option. That is why I stated "I cannot confirm".
>
> I didn't think that would matter, but postgres 9.6.1 and ubuntu 16.04
>
> anyways, thanks.  i'll test that approach.


Jonathan,

I've tested this in PostgreSQL 9.4.6, so it should work for 9.6.1 also.

Edit postgresql.conf and change

    #autovacuum = on

to

    autovacuum = off

and save it.  Then:

    psql -U postgres -c "SELECT pg_reload_conf();"

No need to restart postgres.

After you finish your processing, do not forget to re-edit
postgresql.conf and change

    autovacuum = off

back to

    autovacuum = on

then save, and run

    psql -U postgres -c "SELECT pg_reload_conf();"

again.
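
Since you're on 9.6.1, the ALTER SYSTEM route mentioned upthread also
works and avoids hand-editing postgresql.conf (it writes the override to
postgresql.auto.conf, and autovacuum only needs a reload, not a restart):

    ALTER SYSTEM SET autovacuum = off;
    SELECT pg_reload_conf();

    -- ... bulk processing ...

    ALTER SYSTEM RESET autovacuum;
    SELECT pg_reload_conf();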




-- 

*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] efficiently migrating 'old' data from one table to another

2017-01-13 Thread Jonathan Vanasco

On Jan 12, 2017, at 5:52 PM, Merlin Moncure wrote:

> On Thu, Jan 12, 2017 at 2:19 PM, bto...@computer.org
>  wrote:
>> 
>> Review manual section 7.8.2. Data-Modifying Statements in WITH
>> 
>> 
>> https://www.postgresql.org/docs/9.6/static/queries-with.html
> 
> this.
> 
> with data as (delete from foo where ... returning * ) insert into
> foo_backup select * from data;

Thanks, btober and merlin.  that's exactly what i want.


On Jan 12, 2017, at 4:45 PM, Adrian Klaver wrote:
> Maybe I am missing something, but why do the UPDATE?
> Why not?:
> ...
> With an index on record_timestamp.

That's actually the production deployment that we're trying to optimize.
Depending on the size of the table (rows, width) it performs "less than
great", even with the index on record_timestamp.

The UPDATE actually worked faster in most situations.  I honestly don't
know why (the only thing that makes sense to me is server load)... but the
update + bool test ended up being (much) faster than the timestamp
comparison.
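
(For later readers, a guess at the two shapes being compared; the is_old
column and the flagging step are hypothetical reconstructions, not the
poster's actual schema:)

    -- shape 1: archive directly on the timestamp (index on record_timestamp)
    WITH data AS (
        DELETE FROM foo
        WHERE record_timestamp < NOW() - '1 MONTH'::INTERVAL
        RETURNING * )
    INSERT INTO foo_backup SELECT * FROM data;

    -- shape 2: flag rows first with an UPDATE, then archive on the boolean
    UPDATE foo SET is_old = true
    WHERE record_timestamp < NOW() - '1 MONTH'::INTERVAL;

    WITH data AS (
        DELETE FROM foo WHERE is_old RETURNING * )
    INSERT INTO foo_backup SELECT * FROM data;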

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general