[HACKERS] questions about ON SELECT rules

2013-01-22 Thread Zhaomo Yang
Hi folks,

I'm working on a project where I need to run some commands/functions when
a SELECT is executed on certain tables. I looked at the rule system and
found the following paragraph in the documentation:

  Currently, there can be only one action in an ON SELECT rule, and it
must be an unconditional SELECT action that is INSTEAD. This
  restriction was required to make rules safe enough to open them for
ordinary users, and it restricts ON SELECT rules to act like views.

These restrictions make it impossible to implement what I need using the
rule system. I'm thinking about tweaking how rule statements are executed to
lift some restrictions (more actions, other types of actions, etc.), but I'm
not sure what impact this would have on the whole system. Why are rules
not safe when these restrictions are removed?

Thanks,
Zhaomo


[HACKERS] A mechanism securing web applications in DBMS

2014-09-04 Thread Zhaomo Yang
 statement. Also, it can be hard for
developers to debug.

4. Collaboration.

We are looking forward to joining forces with the PostgreSQL community on
this problem. More specifically, we have the following suggestions,
which we hope the community will consider:

1) Using GRANT-WHERE syntax for row-level access control. If for
any reason that is impossible, we hope at least a clause will be added
to the “ALTER TABLE … SET ROW SECURITY” syntax so that a recipient
can be specified;

2) Restarting Global Temporary Table project;

3) Accepting “CREATE AUTHENTICATION FUNCTION” as a variant of the
“CREATE FUNCTION” command.

We’d like to discuss the mechanisms above with the community further,
and if any of the mechanisms are accepted by the community, we also
would like to take part in implementing it. Please feel free to
contact us if you have any questions.


Zhaomo Yang  zhy...@ucsd.edu
Kirill Levchenko   klevc...@ucsd.edu


References:
[1] S. Chaudhuri, T. Dutta, and S. Sudarashan. Fine Grained
Authorization Through Predicated Grants. In Proceedings of the 23rd
IEEE International Conference on Data Engineering (ICDE), pages
1174–1183, April 2007.

[2] idea: global temp tables
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-05 Thread Zhaomo Yang
Stephen,

There are several things I want to explain:

(1) Two philosophies important to our design
- Try not to force web application developers to make too many changes
to their apps.
- Try to make our mechanism as simple as possible.
Web application developers have all kinds of backgrounds. If the
security mechanism is too alien to them, they wouldn't use it.

(2) Why we need to cache application-level users' identifiers
We want to differentiate application-level users in the DBMS, but not by
creating a DB user (or role, in PG's terminology) for every
application-level user; otherwise there will be all sorts of problems
when the number of application-level users grows beyond a threshold
(e.g. catalog bloat, as you mentioned). Instead, we still use one DB user
representing all the application-level users, just as web apps work now.
In addition, the identifiers (attributes) of an application-level user are
stored in some private place of the corresponding session (e.g. a temp
table) when the application-level user authenticates, so that the DBMS
can differentiate application-level users. (Connection pooling should be
fine as long as an application session doesn't return its connection
until it finishes.)

Normally, a web application authenticates an application-level user by
making a SELECT query with the user-provided user id and password against
the password table to see if there is a match. (Of course, this is an
oversimplified version of how authentication works.) Using our
mechanism, the web application instead calls the authentication
function, which does a SELECT on the table first and stores the
identifiers of that application-level user somewhere if a match is found.
The identifiers of the current application-level user are referenced
by the policies so that fine-grained access control can be enforced.

(3) CREATE AUTHENTICATION FUNCTION
In our mechanism, we ask web application developers to provide an
authentication function which normally takes a user id and password as
inputs and returns a row containing all the identifiers (attributes)
of the corresponding application-level user. Let us call the place
storing the current application-level user's identifiers the
identifier store.

For example,

This is an authentication function provided by an app developer.

CREATE AUTHENTICATION FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
...
$$ LANGUAGE plpgsql SECURITY DEFINER;

Under the hood, two regular functions will be defined:

-- the same function with a different name
CREATE FUNCTION _auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
  -- copy the function body from the CREATE AUTHENTICATION FUNCTION above
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- the function which is actually called in the server code
CREATE FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
  -- clear the identifier store
  -- execute function _auth and insert the result into the identifier store
  -- return the row in the identifier store
$$ LANGUAGE plpgsql SECURITY DEFINER;

The whole point of this CREATE AUTHENTICATION FUNCTION syntax is to
reduce developers' work.  By giving developers very specific
instructions on how to write an authentication function, we hope they
would find it easy to write one. Admittedly, however, what CREATE
AUTHENTICATION FUNCTION does can be achieved by CREATE FUNCTION.
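As a rough illustration of that equivalence, here is a sketch using only
plain CREATE FUNCTION. The app_users table, the identifier_store temp
table, and the use of pgcrypto's crypt() for hash comparison are all
illustrative assumptions, not part of the proposal:

```sql
-- Illustrative identifier store: one row describing the currently
-- authenticated application-level user of this session.
CREATE TEMP TABLE identifier_store (uid BIGINT, permissions TEXT[]);

CREATE FUNCTION auth (p_user_id TEXT, p_password TEXT)
RETURNS TABLE(uid BIGINT, permissions TEXT[]) AS $$
BEGIN
  DELETE FROM identifier_store;           -- clear the identifier store
  INSERT INTO identifier_store            -- store the user's identifiers
    SELECT u.uid, u.permissions
    FROM app_users u                      -- hypothetical password table
    WHERE u.login = p_user_id
      AND u.pw_hash = crypt(p_password, u.pw_hash);  -- pgcrypto
  RETURN QUERY SELECT * FROM identifier_store;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```

The policies would then reference identifier_store, exactly as the
generated auth() wrapper above is meant to enable.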

Please let me know if you have any other questions.

Zhaomo

On Thu, Sep 4, 2014 at 6:53 PM, Stephen Frost sfr...@snowman.net wrote:
 Zhaomo,

 * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
 I am a graduate student from UC San Diego. My adviser, Dr. Kirill
 Levchenko, and I have been working on a web/DB security project for
 the last few months. Since fine-grained access control in DBMS is part
 of our project and the PostgreSQL community is also working on it now,
 we would like to exchange some ideas on the issue with you.

 Fantastic!  Very interested.

 1. Background
 Nowadays people entrust more and more sensitive data to web
 applications, while security vulnerabilities are common in these
 applications. The normal structure of web applications consists of
 three tiers: client-side code, server-side code and a database. In
 server-side code a database user representing all the
 application-level users interacts with the database with full
 privileges. Since current database built-in access control is too
 coarse-grained, no access control mechanism is used in most web
 applications. This situation is not ideal, since a malicious
 application user can tamper with other users’ data by exploiting
 vulnerabilities in the application, and if the application is
 completely under the malicious user’s control, so is the database.

 Agreed- we are certainly working to improve that situation, though
 consideration must also be given to how our catalogs are structured and
 that they are unlikely to support web-scale numbers

Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-06 Thread Zhaomo Yang
Stephen,

 As an FYI- we generally prefer inline responses rather than top-posting on 
 the PostgreSQL mailing lists.  Thanks.

Sorry for that.

  - Try to make our mechanism as simple as possible.
  Web application developers have all kinds of backgrounds. If the
  security mechanism is too alien to them, they wouldn't use it.
 I'm surprised to hear this and a suggestion to use stored procedures in
 the same email- SPs are generally considered 'foreign' to the web
 developers that I've talked to. :)  That said, I'll grant that there are
 generally two camps: those who expect a database to only have
 BerkeleyDB-level key/value capabilities, and those who know what they're
 doing and what relational databases and SQL are all about.  The latter
 (and clear minority) group will take advantage of these capabilities,
 certainly, regardless of how they are expressed and are likely already
 comfortable using stored procedures and database-level roles.

I am surprised to hear this too. :) We haven't talked to many web
developers yet and this is one of the things we need to do in the
future.

The goal of this mechanism is to add another layer of protection
inside the DBMS so that even if the application server is compromised,
the users' data is still protected*. This requires the DBMS to be able to
authenticate application-level users (i.e. know which application-level
user it is communicating with). That is, we need to move the
authentication logic for application-level users into the DBMS. For this
purpose, using stored procedures (or something similar) is a must. I
think even if a security mechanism is designed to be easy to use, it
will still require some expertise.

* this mechanism can't help if the attackers control the app server
completely and the users are not aware of that and keep using the app.
In that case the attackers will be able to collect all the credentials
of the users who log in while they are in charge.

 If a temp table is being used then dynamic SQL may be required and therefore
 a plpgsql function will be involved to handle looking up the current user, as 
 you
 won't be using PG roles.

This is why I'd like to have global temp tables in PG. With those we can
probably get around the need for dynamic SQL.

  (3) CREATE AUTHENTICATION FUNCTION
  In our mechanism, we ask web application developers provide an
  authentication function which normally takes user id and password as
  inputs and returns a row containing all the identifiers (attributes)
  of the corresponding application-level user. Let us call the place
  storing the current application-level user's identifiers as
  identifier store.
 I would *strongly* advocate *against* passing the password to the
 database in any (non-hashed) form.  You are much better off using a
 one-way hash as early as possible in the stack (ideally, in whatever
 system initially receives the password on the server side) and then
 comparing that one-way hash.  Of course, passwords in general are not
 considered secure; one-time passwords, hardware tokens, or PIV /
 HSPD12 / CAC cards with client-side certificates are preferable.

You are absolutely right. I should've explained it better. I just
wanted to show how authentication works and skipped all the hashing
part.

Thanks,
Zhaomo

On Fri, Sep 5, 2014 at 5:52 PM, Stephen Frost sfr...@snowman.net wrote:
 Zhaomo,

   As an FYI- we generally prefer inline responses rather than
   top-posting on the PostgreSQL mailing lists.  Thanks.

 * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
 (1) Two philosophies important to our design
 - Try not to force web application developers to make too many changes
 to their apps.

 That's certainly fair.

 - Try to make our mechanism as simple as possible.
 Web application developers have all kinds of backgrounds. If the
 security mechanism is too alien to them, they wouldn't use it.

 I'm surprised to hear this and a suggestion to use stored procedures in
 the same email- SPs are generally considered 'foreign' to the web
 developers that I've talked to. :)  That said, I'll grant that there are
 generally two camps: those who expect a database to only have
 BerkeleyDB-level key/value capabilities, and those who know what they're
 doing and what relational databases and SQL are all about.  The latter
 (and clear minority) group will take advantage of these capabilities,
 certainly, regardless of how they are expressed and are likely already
 comfortable using stored procedures and database-level roles.

 (2) Why we need to cache application-level users' identifiers
 We want to differentiate application-level users in DBMS, but not by
 creating a DB user (or role in PG's terminology ) for every
 application-level user, otherwise there will be all sorts of problems
 when the number of application-level users is greater than a threshold
 (e.g. catalog, as you mentioned).

 While I agree that this can be an issue when things scale up, you *can*
 address it by sharding the database based on user.  Even so though, I

Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-20 Thread Zhaomo Yang
Stephen,

 Yes- but that's pretty trivially done, given that you've stipulated that
 a single connection DB connection must be used from authentication until
 de-authentication.  All that is needed is an additional column in the
 auth table which is populated with a pseudo-random value which is
 guaranteed to be unique and constant for the duration of the
 authenticated time- and the database backend PID is perfect for that.
 The auth function can call the pg_backend_pid() function directly and
 then the policies can include a 'pid = pg_backend_pid()' as part of the
 join to the auth table.  The auth function can also complain loudly if
 an entry in the pid table is found with the current PID during auth (and
 similar- the de-auth function can complain if an entry with the current
 PID is *not* found).  This would eliminate the need for the on-connect
 triggers, I believe (though those are interesting for other reasons..).


You are right. Using an unlogged table is a good idea. I'll try it out.
Thanks for your advice!

  It'd be very interesting to see this done with the unlogged table,
 security definer functions, and the row-level policies patch which we're
 working on.  I'd further suggest that the application also use multiple
 roles which are set noinherit and 'set role' based on the operation
 which it's currently being used for- this would add another level of
 protection.  Using stored procedures (for more than just the auth and
 de-auth functions as suggested here) can also be a good idea.


Currently the auth functions are SECURITY DEFINER functions. I'm going to
try to create a patch using an unlogged table + RLS and put it online
(e.g. on this mailing list) so that people can try it.
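Stephen's PID-based scheme could be sketched roughly as follows. The
session_auth and accounts table names and the owner_uid column are mine,
and the policy syntax follows the row-level security work discussed in
this thread:

```sql
-- Unlogged table recording which application-level user each backend
-- authenticated as, keyed by the backend PID.
CREATE UNLOGGED TABLE session_auth (
    pid INT PRIMARY KEY,   -- pg_backend_pid() of the session
    uid BIGINT NOT NULL    -- application-level user id
);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Each connection sees only rows belonging to the user it authenticated
-- as, joined through the current backend's PID.
CREATE POLICY user_rows ON accounts
    USING (owner_uid = (SELECT uid FROM session_auth
                        WHERE pid = pg_backend_pid()));
```

The auth function would INSERT into session_auth (complaining if a row
with the current PID already exists), and the de-auth function would
DELETE it.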

Thanks,
Zhaomo

On Sat, Sep 13, 2014 at 4:00 PM, Stephen Frost sfr...@snowman.net wrote:
 Zhaomo,

 * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
  Have you considered just using a regular, but unlogged, table?  That
  would also avoid any risk that the application manages to drop or shadow
  the temp table somehow with a fake table that changes who is currently
  authenticated, and avoids having to figure out how to deal with the temp
  table vanishing due to the connections going away.

 So then all the currently logged in users will be stored in the same
 table, which means we also need to make sure that the correct row in
 that table is used when the row-level security policy refers to the
 current application-level user.

 Yes- but that's pretty trivially done, given that you've stipulated that
 a single connection DB connection must be used from authentication until
 de-authentication.  All that is needed is an additional column in the
 auth table which is populated with a pseudo-random value which is
 guaranteed to be unique and constant for the duration of the
 authenticated time- and the database backend PID is perfect for that.
 The auth function can call the pg_backend_pid() function directly and
 then the policies can include a 'pid = pg_backend_pid()' as part of the
 join to the auth table.  The auth function can also complain loudly if
 an entry in the pid table is found with the current PID during auth (and
 similar- the de-auth function can complain if an entry with the current
 PID is *not* found).  This would eliminate the need for the on-connect
 triggers, I believe (though those are interesting for other reasons..).

 Let me send you a copy of our paper in a separate email which is a
 thorough description of the mechanism (including background, threat
 model, how it works, etc.), which should give you a better idea of
 every aspect of the mechanism. Please do not distribute it because it
 has been accepted for publication. Note that the implementation we
 show in the paper is just a prototype (we made the changes so that we
 could implement it quickly). Our goal always is to integrate our
 mechanism into open source DBMS's like PG and MySQL cleanly.

 It'd be very interesting to see this done with the unlogged table,
 security definer functions, and the row-level policies patch which we're
 working on.  I'd further suggest that the application also use multiple
 roles which are set noinherit and 'set role' based on the operation
 which it's currently being used for- this would add another level of
 protection.  Using stored procedures (for more than just the auth and
 de-auth functions as suggested here) can also be a good idea.

 Thanks,

 Stephen




Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-15 Thread Zhaomo Yang
Stephen,

If no NEW or OLD is used, what happens?  Or would you have
 to always specify OLD/NEW for UPDATE, and then what about for the other
 policies, and the FOR ALL policies?


I should have been clearer about references to OLD/NEW. SELECT predicates
cannot reference either of them. INSERT predicates cannot refer to OLD and
DELETE predicates cannot refer to NEW. Basically, for INSERT/UPDATE/DELETE,
we specify predicates the same way as we do for triggers' WHEN conditions.
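For comparison, a trigger's WHEN condition can already reference OLD and
NEW together. The trigger and function names here are illustrative:

```sql
-- Illustrative trigger: fires only when the proposed UPDATE satisfies a
-- condition mixing the old and new row.
CREATE TRIGGER audit_update
    BEFORE UPDATE ON test
    FOR EACH ROW
    WHEN (OLD.v1 > 10 OR NEW.v2 < 10)
    EXECUTE PROCEDURE audit_update_fn();
```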

As for FOR ALL, I think we will abandon it if we apply the SELECT policy to
other commands, since the SELECT predicate will be the new universally
applicable read policy, which makes the FOR ALL USING clause much less
useful. Of course, users may need to specify separate predicates for
different commands, but I think that is fine. How often do users want the
same predicate for all commands?

This could be accomplished with USING (bar > 1) and WITH CHECK (foo >
 1), no?
 Your sentence above that USING and WITH CHECK are combined by AND
 isn't correct either- they're independent and are therefore really OR'd.
 If they were AND'd then the new record would have to pass both USING and
 WITH CHECK policies.


No, it is impossible with the current implementation.

CREATE TABLE test (
 id int,
 v1 int,
 v2 int
);

Suppose that the user wants an update policy which is OLD.v1 > 10 OR NEW.v2 < 10.
As you suggested, we use the following policy:

CREATE POLICY update_p ON test
FOR UPDATE TO test_user
USING (v1 > 10)
WITH CHECK (v2 < 10);

(1) Assume there is only one row in the table:
id | v1 | v2 |
1  | 11 | 20 |

Now we execute UPDATE test SET v2 = 100.
This query is allowed by the policy and the only row should be updated
since v1's old value > 10, but it will trigger an error because it
violates the WITH CHECK clause.

(2) Again assume there is only one row in the table:
id | v1 | v2 |
1  | 9  | 20 |

Now we execute UPDATE test SET v2 = 7.
This query is allowed by the policy and the only row should be updated
since v2's new value < 10, but nothing will be updated because the only
row will be filtered out before the update happens.

This is why I said USING and WITH CHECK are combined by AND. In order to
update a row, first the row needs to be visible, meaning it needs to pass
the USING check; then it needs to pass the WITH CHECK.

Further, I'm not sure that I see how this would work in a case where you
 have the SELECT policy (which clearly could only refer to OLD) applied
 first, as you suggest?


We use the SELECT policy to filter the table when we scan it (just like
how we use the USING clause now). The predicate of UPDATE will be checked
later (probably similarly to how we handle a trigger's WHEN clause, which
can also reference OLD and NEW).

Thanks,
Zhaomo


Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang
Hi,

This thread has a pretty thorough discussion of the pros and cons of
applying the SELECT policy to other commands. Besides what has been
mentioned, I think there is another potential pro: we can enable references
to the pseudorelations OLD and NEW in predicates. Now, for UPDATE, the
references to the target table in the USING clause are actually references
to OLD and the references in the WITH CHECK clause are references to NEW.
Logically, USING and WITH CHECK are now combined by AND, so we cannot have
predicates like

  foo(NEW) > 1 OR bar(OLD) > 1   (combining predicates referencing OLD
and NEW by an operation other than AND)
  NEW.id > OLD.id                (referencing both in the same expression)

If we apply the SELECT policy to other commands, we only need one predicate
for each of INSERT/UPDATE/DELETE. That predicate can reference OLD and NEW,
just like predicates for triggers and rules. For UPDATE and DELETE, the
predicate of SELECT will be applied first (when the target table is
scanned) to ensure no information leakage, and the command's own predicate
will be applied later. This doesn't change much for INSERT and DELETE, but
it gives users more flexibility when they set predicates for UPDATE.

Thanks,
Zhaomo







Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang
In case you missed the link to the previous discussion at the bottom,
http://www.postgresql.org/message-id/CAHGQGwEqWD=ynqe+zojbpoxywt3xlk52-v_q9s+xofckjd5...@mail.gmail.com


Re: [HACKERS] Implementation of global temporary tables?

2015-07-20 Thread Zhaomo Yang

 Just to be clear, the idea of a global temp table is that the table def
 is available to all users, but the data is private to each session?


The table def is visible to all sessions and persistent, but the data is
private to each session and temporary.
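For reference, this matches Oracle's long-standing feature of the same
name; the table definition here is just an illustrative sketch of that
syntax:

```sql
-- Oracle-style global temporary table: the definition is permanent and
-- shared, but each session sees only its own rows.
CREATE GLOBAL TEMPORARY TABLE session_cache (
    key   VARCHAR2(64),
    value VARCHAR2(1024)
) ON COMMIT PRESERVE ROWS;  -- rows last until the session ends
```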

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
  there is other question - what is effect of ALTER TABLE of global temp
table on
  instances of this table in active sessions?

As I said, we need to first agree on the behavior of the existing
commands. I can think of two options for ALTER TABLE: 1) only allow
ALTER TABLE when there are no other active sessions (this is how Oracle
deals with it); 2) handle it as if session copies inherit from the global
copy and ALTER TABLE executes on the global copy.

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
 Sounds fine in general. I'm a bit curious to know what are the locking
implications of  vivifying the table on access.

The locking implications depend on how we interpret the existing commands
in the context of global temp tables. I think we should discuss and agree
on the behavior of the commands with global temp tables, but in general we
can follow these rules:

If the command operates on the global temp table's metadata, for example
an ALTER TABLE command, then we lock the global copy at the same level as
we would a regular table.

If the command operates on the global temp table's data (which is actually
stored in the session copy), for example a DML command, then the global
copy is locked at AccessShareLock level to prevent concurrent
modifications to the global temp table's definition from other sessions.

Thanks,
Zhaomo

On Wed, Jul 15, 2015 at 4:26 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 07/15/2015 07:58 AM, Simon Riggs wrote:


 For me the design summary is this

 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table
 but with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
 it does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

 That meets the SQL Standard and doesn't contain any visibility problems
 or need for new internals.

 The purpose of this feature is to automatically create a temp table with
 the same definition whenever needed. The discussion of bloat is just
 wrong. We create exactly the same amount of bloat as if we had typed CREATE
 TEMP TABLE. Optimising temp table entries in the catalog is another,
 separate patch, if we care.



 Sounds fine in general. I'm a bit curious to know what are the locking
 implications of vivifying the table on access.

 cheers

 andrew



Re: [HACKERS] Implementation of global temporary tables?

2015-07-08 Thread Zhaomo Yang
  more global temp tables are little bit comfortable for developers,
I'd like to emphasize this point. This feature does much more than save a
developer from issuing a CREATE TEMP TABLE statement in every session. Here
are two common use cases, and I'm sure there are more.

(1)
Imagine a web application scenario in which a developer wants to cache some
session information in a temp table. What's more, he also wants to specify
some rules which reference the session information. Without this feature,
the rules will be removed at the end of every session since they depend on
a temporary object. Global temp tables will allow the developer to define
the temp table and the rules once.
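The dependency problem can be sketched as follows (the user_data table and
my_rows view are made up; a view is used here since a view is implemented
as an ON SELECT rule):

```sql
-- Per-session cache; must be re-created in every session today.
CREATE TEMP TABLE session_info (uid BIGINT);

-- Although declared without TEMP, this view depends on a temporary
-- table, so PostgreSQL makes it temporary as well -- it is dropped at
-- session end along with session_info and must be re-created next time.
CREATE VIEW my_rows AS
    SELECT * FROM user_data
    WHERE owner = (SELECT uid FROM session_info);
```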

(2)
The second case is mentioned by Tom Lane back in 2010 in a thread about
global temp tables.
(http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us)
The context that I've seen it come up in is that people don't want to
clutter their functions with
 create-it-if-it-doesn't-exist logic, which you have to have given the
current behavior of temp tables.

  2.a - using on demand created temp tables - most simple solution, but
  doesn't help with catalogue bloating

I've read the thread, and people disapprove of this approach because of the
potential catalog bloat. However, I'd like to champion it. Indeed, this
approach may have a bloat issue. But users who need global temp tables
now have to create a new temp table in every session, which means they
already have the bloat problem and presumably have some policy to deal with
it. In other words, implementing global temp tables with this approach
gives users the same performance, plus the convenience the feature brings.

The root problem here is whether having the unoptimized feature is better
than having no feature at all. Actually, there was a very similar discussion
back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom
Lane's arguments here.

Kevin Grittner's argument:

http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov
... If you're saying we can implement the standard's global temporary
tables in a way that performs better than current temporary tables, that's
cool.  That would be a nice bonus in addition to the application
programmer convenience and having another tick-mark on the standards
compliance charts.  Do you think that's feasible?  If not, the feature
would be useful to some with the same performance that temporary tables
currently provide.

Tom Lane's arguments:

http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us
I'm all for eliminating catalog overheads, if we can find a way to do
that.  I don't think that you get to veto implementation of the feature
until we can find a way to optimize it better.  The question is not about
whether having the optimization would be better than not having it --- it's
about whether having the unoptimized feature is better than having no
feature at all (which means people have to implement the same behavior by
hand, and they'll *still* not get the optimization).

There have been several threads here discussing global temp tables since
2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
metadata of the session copy in the catalog. However, it seems that none of
them has been implemented, or even has a feasible design. So why don't we
implement it in an unoptimized way first?

  Is there still interest about this feature?
I'm very interested in this feature. I'm thinking about one implementation
which is similar to Pavel's 2009 proposal (
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com).
Here are the major ideas of my design:

(1)
Creating the cross-session persistent schema as a regular table, and
creating session-private temp tables when a session first accesses it.

(2)
For DML queries, the global temp table is overloaded by its session copy
after the relation is opened by an OID or a RangeVar. For DDL queries,
which copy is used depends on whether the query needs to access the data
or the metadata of the global temp table.

There are more differences between this design and Pavel's 2009 proposal,
and I'd like to send a detailed proposal to the mailing list, but first I
want to know whether the community would accept a global temp table
implementation which provides the same performance as current temp tables
do.

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-09 Thread Zhaomo Yang
  I am not sure, if it is not useless work.

I don't understand why an implementation taking approach 2.a would be
useless. As I said, its performance will be no worse than that of current
temp tables, and it will provide a lot of convenience to users who need to
create temp tables in every session.

Thanks,
Zhaomo

On Tue, Jul 7, 2015 at 11:53 PM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 Hi


 2015-07-08 9:08 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu:

   more global temp tables are little bit comfortable for developers,
 I'd like to emphasize this point. This feature does much more than saving
 a developer from issuing a CREATE TEMP TABLE statement in every session.
 Here are two common use cases and I'm sure there are more.

 (1)
 Imagine in a web application scenario, a developer wants to cache some
 session information in a temp table. What's more, he also wants to specify
 some rules which reference the session information. Without this feature,
 the rules will be removed at the end of every session since they depend on
 a temporary object. Global temp tables will allow the developer to define
 the temp table and the rules once.

 (2)
 The second case is mentioned by Tom Lane back in 2010 in a thread about
 global temp tables.
 (http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us)
 The context that I've seen it come up in is that people don't want to
 clutter their functions with
  create-it-if-it-doesn't-exist logic, which you have to have given the
 current behavior of temp tables.

   2.a - using on demand created temp tables - most simple solution, but
   doesn't help with catalogue bloating

 I've read the thread, and people disapprove of this approach because of
 the potential catalog bloat. However, I'd like to champion it. Indeed,
 this approach may have a bloat issue. But users who need global temp
 tables now have to create a new temp table in every session, which means
 they already have the bloat problem and presumably have some policy to
 deal with it. In other words, implementing global temp tables with this
 approach gives users the same performance, plus the convenience the
 feature brings.

 The root problem here is whether having the unoptimized feature is better
 than having no feature at all. Actually, there was a very similar discussion
 back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom
 Lane's arguments here.

 Kevin Grittner's argument:

 http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov
 ... If you're saying we can implement the standard's global temporary
 tables in a way that performs better than current temporary tables, that's
 cool.  That would be a nice bonus in addition to the application
 programmer convenience and having another tick-mark on the standards
 compliance charts.  Do you think that's feasible?  If not, the feature
 would be useful to some with the same performance that temporary tables
 currently provide.

 Tom Lane's arguments:

 http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us
 I'm all for eliminating catalog overheads, if we can find a way to do
 that.  I don't think that you get to veto implementation of the feature
 until we can find a way to optimize it better.  The question is not about
 whether having the optimization would be better than not having it --- it's
 about whether having the unoptimized feature is better than having no
 feature at all (which means people have to implement the same behavior by
 hand, and they'll *still* not get the optimization).

 There have been several threads here discussing global temp tables since
 2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
 metadata of the session copy in the catalog. However, it seems that none of
 them has been implemented, or even has a feasible design. So why don't we
 implement it in an unoptimized way first?


 I am not sure this would not be wasted work.

 Right now I think the best implementation of global temp tables is
 enhancing unlogged tables to have local content. All local data can be
 kept in session memory; it is usually less than 2KB including statistics,
 and you don't need to store it in the catalog. Whenever anybody works with
 a table, the related data are copied into the system cache, and an
 implementation of global temp tables can be injected there.

 regards

 Pavel Stehule



   Is there still interest in this feature?
 I'm very interested in this feature. I'm thinking about one
 implementation which is similar to Pavel's 2009 proposal (
 http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com).
 Here are the major ideas of my design:

 (1)
 Create the cross-session persistent schema as a regular table, and create
 session-private temp tables when a session first accesses it.

 (2)
 For DML queries, the global temp table is overridden by its session copy
 after the relation is opened by an OID or a RangeVar.
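 For reference, the user-visible surface this design could eventually expose
 might follow the SQL-standard syntax (hypothetical behavior sketch with
 invented names; PostgreSQL currently treats GLOBAL as a deprecated noise
 word, so this does not yet mean what the standard intends):

```sql
-- The definition would persist across sessions...
CREATE GLOBAL TEMPORARY TABLE session_state (
    user_id int,
    token   text
) ON COMMIT PRESERVE ROWS;

-- ...while each session that touches the table would get its own
-- initially empty, session-private copy of the contents, so this
-- INSERT would be visible only in the current session.
INSERT INTO session_state VALUES (1, 'abc');
```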

Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-16 Thread Zhaomo Yang
Stephen,

> I agree that if we force a single visibility policy for all commands
> then we wouldn't need the USING clauses for UPDATE and DELETE, but we
> would certainly need *some* policy for DELETE to prevent users from
> being able to delete records that they aren't supposed to be allowed to.
> Therefore, we'd just be replacing the USING policy with a 'WITH CHECK'
> policy, no?

If we force a single visibility policy (the SELECT policy), then we will
need a command-specific policy for each of UPDATE/DELETE/INSERT. A
command-specific policy may be a writing policy (as for INSERT), a
reading policy (as for DELETE), or a hybrid policy (as for UPDATE).

For DELETE, we can either combine the visibility policy (the SELECT policy)
with the DELETE policy using AND and then scan the table, or attach the
DELETE policy to the WHERE clause after the visibility policy has been
enforced. I don't see why we need to replace the USING policy with a
WITH CHECK policy.
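In today's syntax, the first option might be sketched like this (table and
predicates are invented; the AND-combination across the two policies is the
behavior under discussion, not necessarily what the current implementation
does):

```sql
-- Visibility policy: which rows the user can see at all.
CREATE POLICY p_select ON accounts
    FOR SELECT TO app_user
    USING (owner = current_user);

-- DELETE-specific policy, to be ANDed with the visibility policy,
-- so a DELETE would only remove rows satisfying both predicates:
--   owner = current_user AND NOT frozen
CREATE POLICY p_delete ON accounts
    FOR DELETE TO app_user
    USING (NOT frozen);
```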

BTW, what is the fundamental difference between a USING predicate and a
WITH CHECK predicate? Is it the phase in which they are applied (read or
write)? Or is it how they handle violations (nothing happens vs. error
out)?

> Removing the existing ability to control the visibility on a
> per-command basis is pretty clearly a reduction in the overall
> flexibility of the system without a clear gain to me.

I think there is a clear gain: security.

One interesting issue related to this discussion is how violations are
handled. Currently, reading violations fail silently (the nothing-happens
result) while writing violations cause errors (the throw-error result).

In the paper "Extending Query Rewriting Techniques for Fine-Grained
Access Control" [1], Rizvi et al. added row-level access control to DBMSes
using an interesting syntax: GRANT-WHERE. They added a WHERE predicate to
the SQL GRANT statement to achieve row-level access control. Besides the
interesting syntax, they brought up two possible models for handling
violations. One is the "nothing-happens" model (they call it the Truman's
world model) and the other is the "error-out" model (they call it the
Non-Truman model). The authors discussed the pros and cons of both: the
"nothing-happens" model is more secure since it leaks less information,
but a user may be surprised by the results; the "error-out" model leaks
information but may be more convenient when a user is debugging his
queries. I'm curious about our community's take on this issue.
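A GRANT-WHERE statement in the style of Rizvi et al. might look like the
following (hypothetical syntax, not valid PostgreSQL; the table, column,
function, and role names are invented for illustration):

```sql
-- Row-level access control attached directly to GRANT:
GRANT SELECT ON employees
    WHERE dept_id = user_dept(current_user)   -- row-level predicate
    TO clerk;

-- Under the Truman model, rows failing the predicate silently vanish
-- from clerk's queries; under the Non-Truman model, a query that would
-- touch such rows is rejected with an error instead.
```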


Thanks,
Zhaomo

[1] http://avid.cs.umass.edu/courses/645/s2006/645-paper5.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-11 Thread Zhaomo Yang
> I really don't like the approach you're suggesting above where an 'OR'
> inside of such a clause could mean that users can arbitrarily change any
> existing row without any further check on that row and I have a hard time
> seeing the use-case which justifies the additional complexity and user
> confusion.


I admit that I gave some bad examples in the previous email, and it is fair
to say that this (being able to have something like NEW.value > 10 OR
OLD.id = 1) is not an advantage of what I proposed until I can come up
with real-world examples.

> So there would also be a SELECT policy anyway, which is just like the
> existing UPDATE USING policy is today and what you're really asking for
> is the ability to have the WITH CHECK policy reference both the OLD and
> NEW records.

Yes. Then we won't need any USING clauses for UPDATE/DELETE. For
UPDATE/DELETE, we only need one predicate, which can reference both OLD
and NEW.

> I might be able to get behind supporting that, but I'm not
> terribly excited about it and you've not provided any real use-cases for
> it that I've seen


I think that there are two major advantages:

1)
As many folks have pointed out in this and other threads, this makes
information leakage less likely. Currently, a permissive USING clause for
UPDATE/DELETE can give an attacker a chance to read rows he is not allowed
to SELECT. Even without leaky functions, an attacker can easily figure out
the rows by doing a binary search with tricks like division by zero.

2)
This proposal allows a user to reference both the OLD and NEW records in
the same clause. For example,
NEW.id == OLD.id , or NEW.value <= OLD.value + 10. I think this should be
useful for users since they may often
need to check the new value against the old one.
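A sketch of the syntax this might take (hypothetical; CREATE POLICY does
not currently accept OLD/NEW references, and the table, role, and
predicates are invented):

```sql
-- Hypothetical: one UPDATE predicate over both row versions,
-- replacing today's separate USING / WITH CHECK clauses.
CREATE POLICY update_p ON accounts
    FOR UPDATE TO app_user
    USING (NEW.id = OLD.id                        -- key must not change
           AND NEW.balance <= OLD.balance + 10);  -- bounded increase
```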


> it still doesn't really change anything regarding
> RETURNING any differently than the earlier suggestions did about having
> the SELECT policy applied to all commands.


No, it doesn't. I proposed it here because there is some related
discussion (applying the SELECT policy to other commands).

Thanks,
Zhaomo

On Tue, Aug 25, 2015 at 8:17 AM, Stephen Frost <sfr...@snowman.net> wrote:

> Zhaomo,
>
> * Zhaomo Yang (zmp...@gmail.com) wrote:
> > > If no NEW or OLD is used, what happens?  Or would you have
> > > to always specify OLD/NEW for UPDATE, and then what about for the other
> > > policies, and the FOR ALL policies?
> >
> > I should be clearer with references to OLD/NEW. SELECT predicates
> > cannot reference any of them. INSERT predicates cannot refer to OLD,
> > and DELETE predicates cannot refer to NEW. Basically, for
> > INSERT/UPDATE/DELETE, we specify predicates the same way as we do for
> > triggers' WHEN condition.
> >
> > As for FOR ALL, I think we will abandon it if we apply the SELECT
> > policy to other commands, since the SELECT predicate will be the new
> > universally applicable read policy, which makes the FOR ALL USING
> > clause much less useful. Of course users may need to specify separate
> > predicates for different commands, but I think it is fine. How often
> > do users want the same predicate for all the commands?
>
> I can certainly see use-cases where you'd want to apply the same policy
> to all new records, regardless of how they're being added, and further,
> the use-case where you want the same policy for records which are
> visible and those which are added.  In fact, I'd expect that to be one
> of the most common use-cases as it maps directly to a set of rows which
> are owned by one user, where that user can see/modify/delete their own
> records but not impact other users.
>
> So, I don't think it would be odd at all for users to want the same
> predicate for all of the commands.
>
> > > This could be accomplished with "USING (bar > 1)" and "WITH CHECK
> > > (foo > 1)", no?
> > > Your sentence above that "USING and WITH CHECK are combined by AND"
> > > isn't correct either- they're independent and are therefore really
> > > OR'd. If they were AND'd then the new record would have to pass both
> > > USING and WITH CHECK policies.
> >
> > No, it is impossible with the current implementation.
> >
> > CREATE TABLE test (
> >  id int,
> >  v1 int,
> >  v2 int
> > );
> >
> > Suppose that the user wants an update policy which is
> > OLD.v1 > 10 OR NEW.v2 < 10.
> > As you suggested, we use the following policy:
> >
> > CREATE POLICY update_p ON test
> > FOR UPDATE TO test_user
> > USING (v1 > 10)
> > WITH CHECK (v2 < 10);
> >
> > (1) Assume there is only one row in the table
> > id |  v1 | v2 |
> > 1

Re: [HACKERS] CREATE POLICY and RETURNING

2015-10-05 Thread Zhaomo Yang
Stephen,

I just tried your patch for applying SELECT policies to DELETE/UPDATE a
little bit. It is consistent with the GRANT system, so it looks really
good. I'll test it more thoroughly later.

Also, I guess we don't need to worry about the syntax of the "restrictive
policies" you mentioned upthread, since SELECT policies are essentially
restrictive now. Since that work has already been done, I'm wondering if
I can take on the task of allowing policies to reference both the 'old'
and 'new' versions of the row. I understand that this feature won't be
considered for 9.5, but I'd like to implement it and hopefully get it
incorporated into 9.6.

Thanks,
Zhaomo


On Wed, Sep 23, 2015 at 11:54 AM, Stephen Frost <sfr...@snowman.net> wrote:
> * Zhaomo Yang (zmp...@gmail.com) wrote:
>> > Just a side-note, but your mail client doesn't seem to get the quoting
>> > quite right sometimes, which can be confusing.  Not sure if there's
>> > anything you can do about it but wanted to let you know in case there
>> > is.
>>
>> Sorry about this. From now on I'll use the plain text mode for msgs I
>> send to the mailing list.
>> Please let me know if this happens also in this email.
>
> Looks like this one has all of the quoting correct- thanks!
>
>> > Regarding this, specifically, we'd need to first decide on what the
>> > syntax/grammar should be.
>>
>> I'll think about it. Also, thanks for the pointers.
>
> Sure, no problem.
>
>> > Right, and we addressed the concerns with RETURNING.  Regarding the
>> > non-RETURNING case, the same concerns about blind updates and deletes
>> > already exist with the GRANT permission system; it's not anything new.
>>
>> I think they are different. In the current GRANT permission system,
>> one can do blind updates, but he cannot refer to any existing values
>> in either the expressions or the condition if he doesn't have SELECT
>> privilege on the table (or the columns); thus tricks like
>> divide-by-zero cannot be used and a malicious user cannot get
>> information out of blind updates.
>
> Ok, I see what you're getting at with that and I believe it'll be a
> pretty straight-forward change, thanks to Dean's recent rework.  I'll
> take a look at making that happens.
>
> Thanks!
>
> Stephen




Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-23 Thread Zhaomo Yang
Stephen,

> It'd be great if others who are interested can help define the grammar
> changes necessary and perhaps even help with the code aspect of it.

I'd like to help on both. Can you elaborate a little bit more, especially
on the code aspect?

> I don't buy that argument.

It is agreed that blind updates and deletes with a RETURNING clause are
dangerous. It is quite similar here. Instead of using
   BEGIN
   UPDATE-or-DELETE-with-RETURNING
   ROLLBACK
as a substitute for SELECT, a malicious user can do a binary search with
some trick like divide-by-zero to figure out which rows he is not allowed
to access. Of course, this is not as serious as RETURNING, but it is still
quite convenient for attackers.

Thanks,
Zhaomo


Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-23 Thread Zhaomo Yang
Stephen,

> Just a side-note, but your mail client doesn't seem to get the quoting
> quite right sometimes, which can be confusing.  Not sure if there's
> anything you can do about it but wanted to let you know in case there
> is.

Sorry about this. From now on I'll use the plain text mode for msgs I
send to the mailing list.
Please let me know if this happens also in this email.

> Regarding this, specifically, we'd need to first decide on what the
> syntax/grammar should be.

I'll think about it. Also, thanks for the pointers.

> Right, and we addressed the concerns with RETURNING.  Regarding the
> non-RETURNING case, the same concerns about blind updates and deletes
> already exist with the GRANT permission system; it's not anything new.

I think they are different. In the current GRANT permission system, one
can do blind updates, but he cannot refer to any existing values in
either the expressions or the condition if he doesn't have SELECT
privilege on the table (or the columns); thus tricks like divide-by-zero
cannot be used and a malicious user cannot get information out of blind
updates.

Thanks,
Zhaomo

