Re: [HACKERS] PG10 partitioning - odd behavior/possible bug

2017-11-28 Thread Joe Conway
On 11/28/2017 04:40 PM, Robert Haas wrote:
> On Sun, Sep 3, 2017 at 5:28 PM, Joe Conway <m...@joeconway.com> wrote:
>> I was playing around with partitioning and found an oddity that is best
>> described with the following reasonably minimal test case:
> 
> I can reproduce this without partitioning, just by creating two
> independent tables with the same schema and tweaking a few things from
> your test case to refer to the correct table rather than relying on
> tuple routing:

[snip]

> There's no error here because I didn't bother putting constraints on
> the table, but that tsr = empty bit is still happening.  I think the
> problem is that you're updating the same row twice in the same
> transaction, and now() returns the same value both times because
> that's how now() works, so the second time the range ends up with the
> lower and upper endpoints that are equal.

Yeah, Tom already pointed that out a while back:

https://www.postgresql.org/message-id/20986.1504478066%40sss.pgh.pa.us

FWIW, I have a working version of this now (using clock_timestamp()) here
(see last part of the appendix):

https://www.joeconway.com/presentations/SecurePostgreSQL-PGOpen2017.pdf
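For anyone following along, the underlying behavior is easy to see directly; a minimal sketch (assumes nothing beyond a stock session):

```sql
-- now() is frozen at transaction start, so two updates in the same
-- transaction compute identical timestamps; a range built from equal
-- bounds normalizes to 'empty'
BEGIN;
SELECT now() = clock_timestamp() AS same;        -- false: clock_timestamp() keeps moving
SELECT isempty(tstzrange(now(), now())) AS tsr;  -- true: equal bounds => empty range
COMMIT;
```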

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: has_sequence_privilege() never got the memo

2017-11-26 Thread Joe Conway
On 11/23/2017 07:16 AM, Peter Eisentraut wrote:
> On 11/22/17 22:58, Tom Lane wrote:
>> Joe Conway <m...@joeconway.com> writes:
>>> I just noticed that has_sequence_privilege() never got the memo about
>>> "WITH GRANT OPTION". Any objections to the attached going back to all
>>> supported versions?
>> 
>> That looks odd.  Patch certainly makes this case consistent with the
>> rest of acl.c, but maybe there's some deeper reason?  Peter?
> 
> No I think it was just forgotten.

Pushed.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





assert in nested SQL procedure call in current HEAD

2018-05-25 Thread Joe Conway
My colleague Yogesh Sharma discovered an assert in nested SQL procedure
calls after ROLLBACK is used. Minimal test case and backtrace below. I
have not yet tried to figure out exactly what is going on beyond seeing
that it occurs in pg_plan_query() where the comment says "Planner must
have a snapshot in case it calls user-defined functions"...

Joe

--
DROP TABLE IF EXISTS tst_tbl;
CREATE TABLE tst_tbl (a int);
CREATE OR REPLACE PROCEDURE insert_data() AS $$
 INSERT INTO tst_tbl VALUES (42);
$$ LANGUAGE SQL;

CREATE OR REPLACE PROCEDURE transaction_test() AS $$
 BEGIN
  ROLLBACK;
  CALL insert_data();
 END
$$ LANGUAGE PLPGSQL;
CALL transaction_test();

#0  0x7fe343cf4428 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x7fe343cf602a in __GI_abort () at abort.c:89
#2  0x00a71951 in ExceptionalCondition (conditionName=0xc9408b
"!(ActiveSnapshotSet())", errorType=0xc93d23 "FailedAssertion",
fileName=0xc93e22 "postgres.c", lineNumber=801) at assert.c:54
#3  0x008ec20f in pg_plan_query (querytree=0x2df1f00,
cursorOptions=256, boundParams=0x0) at postgres.c:801
#4  0x0070772e in init_execution_state
(queryTree_list=0x2df2c28, fcache=0x2df12d8, lazyEvalOK=true) at
functions.c:507
#5  0x00707e50 in init_sql_fcache (finfo=0x7fffcbb1fd10,
collation=0, lazyEvalOK=true) at functions.c:770
#6  0x007085d9 in fmgr_sql (fcinfo=0x7fffcbb1fd40) at
functions.c:1064
#7  0x00675c12 in ExecuteCallStmt (stmt=0x2d54b18, params=0x0,
atomic=false, dest=0xfc7380 ) at functioncmds.c:2294
#8  0x008f5067 in standard_ProcessUtility (pstmt=0x2d54a00,
queryString=0x2d4f4e8 "CALL insert_data()",
context=PROCESS_UTILITY_QUERY_NONATOMIC,
params=0x0, queryEnv=0x0, dest=0xfc7380 ,
completionTag=0x7fffcbb20390 "") at utility.c:649
#9  0x008f4852 in ProcessUtility (pstmt=0x2d54a00,
queryString=0x2d4f4e8 "CALL insert_data()",
context=PROCESS_UTILITY_QUERY_NONATOMIC, params=0x0,
queryEnv=0x0, dest=0xfc7380 ,
completionTag=0x7fffcbb20390 "") at utility.c:360
#10 0x0074523d in _SPI_execute_plan (plan=0x2d54558,
paramLI=0x0, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false,
fire_triggers=true, tcount=0)
at spi.c:2225
#11 0x00741d9d in SPI_execute_plan_with_paramlist
(plan=0x2d54558, params=0x0, read_only=false, tcount=0) at spi.c:481
#12 0x7fe33491bcfd in exec_stmt_call (estate=0x7fffcbb20870,
stmt=0x2df6170) at pl_exec.c:2103
#13 0x7fe33491b7bd in exec_stmt (estate=0x7fffcbb20870,
stmt=0x2df6170) at pl_exec.c:1920
#14 0x7fe33491b66e in exec_stmts (estate=0x7fffcbb20870,
stmts=0x2df60f0) at pl_exec.c:1875
#15 0x7fe33491b508 in exec_stmt_block (estate=0x7fffcbb20870,
block=0x2df5c60) at pl_exec.c:1816
#16 0x7fe334918e50 in plpgsql_exec_function (func=0x2d5ec30,
fcinfo=0x7fffcbb20ba0, simple_eval_estate=0x0, atomic=false) at
pl_exec.c:586
#17 0x7fe334913119 in plpgsql_call_handler (fcinfo=0x7fffcbb20ba0)
at pl_handler.c:263
#18 0x00675c12 in ExecuteCallStmt (stmt=0x2d2cff8, params=0x0,
atomic=false, dest=0x2d2d420) at functioncmds.c:2294
#19 0x008f5067 in standard_ProcessUtility (pstmt=0x2d2d0c8,
queryString=0x2d2c4e8 "CALL transaction_test();",
context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x2d2d420,
completionTag=0x7fffcbb213a0 "") at utility.c:649
#20 0x008f4852 in ProcessUtility (pstmt=0x2d2d0c8,
queryString=0x2d2c4e8 "CALL transaction_test();",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x2d2d420, completionTag=0x7fffcbb213a0 "") at
utility.c:360


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Redesigning the executor (async, JIT, memory efficiency)

2018-05-25 Thread Joe Conway
On 05/24/2018 11:26 PM, Heikki Linnakangas wrote:
> Cool stuff!
> 
> On 25/05/18 06:35, Andres Freund wrote:
>> For example, this converts TPCH's Q01:

+1
Wicked cool!


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-18 Thread Joe Conway
On 06/18/2018 09:49 AM, Robert Haas wrote:
> On Wed, Jun 13, 2018 at 9:20 AM, Joe Conway  wrote:
>>> Also, if I understand correctly, at unconference session there also
>>> were two suggestions about the design other than the suggestion by
>>> Alexander: implementing TDE at column level using POLICY, and
>>> implementing TDE at table-space level. The former was suggested by Joe
>>> but I'm not sure the detail of that suggestion. I'd love to hear the
>>> detail of that suggestion.
>>
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
> 
> It seems to me that column-level encryption is a lot less secure than
> block-level encryption.  I am supposing here that the attack vector is
> stealing the disk.  If all you've got is a bunch of 8192-byte blocks,
> it's unlikely you can infer much about the contents.  You know the
> size of the relations and that's probably about it. 

Not necessarily. Our pages probably have enough predictable bytes to aid
cryptanalysis, compared to user data in a column which might not be very
predictable.


> If you've got individual values being encrypted, then there's more
> latitude to figure stuff out.  You can infer something about the
> length of particular values.  Perhaps you can find cases where the
> same encrypted value appears multiple times.

This completely depends on the encryption scheme you are using, and the
column level POLICY leaves that entirely up to you.

But in any case most encryption schemes use a random nonce (salt) to
ensure two identical strings do not encrypt to the same result. And
often the encrypted length is padded, so while you might be able to
infer short versus long, you would not usually be able to infer the
exact plaintext length.
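As a concrete illustration of the nonce point, pgcrypto's PGP functions generate a fresh random session key per call, so identical plaintexts do not encrypt to identical ciphertexts (sketch; assumes the pgcrypto extension is available):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- two encryptions of the same plaintext with the same passphrase differ
SELECT pgp_sym_encrypt('secret', 'passphrase')
     = pgp_sym_encrypt('secret', 'passphrase') AS identical;  -- false

-- yet both decrypt back to the original
SELECT pgp_sym_decrypt(pgp_sym_encrypt('secret', 'passphrase'),
                       'passphrase');
```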


> If there's a btree index, you know the ordering of the values under
> whatever ordering semantics apply to that index.  It's unclear to me
> how useful such information would be in practice or to what extent it
> might allow you to attack the underlying cryptography, but it seems
> like there might be cases where the information leakage is
> significant.  For example, suppose you're trying to determine which
> partially-encrypted record is that of Aaron Aardvark... or this guy: 
> https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.

Again, this only applies if your POLICY uses this type of encryption,
i.e. homomorphic encryption. If you use strong encryption you will not
be indexing those columns at all, which is pretty commonly the case.

> Recently, it was suggested to me that a use case for column-level
> encryption might be to prevent casual DBA snooping.  So, you'd want
> the data to appear in pg_dump output encrypted, because the DBA might
> otherwise look at it, but you wouldn't really be concerned about the
> threat of the DBA loading a hostile C module that would steal user
> keys and use them to decrypt all the data, because they don't care
> that much and would be fired if they were caught doing it.

Again completely dependent on the extension you use to do the encryption
for the input policy. The keys don't need to be stored with the data,
and the decryption can be transparent only for certain users or if
certain session variables exist which the DBA does not have access to.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-18 Thread Joe Conway
On 06/18/2018 10:26 AM, Robert Haas wrote:
> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway  wrote:
>> Not necessarily. Our pages probably have enough predictable bytes to aid
>> cryptanalysis, compared to user data in a column which might not be very
>> predictable.
> 
> Really?  I would guess that the amount of entropy in a page is WAY
> higher than in an individual column value.

It isn't about the entropy of the page overall, it is about the
predictability of specific bytes at specific locations on the pages. At
least as far as I understand it.

>> But in any case most encryption schemes use a random nonce (salt) to
>> ensure two identical strings do not encrypt to the same result. And
>> often the encrypted length is padded, so while you might be able to
>> infer short versus long, you would not usually be able to infer the
>> exact plaintext length.
> 
> Sure, that could be done, although it means that equality comparisons
> must be done unencrypted.

Sure. Typically equality comparisons are done on other unencrypted
attributes. Or if you need to do equality on encrypted columns, you can
store non-reversible cryptographic hashes in a separate column.
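A sketch of that hash-column approach, assuming pgcrypto and a purely hypothetical accounts table:

```sql
-- keep the ciphertext opaque; do equality lookups on a keyed digest
ALTER TABLE accounts ADD COLUMN ssn_hmac bytea;
UPDATE accounts SET ssn_hmac = hmac(ssn_plaintext, 'hmac-key', 'sha256');
CREATE INDEX ON accounts (ssn_hmac);

-- equality comparison without ever decrypting the stored value
SELECT account_id
  FROM accounts
 WHERE ssn_hmac = hmac('123-45-6789', 'hmac-key', 'sha256');
```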

>> Again completely dependent on the extension you use to do the encryption
>> for the input policy. The keys don't need to be stored with the data,
>> and the decryption can be transparent only for certain users or if
>> certain session variables exist which the DBA does not have access to.
> 
> Not arguing with that.  And to be clear, I'm not trying to attack your
> proposal.  I'm just trying to have a discussion about advantages and
> disadvantages of different approaches.

Understood. Ultimately we might want both page-level encryption and
column level POLICY, as they are each useful for different use-cases.
Personally I believe the former is more generally useful than the
latter, but YMMV.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-18 Thread Joe Conway
On 06/18/2018 10:52 AM, Tom Lane wrote:
> Robert Haas  writes:
>> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway  wrote:
>>> Not necessarily. Our pages probably have enough predictable bytes to aid
>>> cryptanalysis, compared to user data in a column which might not be very
>>> predictable.
> 
>> Really?  I would guess that the amount of entropy in a page is WAY
>> higher than in an individual column value.
> 
> Depending on the specifics of the encryption scheme, having some amount
> of known (or guessable) plaintext may allow breaking the cipher, even
> if much of the plaintext is not known.  This is cryptology 101, really.

Exactly.

> At the same time, having to have a bunch of independently-decipherable
> short field values is not real secure either, especially if they're known
> to all be encrypted with the same key.  But what you know or can guess
> about the plaintext in such cases would be target-specific, rather than
> an attack that could be built once and used against any PG database.

Again, it is dependent on the specific solution for encryption. In some
cases you might do something like generate a single use random key,
encrypt the payload with that, encrypt the single use key with the
"global" key, append the two results and store.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-13 Thread Joe Conway
On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
> As per discussion at PGCon unconference, I think that firstly we need
> to discuss what threats we want to defend database data against.

Exactly. While certainly there is demand for encryption for the sake of
"checking a box", different designs will defend against different
threats, and we should be clear on which ones we are trying to protect
against for any particular design.

> Also, if I understand correctly, at unconference session there also
> were two suggestions about the design other than the suggestion by
> Alexander: implementing TDE at column level using POLICY, and
> implementing TDE at table-space level. The former was suggested by Joe
> but I'm not sure the detail of that suggestion. I'd love to hear the
> detail of that suggestion.

The idea has not been extensively fleshed out yet, but the thought was
that we create column level POLICY, which would transparently apply some
kind of transform on input and/or output. The transforms would
presumably be expressions, which in turn could use functions (extension
or builtin) to do their work. That would allow encryption/decryption,
DLP (data loss prevention) schemes (masking, redacting), etc. to be
applied based on the policies.
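To make the shape of the idea concrete, a purely hypothetical syntax sketch -- none of this exists today, and the keywords are invented for illustration only:

```sql
-- hypothetical: transforms applied transparently on write and read
CREATE POLICY mask_cc ON customers
    FOR COLUMN cc_number
    ON INPUT  USING (encrypt_cc(new_value))                      -- on write
    ON OUTPUT USING (decrypt_or_mask(cc_number, current_user));  -- on read
```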

This, in and of itself, would not address key management. There is
probably a separate need for some kind of built in key management --
perhaps a flexible way to integrate with external systems such as Vault
for example, or maybe something self contained, or perhaps both. Or
maybe key management is really tied into the separately discussed effort
to create SQL VARIABLEs somehow.

In any case certainly a lot of room for discussion.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: missing toast table for pg_policy

2018-06-15 Thread Joe Conway
On 06/15/2018 02:40 PM, John Naylor wrote:
> On 2/19/18, Joe Conway  wrote:
>> The attached does exactly this. Gives all system tables toast tables
>> except pg_class, pg_attribute, and pg_index, and includes cat version
>> bump and regression test in misc_sanity.
>>
>> Any further discussion, comments, complaints?
> 
> Hi Joe,
> There's been a little bit-rot with duplicate OIDs and the regression
> test. The first attached patch fixes that (applies on top of yours).


Not surprising -- thanks for the update.


> It occurred to me that we could go further and create most toast
> tables automatically by taking advantage of the fact that the toast
> creation function is a no-op if there are no varlena attributes. The
> second patch (applies on top of the first) demonstrates a setup where
> only shared and bootstrap catalogs need to have toast declarations
> specified manually with fixed OIDs. It's possible this way is more
> fragile, though.

Hmmm, I'll have a look.


Thanks!

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: New committers announced at PGCon 2018

2018-06-01 Thread Joe Conway
On 06/01/2018 05:14 PM, Andres Freund wrote:
> On 2018-06-01 17:05:11 -0400, Tom Lane wrote:
>> The core team is pleased to announce the appointment of seven
>> new Postgres committers:
>>
>> Etsuro Fujita
>> Peter Geoghegan
>> Amit Kapila
>> Alexander Korotkov
>> Thomas Munro
>> Michael Paquier
>> Tomas Vondra
>>
>> Congratulations to all!
> 
> +1!

+7!

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: commitfest 2018-07

2018-06-05 Thread Joe Conway
On 06/05/2018 10:43 AM, Andres Freund wrote:
> On 2018-06-05 10:20:55 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> I'd rather create a new 2018-07, and just manually move old patches to
>>> it. Otherwise we'll not really focus on the glut of old things, but
>>> everyone just restarts working on their own new thing.
>>
>> I thought the idea was to clear out the underbrush of small, ready-to-go
>> patches.  How old they are doesn't enter into that.
>>
>> There's a separate issue about what to do to prioritize old patches so
>> they don't linger indefinitely.  We had a discussion about that at the
>> dev meeting, but I don't think any specific mechanism was agreed to?
> 
> I think we've not fully agreed on that.  I'd argue we should manually
> filter things into the next CF. And both small patches and older things
> should qualify.

Would it work to rename 2018-09 to 2018-07 and then make a pass through
and move some of the entries to the next commitfest right away? That
seems easier than the alternative unless you think less than half of
what is there should be in 2018-07.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Add CONTRIBUTING.md

2018-05-29 Thread Joe Conway
On 05/29/2018 11:38 AM, Magnus Hagander wrote:
> On Tue, May 29, 2018 at 5:36 PM, Andres Freund wrote:
> 
> Hi,
> 
> A lot of people contribute in communities via github these days. We
> should add a CONTRIBUTING.md that explains how to do so, given that we
> don't use github. That's shown automatically when doing a pull requests
> etc.
> 
> I think it's also a good idea to have it in the source tree independent
> of github, it's very hard to figure out where to start from a git
> clone / tarball.
> 
> 
> +1. This would be useful for us internally as well, and using the github
> format makes sense because a lot of users are used to it, so that's
> where they're going to go look first.

+1


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-20 Thread Joe Conway
On 06/20/2018 05:05 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 08:29:32AM -0400, Joe Conway wrote:
>>>> Or
>>>> maybe key management is really tied into the separately discussed effort
>>>> to create SQL VARIABLEs somehow.
>>>
>>> Could you elaborate on how key management is tied into SQL VARIABLEs?
>>
>> Well, the key management probably is not, but the SQL VARIABLE might be
>> where the key is stored for use.
> 
> I disagree.  I would need to understand how an extension actually helps
> here, because it certainly limits flexibility compared to a shell
> command.

That flexibility the shell command gives you is also a huge hole from a
security standpoint.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-20 Thread Joe Conway
On 06/20/2018 05:03 PM, Bruce Momjian wrote:
> On Wed, Jun 13, 2018 at 09:20:58AM -0400, Joe Conway wrote:
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
> 
> This is currently possible with stock Postgres as you can see from this
> and the following slides:
> 
>   http://momjian.us/main/writings/crypto_hw_use.pdf#page=77

That is definitely not the same thing. A column level POLICY would apply
an input and output transform expression over the column transparently
to the database user. That transform might produce, for example, a
different output depending on the logged in user (certain user sees
entire field whereas other users see redacted or masked form, or certain
users get decrypted result while others don't).

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-20 Thread Joe Conway
On 06/20/2018 05:09 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 09:49:20AM -0400, Robert Haas wrote:
>> know the ordering of the values under whatever ordering semantics
>> apply to that index.  It's unclear to me how useful such information
> 
> I don't think an ordered index is possible, only indexing of encrypted
> hashes, i.e. see this and the next slide:

It is possible with homomorphic encryption -- whether we want to support
that in core is another matter.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2018-06-20 Thread Joe Conway
On 06/20/2018 05:12 PM, Bruce Momjian wrote:
> On Mon, Jun 18, 2018 at 11:06:20AM -0400, Joe Conway wrote:
>>> At the same time, having to have a bunch of independently-decipherable
>>> short field values is not real secure either, especially if they're known
>>> to all be encrypted with the same key.  But what you know or can guess
>>> about the plaintext in such cases would be target-specific, rather than
>>> an attack that could be built once and used against any PG database.
>>
>> Again, it is dependent on the specific solution for encryption. In some
>> cases you might do something like generate a single use random key,
>> encrypt the payload with that, encrypt the single use key with the
>> "global" key, append the two results and store.
> 
> Even if they are encrypted with the same key, they use different
> initialization vectors that are stored inside the encrypted payload, so
> you really can't identify much except the length, as Robert stated.

The more you encrypt with a single key, the more fuel you give to the
person trying to solve for the key with cryptanalysis.

By encrypting only essentially random data (the single use keys,
generated with cryptographically strong random number generator) with
the "master key", and then encrypting the actual payloads (which are
presumably more predictable than the strong random single use keys), you
minimize the probability of someone cracking your master key and you
also minimize the damage caused by someone cracking one of the single
use keys.
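A sketch of that envelope scheme using pgcrypto's raw encryption functions (the storage layout, key names, and cipher mode here are illustrative assumptions, not a spec):

```sql
-- generate a random single-use data key, encrypt the payload with it,
-- encrypt the data key with the master key, store both concatenated
WITH k AS (
    SELECT gen_random_bytes(32) AS data_key
)
SELECT encrypt(data_key, 'master-key'::bytea, 'aes')  -- wrapped key
    || encrypt('payload'::bytea, data_key, 'aes')     -- wrapped payload
       AS stored_value
  FROM k;
```

Decryption splits the stored value at a known offset, unwraps the data key with the master key, then decrypts the payload with the recovered key.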

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [HACKERS] Re: [HACKERS] generated columns

2017-12-30 Thread Joe Conway
On 12/27/2017 09:31 AM, Peter Eisentraut wrote:
> On 9/12/17 15:35, Jaime Casanova wrote:
>> On 10 September 2017 at 00:08, Jaime Casanova
>>  wrote:
>>>
>>> During my own tests, though, i found some problems:
> 
> Here is an updated patch that should address the problems you have found.

In the commit message it says:

  "The plan is to implement two kinds of generated columns:
   virtual (computed on read) and stored (computed on write).  This
   patch only implements the virtual kind, leaving stubs to implement
   the stored kind later."

and in the patch itself:

+
+ The generation expression can refer to other columns in the table, but
+ not other generated columns.  Any functions and operators used must be
+ immutable.  References to other tables are not allowed.
+

Question -- when the "stored" kind of generated column is implemented,
will the immutable restriction be relaxed? I would like, for example, be
able to have a stored generated column that executes now() whenever the
row is written/rewritten.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Re: [HACKERS] generated columns

2017-12-31 Thread Joe Conway
On 12/31/2017 09:38 AM, Peter Eisentraut wrote:
> On 12/30/17 16:04, Joe Conway wrote:
>> +
>> + The generation expression can refer to other columns in the table, but
>> + not other generated columns.  Any functions and operators used must be
>> + immutable.  References to other tables are not allowed.
>> +
>> 
>> Question -- when the "stored" kind of generated column is implemented,
>> will the immutable restriction be relaxed? I would like, for example, be
>> able to have a stored generated column that executes now() whenever the
>> row is written/rewritten.



> Maybe some of this could be relaxed at some point, but we would have to
> think it through carefully.  For now, a trigger would still be the best
> implementation for your use case, I think.

Sure, but generated column behavior in general can be implemented via
trigger.
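For the now()-on-write case specifically, a trigger sketch (table and column names are made up; EXECUTE PROCEDURE is used since these are pre-v11 releases):

```sql
-- populate row_written with the wall-clock time at every write
CREATE FUNCTION set_row_written() RETURNS trigger AS $$
BEGIN
    NEW.row_written := now();
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_row_written
    BEFORE INSERT OR UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE set_row_written();
```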

Anyway, I have seen requests for change data capture
(https://en.wikipedia.org/wiki/Change_data_capture) in Postgres which is
apparently available in our competition without requiring the use of
triggers. Perhaps that is yet a different feature, but I was hopeful
that this mechanism could be used to achieve it.

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: A space-efficient, user-friendly way to store categorical data

2018-02-12 Thread Joe Conway
On 02/11/2018 10:06 PM, Thomas Munro wrote:
> On Mon, Feb 12, 2018 at 12:24 PM, Andrew Dunstan
>  wrote:
>> On Mon, Feb 12, 2018 at 9:10 AM, Tom Lane  wrote:
>>> Andrew Kane  writes:
 A better option could be a new "dynamic enum" type, which would have
 similar storage requirements as an enum, but instead of labels being
 declared ahead of time, they would be added as data is inserted.
>>>
>>> You realize, of course, that it's possible to add labels to an enum type
>>> today.  (Removing them is another story.)
>>>
>>> You haven't explained exactly what you have in mind that is going to be
>>> able to duplicate the advantages of the current enum implementation
>>> without its disadvantages, so it's hard to evaluate this proposal.
>>>
>>
>>
>> This sounds rather like the idea I have been tossing around in my head
>> for a while, and in sporadic discussions with a few people, for a
>> dictionary object. The idea is to have an append-only list of labels
>> which would not obey transactional semantics, and would thus help us
>> avoid the pitfalls of enums - there wouldn't be any rollback of an
>> addition.  The use case would be for a jsonb representation which
>> would replace object keys with the oid value of the corresponding
>> dictionary entry rather like enums now. We could have a per-table
>> dictionary which in most typical json use cases would be very small,
>> and we know from some experimental data that the compression in space
>> used from such a change would often be substantial.
>>
>> This would have to be modifiable dynamically rather than requiring
>> explicit additions to the dictionary, to be of practical use for the
>> jsonb case, I believe.
>>
>> I hadn't thought about this as a sort of super enum that was usable
>> directly by users, but it makes sense.
>>
>> I have no idea how hard or even possible it would be to implement.
> 
> I have had thoughts over the years about something similar, but going
> the other way and hiding it from the end user.  If you could declare a
> column to have a special compressed property (independently of the
> type) then it could either automatically maintain a dictionary, or at
> least build a new dictionary for you when you next run some kind of
> COMPRESS operation.  There would be no user visible difference except
> footprint.  In ancient DB2 they had a column property along those
> lines called "VALUE COMPRESSION" (they also have a row-level version,
> and now they have much more advanced kinds of adaptive compression
> that I haven't kept up with).  In some ways it'd be a bit like toast
> with shared entries, but I haven't seriously looked into how such a
> thing might be implemented.

For what it is worth, there is a similar concept in R called "factors".
When categorical data is stored in a data.frame (R language equivalent
to relations) it is transparently and automatically converted. I believe
this is both for storage compression and to facilitate some of the
analytics. In R you can also explicitly specify to *not* convert strings
to factors as a performance optimization, because that conversion does
have a noticeable impact during ingestion and is not always needed.

I can also envision usefulness of this type of mechanism in other
security related scenarios.
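For reference, the existing enum machinery already covers part of what is asked for here (labels can be appended, as Tom notes above), and the gap is removal:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok');
ALTER TYPE mood ADD VALUE 'happy';   -- appending labels works today
-- there is no ALTER TYPE ... DROP VALUE; removing labels is the hard part
```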

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: missing toast table for pg_policy

2018-02-17 Thread Joe Conway
On 02/16/2018 05:24 PM, Tom Lane wrote:
> Joe Conway <m...@joeconway.com> writes:
>> On 02/16/2018 05:07 PM, Andres Freund wrote:
>>> If problematic for < master users I think you'll have to restart cluster
>>> with allow_system_table_mods, manually create/drop toasted column. IIRC
>>> that should add a toast table even after dropping.
> 
>> I wasn't sure if we would want to backpatch and put the manual procedure
>> steps into the release notes.
> 
> The example you give seems like pretty bad practice to me.  I don't think
> we should back-patch unless it's possible to trigger the problem with a
> more realistic policy expression.

Fair enough, but the origin of this was a real life-based complaint.

> (Also, one can always work around it by putting the complicated condition
> into a function, which would likely be a better idea anyway from a
> maintenance standpoint.)

Yes, exactly. I'm fine with not backpatching, just wanted to raise the
possibility. I will push later today to HEAD (with a catalog version bump).

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: SHA-2 functions

2018-02-19 Thread Joe Conway
On 02/19/2018 08:43 AM, Peter Eisentraut wrote:
> I also noticed while working on some SSL code that we have perfectly
> good SHA-2 functionality in the server already, but it has no test
> coverage outside the SCRAM tests.
> 
> So I suggest these patches that expose the new functions sha224(),
> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
> tests more robust, and it will allow them to be used in general purpose
> contexts over md5().

I didn't look closely at the patch itself, but +1 on the concept.
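For what it's worth, usage would presumably look like this (a sketch based on the function names above; the digest shown is the standard SHA-256 test vector for 'abc'):

```sql
-- Assuming the patch exposes sha256(bytea) returning bytea:
SELECT sha256('abc'::bytea);
-- \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```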

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: missing toast table for pg_policy

2018-02-16 Thread Joe Conway
On 02/16/2018 05:07 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-16 16:56:15 -0500, Joe Conway wrote:
>> Looking at the issue, the problem seems to be missing toast table for
>> pg_policy. Also attached is a one line patch. It isn't clear to me
>> whether this is a candidate for backpatching.
> 
> Don't think it is - it'd not take effect on already initdb'ed clusters.

Yep, knew that, but...

> If problematic for < master users I think you'll have to restart cluster
> with allow_system_table_mods, manually create/drop toasted column. IIRC
> that should add a toast table even after dropping.

I wasn't sure if we would want to backpatch and put the manual procedure
steps into the release notes.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: missing toast table for pg_policy

2018-02-18 Thread Joe Conway
On 02/18/2018 11:18 AM, Tom Lane wrote:
> Joe Conway <m...@joeconway.com> writes:
>> Is there really a compelling reason to not just create toast tables for
>> all system catalogs as in the attached?
> 
> What happens when you VACUUM FULL pg_class?  (The associated toast table
> would have to be nonempty for the test to prove much.)

I tried this:
create table foo (id int);
do $$declare i int; begin for i in 1..1000 loop execute 'create user u'
|| i; end loop; end;$$;
do $$declare i int; begin for i in 1..1000 loop execute 'grant all on
foo to u' || i; end loop; end;$$;
vacuum full pg_class;

Worked without issue FWIW.
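To address the nonempty-toast-table point, something like the following could confirm the test actually exercised toasted pg_class rows (a sketch; it assumes the patched catalogs in which pg_class has a toast table):

```sql
-- A nonzero size here means some pg_class rows really were toasted:
SELECT reltoastrelid::regclass AS toast_table,
       pg_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class
WHERE relname = 'pg_class';
```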

> I'm fairly suspicious of toasting anything that the toast mechanism itself
> depends on, actually, and that would include at least pg_attribute and
> pg_index as well as pg_class.  Maybe we could get away with it because
> there would never be any actual recursion only potential recursion ...
> but it seems scary.

Well that is the other approach we could pursue -- instead of justifying
which system catalogs need toast tables we could create an exclusion
list of which ones should not have toast tables, with the current
candidates being pg_class, pg_attribute, and pg_index.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: missing toast table for pg_policy

2018-02-18 Thread Joe Conway
On 02/17/2018 11:39 AM, Tom Lane wrote:
> Joe Conway <m...@joeconway.com> writes:
>> Yes, exactly. I'm fine with not backpatching, just wanted to raise the
>> possibility. I will push later today to HEAD (with a catalog version bump).
> 
> BTW, I was wondering if it'd be a good idea to try to forestall future
> oversights of this sort by adding a test query in, say, misc_sanity.sql.
> Something like
> 
> select relname, attname, atttypid::regtype
> from pg_class c join pg_attribute a on c.oid = attrelid
> where c.oid < 16384 and reltoastrelid=0 and relkind = 'r' and attstorage != 
> 'p'
> order by 1,2;
> 
> If you try that you'll see the list is quite long:



> I think in most of these cases we've consciously decided not to toast-ify,
> but maybe some of them need a second look.

Is there really a compelling reason to not just create toast tables for
all system catalogs as in the attached? Then we could just check for 0
rows in misc_sanity.sql.

For what it's worth:

HEAD

# du -h --max-depth=1 $PGDATA
[...]
22M /usr/local/pgsql-head/data/base
[...]
584K    /usr/local/pgsql-head/data/global
[...]
38M /usr/local/pgsql-head/data

time make check
real    0m16.295s
user    0m3.597s
sys     0m1.465s


with patch

# du -h --max-depth=1 $PGDATA
[...]
23M /usr/local/pgsql-head/data/base
[...]
632K    /usr/local/pgsql-head/data/global
[...]
39M /usr/local/pgsql-head/data

time make check
real    0m16.462s
user    0m3.521s
sys     0m1.531s

select relname, attname, atttypid::regtype
from pg_class c join pg_attribute a on c.oid = attrelid
where c.oid < 16384 and reltoastrelid=0 and relkind = 'r' and attstorage
!= 'p'
order by 1,2;
 relname | attname | atttypid
---------+---------+----------
(0 rows)


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 809749add9..813b3b87cc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -258,7 +258,19 @@ IsSharedRelation(Oid relationId)
 		relationId == PgDbRoleSettingToastTable ||
 		relationId == PgDbRoleSettingToastIndex ||
 		relationId == PgShseclabelToastTable ||
-		relationId == PgShseclabelToastIndex)
+		relationId == PgShseclabelToastIndex ||
+		relationId == PgAuthidToastTable ||
+		relationId == PgAuthidToastIndex ||
+		relationId == PgDatabaseToastTable ||
+		relationId == PgDatabaseToastIndex ||
+		relationId == PgPlTemplateToastTable ||
+		relationId == PgPlTemplateToastIndex ||
+		relationId == PgReplicationOriginToastTable ||
+		relationId == PgReplicationOriginToastIndex ||
+		relationId == PgSubscriptionToastTable ||
+		relationId == PgSubscriptionToastIndex ||
+		relationId == PgTablespaceToastTable ||
+		relationId == PgTablespaceToastIndex)
 		return true;
 	return false;
 }
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index f6387ae143..3ef3990ea3 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -46,25 +46,64 @@ extern void BootstrapToastTable(char *relName,
  */
 
 /* normal catalogs */
+DECLARE_TOAST(pg_aggregate, 4139, 4140);
 DECLARE_TOAST(pg_attrdef, 2830, 2831);
+DECLARE_TOAST(pg_attribute, 4141, 4142);
+DECLARE_TOAST(pg_class, 4143, 4144);
+DECLARE_TOAST(pg_collation, 4145, 4146);
 DECLARE_TOAST(pg_constraint, 2832, 2833);
+DECLARE_TOAST(pg_default_acl, 4147, 4148);
 DECLARE_TOAST(pg_description, 2834, 2835);
+DECLARE_TOAST(pg_event_trigger, 4149, 4150);
+DECLARE_TOAST(pg_extension, 4151, 4152);
+DECLARE_TOAST(pg_foreign_data_wrapper, 4153, 4154);
+DECLARE_TOAST(pg_foreign_server, 4155, 4156);
+DECLARE_TOAST(pg_foreign_table, 4157, 4158);
+DECLARE_TOAST(pg_index, 4159, 4160);
+DECLARE_TOAST(pg_init_privs, 4161, 4162);
+DECLARE_TOAST(pg_language, 4163, 4164);
+DECLARE_TOAST(pg_largeobject, 4165, 4166);
+DECLARE_TOAST(pg_largeobject_metadata, 4167, 4168);
+DECLARE_TOAST(pg_namespace, 4169, 4170);
+DECLARE_TOAST(pg_partitioned_table, 4171, 4172);
+DECLARE_TOAST(pg_policy, 4173, 4174);
 DECLARE_TOAST(pg_proc, 2836, 2837);
 DECLARE_TOAST(pg_rewrite, 2838, 2839);
 DECLARE_TOAST(pg_seclabel, 3598, 3599);
 DECLARE_TOAST(pg_statistic, 2840, 2841);
 DECLARE_TOAST(pg_statistic_ext, 3439, 3440);
 DECLARE_TOAST(pg_trigger, 2336, 2337);
+DECLARE_TOAST(pg_ts_dict, 4175, 4176);
+DECLARE_TOAST(pg_type, 4177, 4178);
+DECLARE_TOAST(pg_user_mapping, 4179, 4180);
 
 /* shared catalogs */
-DECLARE_TOAST(pg_shdescription, 2846, 2847);
-#define PgShdescriptionToastTable 2846
-#define PgShdescriptionToastIndex 2847
+DECLARE_TOAST(pg_authid, 4181, 4182);
+#define PgAuthidToastTable 4181
+#define PgAuthidToastIndex 4182
+DECLARE_TOAST(pg_database, 4183, 4184);	
+#define PgDatabaseToastTable 4183
+#define PgDatabaseToastIndex 4184
 DECLARE_TOAST(pg_db_ro

Re: missing toast table for pg_policy

2018-07-09 Thread Joe Conway
On 07/09/2018 09:16 PM, Michael Paquier wrote:
> On Mon, Jul 09, 2018 at 02:45:26PM +0200, Peter Eisentraut wrote:
>> On 15.06.18 21:15, Joe Conway wrote:
>>> Not surprising -- thanks for the update.
>>>
>>>> It occurred to be that we could go further and create most toast
>>>> tables automatically by taking advantage of the fact that the toast
>>>> creation function is a no-op if there are no varlena attributes. The
>>>> second patch (applies on top of the first) demonstrates a setup where
>>>> only shared and bootstrap catalogs need to have toast declarations
>>>> specified manually with fixed OIDs. It's possible this way is more
>>>> fragile, though.
>>>
>>> Hmmm, I'll have a look.
>>
>> Are you going to provide a new patch soon?  This commit fest item is
>> otherwise not moving forward.
> 
> I am a fan of this patch, so I'd like to help make things move on.  Joe,
> if you cannot provide a patch, do you mind if I begin to hack my way
> around?

If you can wait for the next commitfest (the original one I put the
patch into, i.e. September) I am committed to making it happen. But if
you are anxious to getting this into the current commitfest, go for it.
I am in the middle of a move from California to Florida and don't think
I will realistically have time this month.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Joe Conway
On 03/05/2018 11:19 AM, Tom Lane wrote:
> Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
> rhinoceros' extra_config options, temporarily?  Correlating that log
> output with the log_statement output from the test proper would let
> us confirm or deny whether it's autovacuum.


Done just now. Do you want me to force a run?


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: public schema default ACL

2018-03-03 Thread Joe Conway
On 03/03/2018 01:56 AM, Noah Misch wrote:
> Commit 5770172 ("Document security implications of search_path and the public
> schema.") is largely a workaround for the fact that the boot_val of
> search_path contains "public" while template0 gets "GRANT CREATE, USAGE ON
> SCHEMA public TO PUBLIC".  It's like having world-writable /usr/bin.  The
> security team opted not to change that in released branches, but we thought to
> revisit it later.  I propose, for v11, switching to "GRANT USAGE ON SCHEMA
> public TO PUBLIC" (omit CREATE).  Concerns?

+1. Doing this, or even revoking everything for schema public from
PUBLIC, is already common enough and good practice.
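For reference, the manual hardening commonly applied today, which the proposed default would roughly bake in:

```sql
-- What the proposed template0 default amounts to:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Or the more restrictive variant mentioned above:
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT USAGE ON SCHEMA public TO PUBLIC;
```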

> If we do that alone, databases reaching v11 via dump/reload or pg_upgrade will
> get the new default ACL if they had not changed the ACL of schema public.  If
> they had GRANTed or REVOKEd on schema public, pg_dump will recreate the
> resulting ACL.  This is the standard pg_dump behavior for ACLs on system
> objects.  I think that's okay for the public schema, too, and I like
> preserving that usual rule.  However, if we wanted to minimize upgrade-time
> surprises, we could make pg_dump include GRANT for schema public
> unconditionally.  That way, the default ACL change would apply to new
> databases only.  Does anyone want to argue for that?

What about a pg_dump option to do that and then a big note in the
release notes telling people why they might want to use it?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Joe Conway
On 03/05/2018 02:07 PM, Tom Lane wrote:
> So you can revert the rhinoceros config change if you like --- thanks
> for making it so quickly!

Ok, reverted.

> Meanwhile, I'm back to wondering what could possibly have affected
> the planner's estimates, if pg_proc and pg_statistic didn't change.
> I confess bafflement ... but we've now eliminated the autovacuum-
> did-it theory entirely, so it's time to start looking someplace else.
> I wonder if something in the postgres_fdw remote join machinery
> is not as deterministic as it should be.

Do you want me to do anything manual locally on this VM?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Commit fest 2018-09

2018-10-02 Thread Joe Conway
On 10/02/2018 10:13 AM, Tom Lane wrote:
> Michael Paquier  writes:
>> Thanks to all who participated in the patch review, authoring, and
>> everybody else who helped in making the different patches move forward.
> 
> Thanks for being CFM!  I know it's a lot of work ...

+10!

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: has_column_privilege behavior (was Re: Assert failed in snprintf.c)

2018-10-01 Thread Joe Conway
On 10/01/2018 02:41 PM, Stephen Frost wrote:
> Tom Lane (t...@sss.pgh.pa.us) wrote:
>> But it's not quite clear to me what we want the behavior for bad column
>> name to be.  A case could be made for either of:
>> 
>> * If either the table OID is bad, or the OID is OK but there's no such
>> column, return null.
>> 
>> * Return null for bad OID, but if it's OK, continue to throw error
>> for bad column name.
>> 
>> The second case seems weirdly inconsistent, but it might actually
>> be the most useful definition.  Not detecting a misspelled column
>> name is likely to draw complaints.

Seems you could make the same argument for not detecting a misspelled
table name for this and has_table_privilege...


> My inclination would be to make the function return NULL in any case
> where we can't find what the user is asking for (and to not throw an
> error in general).

+1

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-19 Thread Joe Conway
On 09/19/2018 10:54 AM, Tom Lane wrote:
> Joe Conway  writes:
>> * I do believe aclitemeq() has utility outside internal purposes.
> 
> Our normal policy is that we do not document functions that are meant to
> be invoked through operators.  The \df output saying that is sufficient:



> I would strongly object to ignoring that policy in just one place.

Ok, fair enough.

> Actually, it appears that most of these functions have associated
> operators:
> 
> # select oid::regoperator, oprcode from pg_operator where oprright = 
> 'aclitem'::regtype;
>           oid          |   oprcode   
> -----------------------+-------------
>  +(aclitem[],aclitem)  | aclinsert
>  -(aclitem[],aclitem)  | aclremove
>  @>(aclitem[],aclitem) | aclcontains
>  =(aclitem,aclitem)| aclitemeq
>  ~(aclitem[],aclitem)  | aclcontains
> (5 rows)
> 
> So maybe what we really need is a table of operators not functions.

Good idea -- I will take a look at that.

> However, I don't object to documenting any function that has its
> own pg_description string.

Ok.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:01 AM, Tom Lane wrote:
> Joe Conway  writes:
>> Having seen none, committed/pushed. This did not seem worth
>> back-patching, so I only pushed to master.
> 
> I don't see anything on gitmaster?

Hmm, yes, interesting -- I must have messed up my local git repo somehow.
Will try again.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/21/2018 01:51 PM, Joe Conway wrote:
> On 09/19/2018 11:18 AM, Joe Conway wrote:
>> On 09/19/2018 10:54 AM, Tom Lane wrote:
>>> So maybe what we really need is a table of operators not functions.
>> 
>> Good idea -- I will take a look at that.
>> 
>>> However, I don't object to documenting any function that has its
>>> own pg_description string.
> 
> Ok, so the attached version refactors/splits the group into two tables
> -- operators and functions.



> I also included John Naylor's patch with some minor editorialization.
> 
> Any further comments or complaints?

Having seen none, committed/pushed. This did not seem worth
back-patching, so I only pushed to master.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:09 AM, Joe Conway wrote:
> On 09/24/2018 10:01 AM, Tom Lane wrote:
>> Joe Conway  writes:
>>> Having seen none, committed/pushed. This did not seem worth
>>> back-patching, so I only pushed to master.
>> 
>> I don't see anything on gitmaster?
> 
> Hmm, yes, interesting -- I must have messed up my local git repo somehow.
> Will try again.

This time it seems to have worked. Sorry for the noise earlier :-/

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-19 Thread Joe Conway
On 08/03/2018 09:04 AM, Fabien COELHO wrote:
> Here is a version of the patch which documents briefly all aclitem-related 
> functions, in a separate table.

I claimed this patch for review and commit. Comments:
---
* There is a comment in src/backend/utils/adt/acl.c noting that
  acldefault is "not documented for general use" which needs adjustment

* It makes no sense to me to document purely internal functions, e.g.
  aclitemin/out. If we are going to do that we should do it universally,
  which is not true currently, and IMHO makes no sense anyway.

* I do believe aclitemeq() has utility outside internal purposes.

* The  section is incomplete

* Interestingly (since that is what started this thread apparently) I
  found myself questioning whether acldefault() is really worth
  documenting since the result will be misleading if the defaults have
  been altered via SQL.
---
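As a quick illustration of the functions being documented (output omitted, since the resulting OIDs and ACLs vary by cluster):

```sql
-- acldefault()'s hardcoded defaults for a relation ('r') owned by role
-- OID 10, expanded into one row per privilege by aclexplode():
SELECT grantor, grantee, privilege_type, is_grantable
FROM aclexplode(acldefault('r', 10));
```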

Attached patch addresses those items and does a bit of reordering and
editorialization. If there are no objections I will commit it this way
in a day or two.

Thanks,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4331beb..cea674e 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT has_function_privilege('joeuser',
*** 16893,16898 
--- 16893,17032 
  be specified by name or by OID.
 
  
+
+  shows functions to
+ manage the aclitem type, the internal representation of access
+ privileges.
+ An aclitem entry describes the permissions of a grantee,
+ whether they are grantable or not, and which grantor granted them.
+ For instance, calvin=r*w/hobbes tells that
+ role calvin has
+ grantable privilege SELECT (r*)
+ and non-grantable privilege UPDATE (w)
+ granted by role hobbes.
+ An empty grantee stands for PUBLIC.
+
+ 
+
+ aclitem Management Functions
+ 
+  
+   Name Return Type Description
+  
+  
+   
+acldefault(type,
+   ownerId)
+aclitem[]
+get the hardcoded default access privileges for an object belonging to ownerId
+   
+   
+aclinsert(aclitem[], aclitem)
+aclitem[]
+add element aclitem to aclitem[] array
+   
+   
+aclremove(aclitem[], aclitem)
+aclitem[]
+remove element aclitem from aclitem[] array
+   
+   
+aclitemeq(aclitem1, aclitem2)
+boolean
+test whether two aclitem elements are equal
+   
+   
+aclcontains(aclitem[], aclitem)
+boolean
+test whether element aclitem is contained within aclitem[] array
+   
+   
+aclexplode(aclitem[])
+setof record
+get aclitem array as tuples
+   
+   
+makeaclitem(grantee, grantor, privilege, grantable)
+aclitem
+build an aclitem from input
+   
+  
+ 
+
+ 
+
+ aclitem
+
+
+ acldefault
+
+
+ aclinsert
+
+
+ aclremove
+
+
+ aclitemeq
+
+
+ aclcontains
+
+
+ aclexplode
+
+
+ makeaclitem
+
+ 
+
+ acldefault returns the hardcoded default access privileges
+ for an object of type belonging to role ownerId.
+ Notice that these are used in the absence of any pg_default_acl
+ () entry. Default access privileges are described in
+  and can be overwritten with
+ . In other words, this function will return
+ results which may be misleading when the defaults have been overridden.
+ Type is a CHAR, use
+ 'c' for COLUMN,
+ 'r' for relation-like objects such as TABLE or VIEW,
+ 's' for SEQUENCE,
+ 'd' for DATABASE,
+ 'f' for FUNCTION or PROCEDURE,
+ 'l' for LANGUAGE,
+ 'L' for LARGE OBJECT,
+ 'n' for SCHEMA,
+ 't' for TABLESPACE,
+ 'F' for FOREIGN DATA WRAPPER,
+ 'S' for FOREIGN SERVER,
+ 'T' for TYPE or DOMAIN.
+
+ 
+
+ aclinsert and aclremove
+ allow insertion/removal of a privilege described by an
+ aclitem into/from an array of aclitem.
+
+ 
+
+ aclitemeq checks for equality of two
+ aclitem elements.
+
+ 
+
+ aclcontains checks if an aclitem
+ element is present in an array of aclitem.
+
+ 
+
+ aclexplode returns an aclitem array
+ as a set of rows. Output columns are grantor oid,
+ grantee oid (0 for PUBLIC),
+ granted privilege as text (SELECT, ...)
+ and whether the privilege is grantable as boolean.
+ makeaclitem performs the inverse operation.
+
+ 

  shows functions that
 determine whether a certain object is visible in the
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index a45e093..d5285e2 100644
*** a/src/backend/utils/adt/acl.c
--- 

Re: doc - add missing documentation for "acldefault"

2018-09-20 Thread Joe Conway
On 09/19/2018 12:30 PM, John Naylor wrote:
> On 9/19/18, Tom Lane  wrote:
>> However, I don't object to documenting any function that has its
>> own pg_description string.
> 
> Speaking of, that description string seems to have been neglected.
> I've attached a remedy for that.

Thanks, will take care of it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: doc - add missing documentation for "acldefault"

2018-09-21 Thread Joe Conway
On 09/19/2018 11:18 AM, Joe Conway wrote:
> On 09/19/2018 10:54 AM, Tom Lane wrote:
>> Joe Conway  writes:
>>> * I do believe aclitemeq() has utility outside internal purposes.
>> 
>> Our normal policy is that we do not document functions that are meant to
>> be invoked through operators.  The \df output saying that is sufficient:
> 
> 
> 
>> I would strongly object to ignoring that policy in just one place.
> 
> Ok, fair enough.
> 
>> Actually, it appears that most of these functions have associated
>> operators:
>> 
>> # select oid::regoperator, oprcode from pg_operator where oprright = 
>> 'aclitem'::regtype;
>>           oid          |   oprcode   
>> -----------------------+-------------
>>  +(aclitem[],aclitem)  | aclinsert
>>  -(aclitem[],aclitem)  | aclremove
>>  @>(aclitem[],aclitem) | aclcontains
>>  =(aclitem,aclitem)| aclitemeq
>>  ~(aclitem[],aclitem)  | aclcontains
>> (5 rows)
>> 
>> So maybe what we really need is a table of operators not functions.
> 
> Good idea -- I will take a look at that.
> 
>> However, I don't object to documenting any function that has its
>> own pg_description string.

Ok, so the attached version refactors/splits the group into two tables
-- operators and functions.

It drops aclinsert and aclremove entirely due to the fact that they no
longer do anything useful, to wit:
-
Datum
aclinsert(PG_FUNCTION_ARGS)
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("aclinsert is no longer supported")));

PG_RETURN_NULL();   /* keep compiler quiet */
}

Datum
aclremove(PG_FUNCTION_ARGS)
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("aclremove is no longer supported")));

PG_RETURN_NULL();   /* keep compiler quiet */
}
-

I also included John Naylor's patch with some minor editorialization.

Any further comments or complaints?

Thanks,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4331beb..8ebb1bf 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT * FROM pg_ls_dir('.') WITH ORDINA
*** 15962,15968 
   
  
   
!   System Information Functions
  

  shows several
--- 15962,15968 
   
  
   
!   System Information Functions and Operators
  

  shows several
*** SELECT has_function_privilege('joeuser',
*** 16894,16899 
--- 16894,17034 
 
  

+ shows the operators
+available for the aclitem type, which is the internal
+representation of access privileges. An aclitem entry
+describes the permissions of a grantee, whether they are grantable
+or not, and which grantor granted them. For instance,
+calvin=r*w/hobbes specifies that the role
+calvin has the grantable privilege
+SELECT (r*) and the non-grantable
+privilege UPDATE (w), granted by
+the role hobbes. An empty grantee stands for
+PUBLIC.
+   
+ 
+
+ aclitem
+
+
+ acldefault
+
+
+ aclitemeq
+
+
+ aclcontains
+
+
+ aclexplode
+
+
+ makeaclitem
+
+ 
+ 
+  aclitem Operators
+  
+   
+
+ Operator
+ Description
+ Example
+ Result
+
+   
+   
+ 
+
+  = 
+ equal
+ 'calvin=r*w/hobbes'::aclitem = 'calvin=r*w*/hobbes'::aclitem
+ f
+
+ 
+
+  @> 
+ contains element
+ '{calvin=r*w/hobbes,hobbes=r*w*/postgres}'::aclitem[] @> 'calvin=r*w/hobbes'::aclitem
+ t
+
+ 
+
+  ~ 
+ contains element
+ '{calvin=r*w/hobbes,hobbes=r*w*/postgres}'::aclitem[] ~ 'calvin=r*w/hobbes'::aclitem
+ t
+
+ 
+   
+  
+ 
+ 
+
+  shows some additional
+ functions to manage the aclitem type.
+
+ 
+
+ aclitem Functions
+ 
+  
+   Name Return Type Description
+  
+  
+   
+acldefault(type,
+   ownerId)
+aclitem[]
+get the hardcoded default access privileges for an object belonging to ownerId
+   
+   
+aclexplode(aclitem[])
+setof record
+get aclitem array as tuples
+   
+   
+makeaclitem(grantee, grantor, privilege, grantable)
+aclitem
+build an aclitem from input
+   
+  
+ 
+
+ 
+
+ acldefault returns the hardcoded default access privileges
+ for an object of type belonging to role ownerId.
+ Notice that these are used in the absence of any pg_default_acl
+ ()

Re: pg_config wrongly marked as not parallel safe?

2018-11-30 Thread Joe Conway
On 11/30/18 3:30 AM, Kyotaro HORIGUCHI wrote:
> # And returning to the topic, I vote for pg_config should be "stable".

And on that note, does this change warrant backpatching, or should it be
applied to master only?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: pg_config wrongly marked as not parallel safe?

2018-11-26 Thread Joe Conway
On 11/26/18 7:08 PM, Andres Freund wrote:
> On 2018-11-26 19:04:46 -0500, Joe Conway wrote:
>> Not intentional. Though, sitting here chatting with Stephen about it, I
>> am now wondering if pg_config() should actually be marked immutable:
>>
>> select * from pg_config() where name = 'VERSION';
>>   name   |     setting     
>> ---------+-----------------
>>  VERSION | PostgreSQL 10.5
>> (1 row)
>>
>> [...upgrade the postgres binaries...]
>>
>> select * from pg_config() where name = 'VERSION';
>>   name   |     setting     
>> ---------+-----------------
>>  VERSION | PostgreSQL 10.6
>> (1 row)
>>
>> So the correct answer is probably to mark pg_config() stable, but it
>> still seems to be parallel safe to me.
> 
> I don't think we should consider immutability to mean anything across
> major versions. What'd be helped by doing that? We'd have to rule out
> any behaviour change to any immutable function for that to make
> sense. Including making an immutable function not immutable anymore.

Umm, this is a minor version, not major.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: pg_config wrongly marked as not parallel safe?

2018-11-26 Thread Joe Conway
On 11/26/18 6:45 PM, Andres Freund wrote:
> Hi,
> 
> Triggered by the thread at [1] I looked for functions marked as
> immutable but not parallel safe.
> 
> postgres[19492][1]=# SELECT oid::regprocedure, provolatile, proparallel FROM 
> pg_proc WHERE provolatile = 'i' AND proparallel != 's';
> ┌─────────────┬─────────────┬─────────────┐
> │     oid     │ provolatile │ proparallel │
> ├─────────────┼─────────────┼─────────────┤
> │ pg_config() │ i           │ r           │
> └─────────────┴─────────────┴─────────────┘
> (1 row)
> 
> # pg_config
> { oid => '3400', descr => 'pg_config binary as a function',
>   proname => 'pg_config', prorows => '23', proretset => 't', proparallel => 
> 'r',
>   prorettype => 'record', proargtypes => '', proallargtypes => '{text,text}',
>   proargmodes => '{o,o}', proargnames => '{name,setting}',
>   prosrc => 'pg_config' },
> 
> so that function is marked as immutable but not parallel safe, without
> an explanation for that odd combination.
> 
> Now obviously I don't think it practially matters for pg_config(), but
> it seems unnecessarily confusing as a general matter.
> 
> I think we probably should fix this specific case, and then add a check
> to opr_sanity.sql or such.
> 
> As far as I can tell pg_config() was marked as such since its addition
> in [2]. Joe, I assume this wasn't intentional?

Not intentional. Though, sitting here chatting with Stephen about it, I
am now wondering if pg_config() should actually be marked immutable:

select * from pg_config() where name = 'VERSION';
  name   |     setting     
---------+-----------------
 VERSION | PostgreSQL 10.5
(1 row)

[...upgrade the postgres binaries...]

select * from pg_config() where name = 'VERSION';
  name   |     setting     
---------+-----------------
 VERSION | PostgreSQL 10.6
(1 row)

So the correct answer is probably to mark pg_config() stable, but it
still seems to be parallel safe to me.
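For existing clusters, where a new initdb isn't an option, the equivalent adjustment could presumably be made in place (a sketch; requires superuser):

```sql
-- Re-mark pg_config() as stable and parallel safe:
ALTER FUNCTION pg_config() STABLE PARALLEL SAFE;
```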

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: PostgreSQL pollutes the file system

2019-03-29 Thread Joe Conway
On 3/29/19 11:44 AM, Daniel Gustafsson wrote:
> On Friday, March 29, 2019 4:41 PM, Tom Lane  wrote:
> 
>> Christoph Berg m...@debian.org writes:
>>
>> > What might possibly make sense is to add options to psql to
>> > facilitate common tasks:
>>
>> > psql --createdb foo
>> > psql --createuser bar --superuser
>> > psql --reindex foo
>>
>> That's a thought. Or perhaps better, allow pg_ctl to grow new
>> subcommands for those tasks?
> 
> +1 on using pg_ctl rather than psql, should we go down this path.


Agreed -- another +1 here

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [PATCH v20] GSSAPI encryption support

2019-04-02 Thread Joe Conway
On 4/2/19 6:18 PM, Stephen Frost wrote:
> Greetings,
> 
> On Tue, Apr 2, 2019 at 18:10 Peter Eisentraut wrote:
> 
> On 2019-02-23 17:27, Stephen Frost wrote:
> >> About pg_hba.conf: The "hostgss" keyword seems a bit confusing. 
> It only
> >> applies to encrypted gss-using connections, not all of them.  Maybe
> >> "hostgssenc" or "hostgsswrap"?
> > Not quite sure what you mean here, but 'hostgss' seems to be quite
> well
> > in-line with what we do for SSL...  as in, we have 'hostssl', we don't
> > say 'hostsslenc'.  I feel like I'm just not understanding what you
> mean
> > by "not all of them".
> 
> Reading the latest patch, I think this is still a bit confusing.
> Consider an entry like
> 
>     hostgss all             all             0.0.0.0/0               gss
> 
> The "hostgss" part means, the connection is GSS-*encrypted*.  The "gss"
> entry in the last column means use gss for *authentication*.  But didn't
> "hostgss" already imply that?  No.  I understand what's going on, but it
> seems quite confusing.  They both just say "gss"; you have to know a lot
> about the nuances of pg_hba.conf processing to get that.
> 
> If you have line like
> 
>     hostgss all             all             0.0.0.0/0               md5
> 
> it is not obvious that this means, if GSS-encrypted, use md5.  It could
> just as well mean, if GSS-authenticated, use md5.
> 
> The analogy with SSL is such that we use "hostssl" for connections using
> SSL encryption and "cert" for the authentication method.  So there we
> use two different words for two different aspects of SSL.
> 
> 
> I don’t view it as confusing, but I’ll change it to hostgssenc as was
> suggested earlier to address that concern.  It’s a bit wordy but if it
> helps reduce confusion then that’s a good thing.

Personally I don't find it as confusing as is either, and I find hostgss
to be a good analog of hostssl. On the other hand hostgssenc is long and
unintuitive. So +1 for leaving it as is and -1 for changing it IMHO.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: PostgreSQL pollutes the file system

2019-03-29 Thread Joe Conway
On 3/29/19 3:43 PM, Christoph Berg wrote:
> Re: Joe Conway 2019-03-29 <48e5efaf-7ea2-ed70-a803-949bbfec8...@joeconway.com>
>> echo "\password :\"role\"" | psql -v role='my role'
>> Enter new password:
>> Enter it again:
>> 
>> That said, this is kind of off the topic of this thread.
> 
> It is on-topic because the reason we can't just tell people to replace
>   createuser $foo
> with
>   psql -c "create user $foo"
> is because $foo might need escaping.
> 
> IMHO if we find an way to do that which is acceptable for sh scripts,
> the createuser/... commands could go.

I think these commands *were* once (at least some of them) shell scripts
and we went to executable C in order to make them work on Windows, IIRC.

>> I like Tom's last suggestion of:
>> 
>>   pg_util  
>> 
>> Of course that does not lend itself to symlinking for backward
>> compatibility, does it? If there is a way I am not familiar with it.
> 
> We could symlink createuser -> pg_util. It is pretty common for
> commands to act differently based on the name they were invoked as.

Yeah, I forgot about that. Does that also go for Windows?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: PostgreSQL pollutes the file system

2019-03-29 Thread Joe Conway
On 3/29/19 3:01 PM, Pavel Stehule wrote:
> But psql has safe escaping via :"xxx" notation. So some like
> 
> psql -c 'create role :"role"' -v role='my role' ...
> 
> But what I know the psql variables are not evaluated for -c query

You can do this:
echo "create role :\"role\"" | psql -v role='my role'
CREATE ROLE

echo "\password :\"role\"" | psql -v role='my role'
Enter new password:
Enter it again:

That said, this is kind of off the topic of this thread.
I like Tom's last suggestion of:

  pg_util  

Of course that does not lend itself to symlinking for backward
compatibility, does it? If there is a way I am not familiar with it.

I guess the alternative would be an alias, but can packages install an
alias? Or something else I am not thinking about?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: crosstab/repivot...any interest?

2019-02-26 Thread Joe Conway
On 2/25/19 8:34 PM, Merlin Moncure wrote:
> No worries, sir! Apologies on the late reply.  I've made some headway on
> this item.  Waiting for postgres to implement the SQL standard pivoting
> (regardless if it implements the cases I need) is not an option for my
> personal case. I can't use the SQL approach either as it's very slow and
> imposing some scaling limits that I need to work around in the short run.
> 
> My strategy is to borrow [steal] from crosstab_hash and make a new
> version called repivot which takes an already pivoted data set and
> repivots it against an identified column.   Most of the code can be
> shared with tablefunc so ideally this could live as an amendment to that
> extension.  That might not happen though, so I'm going to package it as
> a separate extension (removing the majority of tablefunc that I don't
> need) and submit it to this group for consideration.


I can't promise there will be consensus to add to tablefunc, but I am
not opposed and will be happy to try to help you make that happen to the
extent I can find the spare cycles.

> If we punt, it'll end up as a private extension or live the life of an
> Orphan in pgxn.  If there's some interest here, we can consider a new
> contrib extension (which I personally rate very unlikely) or recasting
> as an extra routine to tablefunc.  Any way we slice it, huge thanks to
> Joe Conway for giving us such an awesome function to work with all
> these years (not to mention the strategic plr language).  SRF crosstab()
> is still somewhat baroque, but it still fills a niche that nothing else
> implements.
> 
> The interface I'm looking at is:
> SELECT repivot(
>   query TEXT,
>   static_attributes INT,  /* number of static attributes that are
> unchanged around key; we need this in our usages */
>   attribute_query  TEXT); /* query that draws up the pivoted attribute
> list */
> 
> The driving query is expected to return 0+ static attributes which are
> essentially simply pasted to the output. The next two columns are the
> key column and the pivoting column.   So if you had three attributes,
> the input set would be:
> 
> a1, a2, a3, k1, p, v1...vn
> 
> Where the coordinates v and p would exchange.  I need to get this done
> quickly and so am trying to avoid more abstracted designs (maybe multi
> part keys should be supported though...this is a big limitation of
> crosstab albeit with some obnoxious work arounds).

Perhaps not enough coffee yet, but I am not sure I fully grok the design
here. A fully fleshed out example would be useful.
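E.g., is the intent roughly this (entirely hypothetical, just to pin down
the semantics)?

```sql
-- input (already pivoted): one static attribute (region), key = city,
-- pivot column = metric, value columns jan/feb
--   region | city | metric | jan | feb
--   east   | nyc  | sales  |  10 |  20
--   east   | nyc  | temp   |  30 |  35

SELECT * FROM repivot(
  'SELECT region, city, metric, jan, feb FROM t',
  1,                                /* one static attribute */
  'SELECT DISTINCT metric FROM t'   /* pivoted attribute list */
) AS (region text, city text, month text, sales int, temp int);

-- expected shape after exchanging v and p:
--   region | city | month | sales | temp
--   east   | nyc  | jan   |    10 |   30
--   east   | nyc  | feb   |    20 |   35
```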

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Joe Conway
On 2/27/19 7:54 PM, Michael Paquier wrote:
> On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
>> It seems to me that OpenTransientFile() is more appropriate. Patch done
>> that way attached.
> 
> Works for me, thanks for sending a patch!  While on it, could you
> clean up the comment on top of get_controlfile()?  "char" is mentioned
> instead of "const char" for DataDir which is incorrect.  I would
> remove the complete set of arguments from the description and just
> keep the routine name.

Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: get_controlfile() can leak fds in the backend

2019-02-27 Thread Joe Conway
On 2/27/19 2:47 AM, Michael Paquier wrote:
> Hi all,
> (CC-ing Joe as of dc7d70e)
> 
> I was just looking at the offline checksum patch, and noticed some
> sloppy coding in controldata_utils.c.  The control file gets opened in
> get_controlfile(), and if it generates an error then the file
> descriptor remains open.  As basic code rule in the backend we should
> only use BasicOpenFile() when opening files, so I think that the issue
> should be fixed as attached, even if this requires including fd.h for
> the backend compilation which is kind of ugly.
> 
> Another possibility would be to just close the file descriptor before
> any error, saving appropriately errno before issuing any %m portion,
> but that still does not respect the backend policy regarding open().

In fd.c I see:
8<
* AllocateFile, AllocateDir, OpenPipeStream and OpenTransientFile are
* wrappers around fopen(3), opendir(3), popen(3) and open(2),
* respectively. They behave like the corresponding native functions,
* except that the handle is registered with the current subtransaction,
* and will be automatically closed at abort. These are intended mainly
* for short operations like reading a configuration file; there is a
* limit on the number of files that can be opened using these functions
* at any one time.
*
* Finally, BasicOpenFile is just a thin wrapper around open() that can
* release file descriptors in use by the virtual file descriptors if
* necessary. There is no automatic cleanup of file descriptors returned
* by BasicOpenFile, it is solely the caller's responsibility to close
* the file descriptor by calling close(2).
8<

According to that comment BasicOpenFile does not seem to solve the issue
you are pointing out (leaking of file descriptor on ERROR). Perhaps
OpenTransientFile() is more appropriate? I am on the road at the moment
so have not looked very deeply at this yet though.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:50 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:44 AM Joe Conway  wrote:
>> No, and Tom stated as much too, but life is all about tradeoffs. Some
>> people will find this an acceptable compromise. For those that don't
>> they don't have to use it. IMHO we tend toward too much nannyism too often.
> 
> Well, I agree with that, too.
> 
> Hmm.  I don't think there's anything preventing you from implementing
> this in "userspace," is there?  A logging hook could suppress all
> error message text, and you could just mark all functions leakproof
> after that, and you'd have this exact behavior in an existing release
> with no core code changes, I think.

I think that would affect the server logs too, no? Worth thinking about
though...

Also manually marking all functions leakproof is far less convenient
than turning off the check as this patch effectively allows. You would
want to keep track of the initial condition and be able to restore it if
needed. Doable but much uglier. Perhaps we could tolerate a hook that
would allow an extension to do this though?

> If you do that, or just stick this patch into your own distro, I would
> be interested to hear some experiences from customers (and those who
> support them) after some time had gone by.  I find it hard to imagine
> delivering customer support in an environment configured this way, but
> sometimes my imagination is limited.

Again, remember that the actual messages are available in the server
logs. The presumption is that the server logs are kept secure, and it is
ok to leak information into them. How the customer does or does not
decide to pass some of that information on to a support group becomes a
problem to deal with on a case by case basis.

Also, as mentioned up-thread, in many cases there is or should be a
non-production instance available to use for reproducing problems to
debug them. Presumably the data on such a system is faked or has already
been cleaned up for a wider audience.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 12:28 PM, Robert Haas wrote:
> Mmmph.  If your customers always have a non-production instance where
> problems from production can be easily reproduced, your customers are
> not much like our customers.

Well I certainly did not mean to imply that this is always the case ;-)

But I think it is fair to tell customers who have these tradeoffs in
front of them that such an instance would be even more advisable if they
decide to use this capability.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:03 AM, Joshua Brindle wrote:
> On Thu, Feb 28, 2019 at 10:49 AM Tom Lane  wrote:
>>
>> Joshua Brindle  writes:
>> > On Thu, Feb 28, 2019 at 9:12 AM Robert Haas  wrote:
>> >> So... you're just going to replace ALL error messages of any kind with
>> >> "ERROR: missing error text" when this option is enabled?  That sounds
>> >> unusable.  I mean if I'm reading it right this would get not only
>> >> messages from SQL-callable functions but also things like "deadlock
>> >> detected" and "could not read block %u in file %s" and "database is
>> >> not accepting commands to avoid wraparound data loss in database with
>> >> OID %u".  You can't even shut it off conveniently, because the way
>>
>> > This makes complete sense to me. The client side of a client/server
>> > protocol doesn't have any way to fix 'could not read block %u in file
>> > %s', the client doesn't need that kind of detailed information about a
>> > server, and in fact that information could be security sensitive.
>>
>> I agree with Robert that this idea is a nonstarter.  Not only is it
>> a complete disaster from a usability standpoint, but *it does not
>> fix the problem*.  The mere fact that an error occurred, or didn't,
>> is already an information leak.  Sure, you can only extract one bit
>> per query that way, but a slow leak is still a leak.  See the Spectre
>> vulnerability for a pretty exact parallel.
> 
> How is leakproof defined WRT Postgres? Generally speaking a 1 bit
> error path would be considered a covert channel on most systems and
> are relatively slow even compared to e.g., timing channels.

Yes, I am pretty sure there are plenty of very security conscious
environments that would be willing to make this tradeoff in order to get
reliable RLS performance.

> Redacting error information, outside of the global leakproof setting,
> seems useful to prevent data leakage to a client on another system,
> such as Robert's example above "could not read block %u in file %s".

True

> Although, and Joe may hate me for saying this, I think only the
> non-constants should be redacted to keep some level of usability for
> regular SQL errors. Maybe system errors like the above should be
> removed from client messages in general.

I started down this path and it looked fragile. I guess if there is
generally enough support to think this might be viable I could open up
that door again, but I don't want to waste time if the approach is
really a non-starter as stated upthread :-/.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 9:12 AM, Robert Haas wrote:
> On Wed, Feb 27, 2019 at 6:03 PM Joe Conway  wrote:
>> Patch for discussion attached.
> 
> So... you're just going to replace ALL error messages of any kind with
> "ERROR: missing error text" when this option is enabled?  That sounds
> unusable.  I mean if I'm reading it right this would get not only
> messages from SQL-callable functions but also things like "deadlock
> detected" and "could not read block %u in file %s" and "database is
> not accepting commands to avoid wraparound data loss in database with
> OID %u".  You can't even shut it off conveniently, because the way
> you've designed it it has to be PGC_POSTMASTER to avoid TOCTTOU
> vulnerabilities.  Maybe I'm misreading the patch?

You have it correct.

I completely disagree that it is unusable though. The way I envision
this is that you enable force_leakproof on your development machine
without suppress_client_messages being turned on. Do your debugging there.

On production, both are turned on. You still get full unredacted
messages in your pg log. The client on a prod system does not need these
details. If you *really* need to, you can restart to turn it on for a
short while on prod, but hopefully you have a non prod system where you
reproduce issues for debugging anyway.

I am not married to making this only changeable via restart though --
that's why I posted the patch for discussion. Perhaps a superuser-settable
GUC would be better so debugging could be done in one session only on the
prod machine.
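With a superuser-settable variant, a session might look like this (GUC
names taken from the patch under discussion; none of this is committed):

```sql
-- normal production posture
SET force_leakproof = on;            -- treat functions as leakproof
SET suppress_client_messages = on;   -- clients see redacted errors

-- one-off debugging in a single superuser session;
-- full messages still go to the server log either way
SET suppress_client_messages = off;
```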

> I don't think it would be crazy to have a mode where we try to redact
> the particular error messages that might leak information, but I think
> we'd need to make it only those.  A wild idea might be to let
> proleakproof take on three values: yes, no, and maybe.  When 'maybe'
> functions are involved, we tell them whether or not the current query
> involves any security barriers, and if so they self-censor.

Again, I disagree. See above -- you have all you need in the server logs.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:37 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
>> > Although, and Joe may hate me for saying this, I think only the
>> > non-constants should be redacted to keep some level of usability for
>> > regular SQL errors. Maybe system errors like the above should be
>> > removed from client messages in general.
>>
>> I started down this path and it looked fragile. I guess if there is
>> generally enough support to think this might be viable I could open up
>> that door again, but I don't want to waste time if the approach is
>> really a non-starter as stated upthread :-/.
> 
> Hmm.  It seems to me that if there's a function that sometimes throws
> an error and other times does not, and if that behavior is dependent
> on the input, then even redacting the error message down to 'ERROR:
> error' does not remove the leak.  So it seems to me that regardless of
> what one thinks about the proposal from a usability perspective, it's
> probably not correct from a security standpoint.  Information that
> couldn't be leaked until present rules would leak with this change,
> when the new GUCs were turned on.
> 
> Am I wrong?


No, and Tom stated as much too, but life is all about tradeoffs. Some
people will find this an acceptable compromise. For those that don't
they don't have to use it. IMHO we tend toward too much nannyism too often.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Joe Conway
On 2/28/19 7:20 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
>> Sure, will do. What are your thoughts on backpatching? This seems
>> unlikely to be a practical concern in the field, so my inclination is a
>> master only fix.
> 
> I agree that this would unlikely become an issue as an error on the
> control file would most likely be a PANIC when updating it, so fixing
> only HEAD sounds fine to me.  Thanks for asking!


Committed and pushed that way.

By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worthwhile searching for and
fixing those instances?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-01 Thread Joe Conway
On 2/28/19 9:33 PM, Michael Paquier wrote:
> Hi all,
> 
> Joe's message here has reminded me that we have lacked a lot of error
> handling around CloseTransientFile():
> https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2ce...@joeconway.com
> 
> This has been mentioned by Alvaro a couple of months ago (cannot find
> the thread about that at quick glance), and I just forgot about it at
> that time.  Anyway, attached is a patch to do some cleanup for all
> that:
> - Switch OpenTransientFile to read-only where sufficient.
> - Add more error handling for CloseTransientFile
> A major take of this patch is to make sure that the new error messages
> generated have an elevel consistent with their neighbors.
> 
> Just on time for this last CF.  Thoughts?

Seems like it would be better to modify the arguments to
CloseTransientFile() to include the filename being closed, errorlevel,
and fail_on_error or something similar. Then all the repeated ereport
stanzas could be eliminated.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-20 Thread Joe Conway
On 2/19/19 6:43 PM, Laurenz Albe wrote:
> Pierre Ducroquet wrote:
>> if we had a 'RLS-enabled' context on functions, a way to make a lot of
>> built-in functions leakproof would simply be to prevent them from logging
>> errors containing values.
>> 
>> For others, like arraycontains, it's much trickier :[...]
> 
> I feel a little out of my depth here, so I won't comment.

I had more-or-less the same idea, and even did a PoC patch (not
preventing the log entries but rather redacting the variable data), but
after discussing offlist with some other hackers I got the impression
that it would be a really hard sell.

It does seem to me though that such a feature would be pretty useful.
Something like use a GUC to turn it on, and while on log messages get
redacted. If you really needed to see the details, you could either
duplicate your issue on another installation with the redaction turned
off, or maybe turn it off on production in a controlled manner just long
enough to capture the full error message.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: list append syntax for postgresql.conf

2019-02-20 Thread Joe Conway
On 2/20/19 10:15 AM, Peter Eisentraut wrote:
> Nowadays there are a number of methods for composing multiple
> postgresql.conf files for modularity.  But if you have a bunch of things
> you want to load via shared_preload_libraries, you have to put them all
> in one setting.  How about some kind of syntax for appending something
> to a list, like
> 
> shared_preload_libraries += 'pg_stat_statements'
> 
> Thoughts?


I like the idea, but presume it would apply to any GUC list, not just
shared_preload_libraries?
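For example, with conf.d-style includes this would let each module
contribute its own library (hypothetical, using the proposed += syntax):

```
# conf.d/00-base.conf
shared_preload_libraries = 'pg_stat_statements'

# conf.d/10-auditing.conf  (proposed syntax, not yet implemented)
shared_preload_libraries += 'pgaudit'
```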

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: BUG #15646: Inconsistent behavior for current_setting/set_config

2019-02-20 Thread Joe Conway
On 2/20/19 12:11 PM, Tom Lane wrote:
> Joe Conway  writes:
>> On 2/20/19 11:10 AM, PG Bug reporting form wrote:
>>> But current behavior returns empty string instead of NULL (the initial
>>> value) after transaction is rolled back. When I restart session, NULL is
>>> returned again as it is expected.
> 
>> This has been discussed before and dismissed:
>> https://www.postgresql.org/message-id/flat/56842412.505%40joeconway.com
>> Personally I agree it is a bug, but I am not sure you will get much
>> support for that position.
> 
> The fact that we allow undeclared user-defined GUCs at all is a bug IMO.
> We need to find a way to replace that behavior with something whereby
> the name and type of a parameter are declared up-front before you can
> set it.

(moving to hackers)

Perhaps we could do something like:

1. If the user-defined GUC is defined in postgresql.conf, et al, same
   behavior as now
2. Backward compatibility would be a concern, so create a new GUC
   declare_custom_settings which initially defaults to false.
3. If declare_custom_settings is true, and the user-defined GUC is not
   defined in postgresql.conf, then in order to create it dynamically
   via SET or similar methods, you must do something like:

  CREATE SETTING name TYPE guctype [LIST];
  SET name TO value;
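So a session might look like this (hypothetical syntax following the
sketch above; "myapp.tenant" is just an illustrative name):

```sql
-- with declare_custom_settings = on, setting an undeclared
-- custom name would be an error
CREATE SETTING myapp.tenant TYPE text;
SET myapp.tenant TO 'acme';
SELECT current_setting('myapp.tenant');
```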


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





oddity with ALTER ROLE/USER

2019-02-22 Thread Joe Conway
I noticed that ALTER ROLE/USER succeeds even when called without any
options:

postgres=# alter user foo;
ALTER ROLE
postgres=# alter role foo;
ALTER ROLE
postgres=# alter group foo;
ERROR:  syntax error at or near ";"
LINE 1: alter group foo;

That seems odd, does nothing useful, and is inconsistent with, for
example, ALTER GROUP as shown above.

Proposed patch attached.

Comments/thoughts?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 8340948..a71217e 100644
*** a/src/backend/commands/user.c
--- b/src/backend/commands/user.c
*** AlterRole(AlterRoleStmt *stmt)
*** 549,554 
--- 549,559 
  	check_rolespec_name(stmt->role,
  		"Cannot alter reserved roles.");
  
+ 	if (list_length(stmt->options) == 0)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+  errmsg("no options specified")));
+ 
  	/* Extract options from the statement node tree */
  	foreach(option, stmt->options)
  	{




Re: oddity with ALTER ROLE/USER

2019-02-22 Thread Joe Conway
On 2/22/19 4:19 PM, Tom Lane wrote:
> Joe Conway  writes:
>> I noticed that ALTER ROLE/USER succeeds even when called without any
>> options:
> 
>> postgres=# alter user foo;
>> ALTER ROLE
>> postgres=# alter role foo;
>> ALTER ROLE
>> postgres=# alter group foo;
>> ERROR:  syntax error at or near ";"
>> LINE 1: alter group foo;
> 
>> That seems odd, does nothing useful, and is inconsistent with, for
>> example, ALTER GROUP as shown above.
> 
>> Proposed patch attached.
> 
> If you want to make it act like alter group, why not make it act
> like alter group?  That is, the way to fix this is to change the
> grammar so that AlterOptRoleList doesn't permit an expansion with
> zero list elements.


I considered that but liked the more specific error message.

> Having said that, I can't get excited about changing this at all.
> Nobody will thank us for it, and someone might complain.


The other route is to change the documentation to reflect reality, I guess.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Should we increase the default vacuum_cost_limit?

2019-02-25 Thread Joe Conway
On 2/25/19 1:17 AM, Peter Geoghegan wrote:
> On Sun, Feb 24, 2019 at 9:42 PM David Rowley
>  wrote:
>> The current default vacuum_cost_limit of 200 seems to be 15 years old
>> and was added in f425b605f4e.
>>
>> Any supporters for raising the default?
> 
> I also think that the current default limit is far too conservative.

I agree entirely. In my experience you are usually much better off if
vacuum finishes quickly. Personally I think our default scale factors
are horrible too, especially when there are tables with large numbers of
rows.
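For reference, the usual per-table workaround for the scale factor
problem is along these lines:

```sql
-- vacuum a large table after ~1% churn instead of the 20% default
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```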

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: get_controlfile() can leak fds in the backend

2019-02-27 Thread Joe Conway
On 2/27/19 10:26 AM, Joe Conway wrote:
> On 2/27/19 2:47 AM, Michael Paquier wrote:
>> Hi all,
>> (CC-ing Joe as of dc7d70e)

> According to that comment BasicOpenFile does not seem to solve the issue
> you are pointing out (leaking of file descriptor on ERROR). Perhaps
> OpenTransientFile() is more appropriate? I am on the road at the moment
> so have not looked very deeply at this yet though.


It seems to me that OpenTransientFile() is more appropriate. Patch done
that way attached.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index abfe706..c3b1934 100644
*** a/src/common/controldata_utils.c
--- b/src/common/controldata_utils.c
***
*** 27,32 
--- 27,35 
  #include "catalog/pg_control.h"
  #include "common/controldata_utils.h"
  #include "port/pg_crc32c.h"
+ #ifndef FRONTEND
+ #include "storage/fd.h"
+ #endif
  
  /*
   * get_controlfile(char *DataDir, const char *progname, bool *crc_ok_p)
*** get_controlfile(const char *DataDir, con
*** 51,63 
  	ControlFile = palloc(sizeof(ControlFileData));
  	snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
  
- 	if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
  #ifndef FRONTEND
  		ereport(ERROR,
  (errcode_for_file_access(),
   errmsg("could not open file \"%s\" for reading: %m",
  		ControlFilePath)));
  #else
  	{
  		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
  progname, ControlFilePath, strerror(errno));
--- 54,67 
  	ControlFile = palloc(sizeof(ControlFileData));
  	snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
  
  #ifndef FRONTEND
+ 	if ((fd = OpenTransientFile(ControlFilePath, O_RDONLY | PG_BINARY)) == -1)
  		ereport(ERROR,
  (errcode_for_file_access(),
   errmsg("could not open file \"%s\" for reading: %m",
  		ControlFilePath)));
  #else
+ 	if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
  	{
  		fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
  progname, ControlFilePath, strerror(errno));
*** get_controlfile(const char *DataDir, con
*** 95,101 
--- 99,109 
  #endif
  	}
  
+ #ifndef FRONTEND
+ 	CloseTransientFile(fd);
+ #else
  	close(fd);
+ #endif
  
  	/* Check the CRC. */
  	INIT_CRC32C(crc);




Re: Row Level Security − leakproof-ness and performance implications

2019-02-27 Thread Joe Conway
On 2/20/19 11:24 AM, Tom Lane wrote:
> Pierre Ducroquet  writes:
>> For simple functions like enum_eq/enum_ne, marking them leakproof is an 
>> obvious fix (patch attached to this email, including also textin/textout).
> 
> This is not nearly as "obvious" as you think.  See prior discussions,
> notably
> https://www.postgresql.org/message-id/flat/31042.1546194242%40sss.pgh.pa.us
> 
> Up to now we've taken a very strict definition of what leakproofness
> means; as Noah stated, if a function can throw errors for some inputs,
> it's not considered leakproof, even if those inputs should never be
> encountered in practice.  Most of the things we've been willing to
> mark leakproof are straight-line code that demonstrably can't throw
> any errors at all.
> 
> The previous thread seemed to have consensus that the hazards in
> text_cmp and friends were narrow enough that nobody had a practical
> problem with marking them leakproof --- but we couldn't define an
> objective policy statement that would allow making such decisions,
> so nothing's been changed as yet.  I think it is important to have
> an articulable policy about this, not just a seat-of-the-pants
> conclusion about the risk level in a particular function.

What if we provided an option to redact all client messages (leaving
logged messages as-is). Separately we could provide a GUC to force all
functions to be resolved as leakproof. Depending on your requirements,
having both options turned on could be perfectly acceptable.

Patch for discussion attached.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index e88c45d..8c32d6c 100644
*** a/src/backend/utils/cache/lsyscache.c
--- b/src/backend/utils/cache/lsyscache.c
***
*** 46,51 
--- 46,54 
  #include "utils/syscache.h"
  #include "utils/typcache.h"
  
+ /* GUC to force functions to be treated as leakproof */
+ extern bool		force_leakproof;
+ 
  /* Hook for plugins to get control in get_attavgwidth() */
  get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
  
*** get_func_leakproof(Oid funcid)
*** 1595,1600 
--- 1598,1610 
  	HeapTuple	tp;
  	bool		result;
  
+ 	/*
+ 	 * If client message suppression was requested,
+ 	 * treat all functions as leakproof
+ 	 */
+ 	if (force_leakproof)
+ 		return true;
+ 
  	tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
  	if (!HeapTupleIsValid(tp))
  		elog(ERROR, "cache lookup failed for function %u", funcid);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8b4720e..9941756 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** int			Log_destination = LOG_DESTINATION_
*** 107,112 
--- 107,113 
  char	   *Log_destination_string = NULL;
  bool		syslog_sequence_numbers = true;
  bool		syslog_split_messages = true;
+ bool		suppress_client_messages = false;
  
  #ifdef HAVE_SYSLOG
  
*** send_message_to_frontend(ErrorData *edat
*** 3166,3189 
  
  		/* M field is required per protocol, so always send something */
  		pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_PRIMARY);
- 		if (edata->message)
- 			err_sendstring(&msgbuf, edata->message);
- 		else
- 			err_sendstring(&msgbuf, _("missing error text"));
  
! 		if (edata->detail)
  		{
! 			pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_DETAIL);
! 			err_sendstring(&msgbuf, edata->detail);
! 		}
  
! 		/* detail_log is intentionally not used here */
  
! 		if (edata->hint)
! 		{
! 			pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_HINT);
! 			err_sendstring(&msgbuf, edata->hint);
  		}
  
  		if (edata->context)
  		{
--- 3167,3197 
  
  		/* M field is required per protocol, so always send something */
  		pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_PRIMARY);
  
! 		/* skip sending if requested */
! 		if (!suppress_client_messages)
  		{
! 			if (edata->message)
! 				err_sendstring(&msgbuf, edata->message);
! 			else
! 				err_sendstring(&msgbuf, _("missing error text"));
  
! 			if (edata->detail)
! 			{
! 				pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_DETAIL);
! 				err_sendstring(&msgbuf, edata->detail);
! 			}
  
! 			/* detail_log is intentionally not used here */
! 
! 			if (edata->hint)
! 			{
! 				pq_sendbyte(&msgbuf, PG_DIAG_MESSAGE_HINT);
! 				err_sendstring(&msgbuf, edata->hint);
! 			}
  		}
+ 		else
+ 			err_sendstring(&msgbuf, _("redacted message text"));
  
  		if (edata->context)
  		{
*** send_message_to_frontend(ErrorData *edat
*** 3274,3283 
  		if (edata->show_funcname && edata->funcname)
  			appendStringInfo(&buf, "%s: ", edata->funcname);
  
! 		if (edata->message)
! 			appendStringInfoString(&buf, edata->message);
  		else
! 			appendStringInfoString(&buf, _("missing error text"));
  
  		if (edata->cursorpos > 0)
  			appendStringInfo(&buf, _(" at character %d"),
--- 3282,3297 
  		if (edata->show_funcname && edata->funcname)
  			appendStringInfo(&buf, "%s: ", edata->funcname);
  
! 		/* skip sending if 

Re: Row Level Security − leakproof-ness and performance implications

2019-03-18 Thread Joe Conway
On 3/18/19 3:52 PM, Peter Eisentraut wrote:
> On 2019-02-28 00:03, Joe Conway wrote:
>> What if we provided an option to redact all client messages (leaving
>> logged messages as-is). Separately we could provide a GUC to force all
>> functions to be resolved as leakproof. Depending on your requirements,
>> having both options turned on could be perfectly acceptable.
> 
> There are two commit fest entries for this thread, one in Pierre's name
> and one in yours.  Is your entry for the error message redacting
> functionality?  I think that approach has been found not to actually
> satisfy the leakproofness criteria.


It is a matter of opinion with regard to what the criteria actually is,
and when it ought to apply. But in any case the clear consensus was
against me, so I guess I'll assume "my patch was rejected by PostgreSQL
all I got was this tee shirt" (...I know I have one that says something
like that somewhere...) ;-)

I have no idea what the other entry is all about as I have not had the
time to look.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: pg_config wrongly marked as not parallel safe?

2019-02-17 Thread Joe Conway
On 11/30/18 10:32 AM, Tom Lane wrote:
> Joe Conway  writes:
>> On 11/30/18 3:30 AM, Kyotaro HORIGUCHI wrote:
>>> # And returning to the topic, I vote for pg_config should be "stable".
> 
>> And on that note, Does this change does warrant backpatching, or should
>> be applied to master only?
> 
> I don't think back-patching the catalog change is really a good idea.
> The amount of work involved (e.g. release-noting how to perform the
> update on existing databases) is way out of proportion to the benefit
> for this particular case.


Closing out at least this part of the thread, committed and pushed,
master only.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Some thoughts on NFS

2019-02-19 Thread Joe Conway
On 2/19/19 10:59 AM, Magnus Hagander wrote:
> On Tue, Feb 19, 2019 at 4:46 PM Robert Haas wrote:
> 
> On Tue, Feb 19, 2019 at 2:03 AM Thomas Munro wrote:
> > How can we achieve that, without writing our
> > own NFS client?
> 
> You'll need it :)
> 
> 
> Instead of writing our own NFS client, how about writing our own
> network storage protocol?  Imagine a stripped-down postmaster process
> running on the NFS server that essentially acts as a block server.
> Through some sort of network chatter, it can exchange blocks with the
> real postmaster running someplace else.  The mini-protocol would
> contain commands like READ_BLOCK, WRITE_BLOCK, EXTEND_RELATION,
> FSYNC_SEGMENT, etc. - basically whatever the relevant operations at
> the smgr layer are.  And the user would see the remote server as a
> tablespace mapped to a special smgr.
> 
> As compared with your proposal, this has both advantages and
> disadvantages.  The main advantage is that we aren't dependent on
> being able to make NFS behave in any particular way; indeed, this type
> of solution could be used not only to work around problems with NFS,
> but also problems with any other network filesystem.  We get to reuse
> all of the work we've done to try to make local operation reliable;
> the remote server can run the same code that would be run locally
> whenever the master tells it to do so.  And you can even imagine
> trying to push more work to the remote side in some future version of
> the protocol.  The main disadvantage is that it doesn't help unless
> you can actually run software on the remote box.  If the only access
> you have to the remote side is that it exposes an NFS interface, then
> this sort of thing is useless.  And that's probably a pretty common
> scenario.
> 
> 
> In my experience, that covers approximately 100% of the usecase.
> 
> The only case I've run into people wanting to use postgres on NFS, the
> NFS server is a big filer from netapp or hitachi or whomever. And you're
> not going to be able to run something like that on top of it.


Exactly my experience too.


> There might be a use-case for the split that you mention, absolutely,
> but it's not going to solve the people-who-want-NFS situation. You'd
> solve more of that by having the middle layer speak "raw device"
> underneath and be able to sit on top of things like iSCSI (yes, really).

Interesting idea but sounds ambitious ;-)
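Robert's mini-protocol sketch above can be made a bit more concrete. The following is purely illustrative -- the command names come from his email, but the struct layout, field names, and encoding are invented here and correspond to nothing in PostgreSQL:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire commands for the block-server mini-protocol. */
typedef enum
{
	MSG_READ_BLOCK = 1,
	MSG_WRITE_BLOCK,
	MSG_EXTEND_RELATION,
	MSG_FSYNC_SEGMENT
} BlockMsgType;

/* Hypothetical fixed-size request header addressing one block. */
typedef struct
{
	uint8_t		type;			/* one of BlockMsgType */
	uint32_t	spcNode;		/* tablespace */
	uint32_t	dbNode;			/* database */
	uint32_t	relNode;		/* relation */
	uint32_t	blockNum;		/* block within relation */
} BlockMsgHeader;

/*
 * Serialize a header into a caller-supplied buffer; returns bytes written.
 * Fields are copied in host byte order; a real protocol would pick one.
 */
static size_t
encode_header(const BlockMsgHeader *h, uint8_t *buf)
{
	size_t		off = 0;

	buf[off++] = h->type;
	memcpy(buf + off, &h->spcNode, 4); off += 4;
	memcpy(buf + off, &h->dbNode, 4); off += 4;
	memcpy(buf + off, &h->relNode, 4); off += 4;
	memcpy(buf + off, &h->blockNum, 4); off += 4;
	return off;
}
```

A real implementation would also need length framing, authentication, and error reporting, much like the existing FE/BE protocol.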

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Should the docs have a warning about pg_stat_reset()?

2019-04-14 Thread Joe Conway
On 4/13/19 3:42 PM, Tomas Vondra wrote:
> If only we had a way to regularly snapshot the data from within the
> database, and then compute the deltas on that. If only we could insert
> data from one table into another one and then do some analytics on it,
> with like small windows moving over the data or something ...

Why not store deltas separately with their own delta reset command?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: New committer: David Rowley

2019-05-30 Thread Joe Conway
On 5/30/19 11:43 AM, Andres Freund wrote:
> Hi,
> 
> On 2019-05-30 11:39:23 -0400, Magnus Hagander wrote:
>> For those of you that have not read the minutes from the developer meeting
>> ahead of pgcon (can be found at
>> https://wiki.postgresql.org/wiki/PgCon_2019_Developer_Meeting), we'd like
>> to announce here as well that David Rowley has joined the ranks of
>> PostgreSQL committers.
>>
>> Congratulations to David, may the buildfarm be gentle to him, and his first
>> revert far away!
> 
> Congrats!

+1

Congratulations David!

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-14 Thread Joe Conway
On 6/13/19 11:07 AM, Bruce Momjian wrote:
> On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
>> Yeah, in principle, since the data keys in a 2-tier key architecture
>> should not go outside the database, I think we should not reveal data
>> keys to utility commands. So rearranging the WAL format seems to be a
>> better solution, but is there any reason why the main data is placed at
>> the end of a WAL record? I wonder if we can assemble WAL records in the
>> following order and encrypt only 3 and 4.
>> 
>> 1. Header data (XLogRecord and other headers)
>> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
>> 3. Block data (Tuple data, FPI)
>> 4. Sub data (e.g tuple data for logical decoding)
> 
> Yes, that does sound like a reasonable idea.  It is similar to us not
> encrypting the clog --- there is little value.  However, if we only
> encrypt the cluster, we don't need to expose the relfilenode and we can
> just encrypt the entire WAL --- I like that simplicity.  We might find
> that the complexity of encrypting only certain tablespaces makes the
> system slower than just encrypting the entire cluster.


There are reasons other than performance to want more granular than
entire cluster encryption. Limiting the volume of encrypted data with
any one key for example. And not encrypting #1 & 2 above would help
avoid known-plaintext attacks I would think.
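The four-part layout quoted above, and the proposal to encrypt only parts 3 and 4, can be modeled with a toy sketch. These enums and functions are invented for illustration and do not match the real XLogRecord layout:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the record sections in Sawada-san's proposed ordering;
 * the numbers mirror the list in the quoted email.
 */
typedef enum
{
	WAL_SEC_HEADER = 1,			/* XLogRecord and other headers */
	WAL_SEC_MAIN = 2,			/* xl_heap_insert, xl_heap_update etc. */
	WAL_SEC_BLOCK = 3,			/* tuple data, FPI */
	WAL_SEC_SUB = 4				/* e.g. tuple data for logical decoding */
} WalSection;

/*
 * Under the proposal only sections 3 and 4 carry user data, so only
 * those would be encrypted.
 */
static bool
section_is_encrypted(WalSection sec)
{
	return sec == WAL_SEC_BLOCK || sec == WAL_SEC_SUB;
}
```

The appeal of the ordering is that everything before the encrypted sections stays in plaintext, so tools that only need headers and main data never have to touch a key.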


>> > > Also, for system catalog encryption, it could be a hard part. System
>> > > catalogs are initially created at initdb time and created by copying
>> > > from template1 when CREATE DATABASE. Therefore we would need to either
>> > > modify initdb so that it's aware of encryption keys and KMS or modify
>> > > database creation so that it copies database file while encrypting
>> > > them.
>> >
>> > I assume initdb will use the same API that you would use to start the
>> > server itself, e.g., type in a password, or contact a key server.
>> 
>> I realized that in XTS encryption mode, since we craft the tweak using
>> the relfilenode, the tweaks for the system catalogs in a new database
>> would change. So we might need to re-encrypt the system catalogs during
>> CREATE DATABASE after all. I suspect that even the cluster-wide
>> encryption has the same problem.
> 
> Yes, this is why I want to just do cluster-wide encryption at this
> stage.
> 
> In addition, while the 8k blocks would use a block cipher, the WAL would
> likely use a stream cipher, and it will be very hard to use multiple
> stream ciphers in a single WAL file.


I don't understand why we would not just use AES for both.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-14 Thread Joe Conway
On 6/14/19 6:09 PM, Bruce Momjian wrote:
> On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
>> On 6/13/19 11:07 AM, Bruce Momjian wrote:
>> > In addition, while the 8k blocks would use a block cipher, the WAL would
>> > likely use a stream cipher, and it will be very hard to use multiple
>> > stream ciphers in a single WAL file.
>> 
>> I don't understand why we would not just use AES for both.
> 
> Uh, AES is an encryption cipher.  You can use it with block mode, CBC,
> or stream mode, CTR, GCM;  see:


AES is a block cipher, not a stream cipher. Yes you can use it in
different modes, including chained modes (and CBC is what I would pick),
but I assumed you were talking about an actual stream cipher algorithm.
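The distinction being drawn here -- AES is a block cipher, but modes like CTR let it act as a keystream generator -- can be illustrated with a toy counter-mode construction. The "block cipher" below is a stand-in permutation, not AES, and none of this is cryptographically sound; it only shows the mechanics of CTR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 64-bit block cipher (NOT AES, not secure). */
static uint64_t
toy_block_encrypt(uint64_t block, uint64_t key)
{
	block ^= key;
	block = block * 6364136223846793005ULL + 1442695040888963407ULL;
	return block ^ (block >> 29);
}

/*
 * CTR mode: encrypt successive counter values to form a keystream, then
 * XOR the keystream with the data.  Because XOR is its own inverse, the
 * same routine both encrypts and decrypts -- that is what makes a block
 * cipher usable as a stream cipher.
 */
static void
toy_ctr_crypt(uint8_t *data, size_t len, uint64_t key, uint64_t iv)
{
	for (size_t i = 0; i < len; i++)
	{
		uint64_t	ks = toy_block_encrypt(iv + i / 8, key);

		data[i] ^= (uint8_t) (ks >> (8 * (i % 8)));
	}
}
```

Because CTR just XORs a keystream, such modes are often loosely called "stream mode" even though the underlying primitive is a block cipher, which is the terminology mismatch in the exchange above.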

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Joe Conway
On 6/16/19 9:45 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> In any case it doesn't address my first point, which is limiting the
>> volume encrypted with the same key. Another valid reason is you might
>> have data at varying sensitivity levels and prefer different keys be
>> used for each level.
> 
> That seems quite complex.


How? It is no more complex than encrypting at the tablespace level
already gives you - in that case you get this property for free if you
care to use it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Joe Conway
On 6/16/19 9:46 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 09:45:09AM -0400, Bruce Momjian wrote:
>> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> > And although I'm not proposing this for the first implementation, yet
>> > another reason is I might want to additionally control "transparent
>> > access" to data based on who is logged in. That could be done by
>> > layering an additional key on top of the per-tablespace key for example.
>> > 
>> > The bottom line in my mind is encrypting the entire database with a
>> > single key is not much different/better than using filesystem
>> > encryption, so I'm not sure it is worth the effort and complexity to get
>> > that capability. I think having the ability to encrypt at the tablespace
>> > level adds a lot of capability for minimal extra complexity.
>> 
>> I disagree.
> 
> I will add that OpenSSL has been removing features and compatibility
> because the added complexity had hidden exploits that they could not
> have anticipated.

Sorry but I'm not buying it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Joe Conway
On 6/17/19 8:12 AM, Stephen Frost wrote:
>> But there's about 0% chance we'll get that in v1, of course, so we need
>> a "minimum viable product" to build on anyway.
> 
> There seems like a whole lot of space between something very elaborate
> and only supporting one key.

I think this is exactly the point -- IMHO one key per tablespace is a
nice and very sensible compromise. I can imagine all kinds of more
complex things that would be "nice to have" but that gets us most of the
flexibility needed with minimal additional complexity.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-17 Thread Joe Conway
On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> From a cryptographic perspective, I think fine-grained TDE would
> be the better solution. Therefore if we eventually want the fine-grained
> TDE I wonder if it might be better to develop the table/tablespace TDE
> first while keeping it simple as much as possible in v1, and then we
> can provide the functionality to encrypt other data in database
> cluster to satisfy the encrypting-everything requirement. I guess that
> it's easier to incrementally add encryption target objects rather than
> making it fine grained while not changing encryption target objects.
> 
> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> it in this month. We can more discuss the complexity of the proposed
> TDE using it.

+1

Looking forward to it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-20 Thread Joe Conway
On 6/20/19 8:34 AM, Masahiko Sawada wrote:
> I think even if we provide per-table encryption we can have
> encryption keys per tablespace. That is, all tables in the same
> tablespace use the same encryption key, but the user can
> control encrypted objects per table.
> 
>> Will we add the table-level TDE in the first version?
> 
> I hope so but It's under discussion now.

+1

>> And I have two questions.
>> 1. Will we add hooks to support replacing the encryption algorithms?
>> 2. Will we add some encryption algorithm or use some in some libraries?
> 
> Currently the WIP patch uses openssl and supports only AES-256 and I
> don't have a plan to have such extensibility for now. But it might be
> a good idea in the future. I think it would not be hard to support
> symmetric encryption algorithms supported by openssl, but would you
> like to support other encryption algorithms?

Supporting other symmetric encryption algorithms would be nice but I
don't think that is required for the first version. It would also be
nice but not initially required to support different encryption
libraries. The implementation should be written with both of these
eventualities in mind though IMHO.

I would like to see strategically placed hooks in the key management so
that an extension could, for example, layer another key in between the
master key and the per-tablespace key. This would allow extensions to
provide additional control over when decryption is allowed.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: initdb recommendations

2019-05-24 Thread Joe Conway
On 5/24/19 8:13 AM, Stephen Frost wrote:
> Greetings,
> 
> * Joe Conway (m...@joeconway.com) wrote:
>> On 5/23/19 10:30 PM, Stephen Frost wrote:
>> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> >> "Jonathan S. Katz"  writes:
>> >> > For now I have left in the password based method to be scram-sha-256 as
>> >> > I am optimistic about the support across client drivers[1] (and FWIW I
>> >> > have an implementation for crystal-pg ~60% done).
>> >> 
>> >> > However, this probably means we would need to set the default password
>> >> > encryption guc to "scram-sha-256" which we're not ready to do yet, so it
>> >> > may be moot to leave it in.
>> >> 
>> >> > So, thinking out loud about that, we should probably use "md5" and once
>> >> > we decide to make the encryption method "scram-sha-256" by default, then
>> >> > we update the recommendation?
>> >> 
>> >> Meh.  If we're going to break things, let's break them.  Set it to
>> >> scram by default and let people who need to cope with old clients
>> >> change the default.  I'm tired of explaining that MD5 isn't actually
>> >> insecure in our usage ...
>> > 
>> > +many.
>> 
>> many++
>> 
>> Are we doing this for pg12? In any case, I would think we better loudly
>> point out this change somewhere.
> 
> Sure, we should point it out, but I don't know that it needs to be
> screamed from the rooftops considering the packagers have already been
> largely ignoring our defaults here anyway...

Yeah, I thought about that, but anyone not using those packages will be
in for a big surprise. Don't get me wrong, I wholeheartedly endorse the
change, but I predict many related questions on the lists, and anything
we can do to mitigate that should be done.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-05-24 Thread Joe Conway
On 5/24/19 1:47 AM, Amit Langote wrote:
> On 2019/05/23 4:15, Andreas Seltenreich wrote:
>> …but when doing it on the parent relation, even 100 statements are
>> enough to exceed the limit:
>> 
>> ,----
>> | $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
>> | FEHLER:  Speicher aufgebraucht
>> | DETAIL:  Failed on request of size 200 in memory context "MessageContext".
>> `----
>> 
>> The memory context dump shows plausible values except for the MessageContext:
>> 
>> TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 
>> used
>>   [...]
>>   MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 
>> 264240888 used
>>   [...]
> 
> As David Rowley said, planning that query hundreds of times under a single
> MessageContext is not something that will end well on 11.3, because even a
> single instance takes up tons of memory that's only released when
> MessageContext is reset.
> 
>> Maybe some tactically placed pfrees or avoiding putting redundant stuff
>> into MessageContext can relax the situation?
> 
> I too have had similar thoughts on the matter.  If the planner had built
> all its subsidiary data structures in its own private context (or tree of
> contexts) which is reset once a plan for a given query is built and passed
> on, then there wouldn't be an issue of all of that subsidiary memory
> leaking into MessageContext.  However, the problem may really be that
> we're subjecting the planner to use cases that it wasn't perhaps designed
> to perform equally well under -- running it many times while handling the
> same message.  It is worsened by the fact that the query in question is
> something that ought to have been documented as not well supported by the
> planner; David has posted a documentation patch for that [1].  PG 12 has
> alleviated the situation to a large degree, so you won't see the OOM
> occurring for this query, but not for all queries unfortunately.


I admittedly haven't followed this thread too closely, but if having 100
partitions causes out of memory on pg11, that sounds like a massive
regression to me.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: initdb recommendations

2019-05-24 Thread Joe Conway
On 5/23/19 10:30 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> "Jonathan S. Katz"  writes:
>> > For now I have left in the password based method to be scram-sha-256 as
>> > I am optimistic about the support across client drivers[1] (and FWIW I
>> > have an implementation for crystal-pg ~60% done).
>> 
>> > However, this probably means we would need to set the default password
>> > encryption guc to "scram-sha-256" which we're not ready to do yet, so it
>> > may be moot to leave it in.
>> 
>> > So, thinking out loud about that, we should probably use "md5" and once
>> > we decide to make the encryption method "scram-sha-256" by default, then
>> > we update the recommendation?
>> 
>> Meh.  If we're going to break things, let's break them.  Set it to
>> scram by default and let people who need to cope with old clients
>> change the default.  I'm tired of explaining that MD5 isn't actually
>> insecure in our usage ...
> 
> +many.

many++

Are we doing this for pg12? In any case, I would think we better loudly
point out this change somewhere.
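For reference, the change under discussion touches two settings; a minimal example (the network range is only a placeholder):

```ini
# postgresql.conf -- hash newly set passwords with SCRAM rather than MD5
password_encryption = scram-sha-256

# pg_hba.conf -- require SCRAM authentication for TCP connections
host    all    all    192.168.0.0/24    scram-sha-256
```

Sites with clients that lack SCRAM support would need to set password_encryption back to md5 and use a matching pg_hba.conf line.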

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: initdb recommendations

2019-05-24 Thread Joe Conway
On 5/24/19 8:56 AM, Jonathan S. Katz wrote:
> On 5/24/19 8:33 AM, Stephen Frost wrote:
>> * Magnus Hagander (mag...@hagander.net) wrote:
>>> Making the default change away from trust in the source distro will affect
>>> few people.
>> 
>> Agreed.
> 
> +1

Fewer people, but likely disproportionately high representation on pgsql
lists. Anyway, nuff said -- I guess the future will tell one way or the
other.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-05-24 Thread Joe Conway
On 5/24/19 9:33 AM, David Rowley wrote:
> On Sat, 25 May 2019 at 00:18, Joe Conway  wrote:
>> I admittedly haven't followed this thread too closely, but if having 100
>> partitions causes out of memory on pg11, that sounds like a massive
>> regression to me.
> 
> For it to have regressed it would have had to once have been better,
> but where was that mentioned?  The only thing I saw was
> non-partitioned tables compared to partitioned tables, but you can't
> really say it's a regression if you're comparing apples to oranges.


I have very successfully used multiple hundreds and even low thousands
of partitions without running out of memory under the older inheritance
based "partitioning", and declarative partitioning is supposed to be
(and we have advertised it to be) better, not worse, isn't it?

At least from my point of view if 100 partitions is unusable due to
memory leaks it is a regression. Perhaps not *technically* a regression
assuming it behaves this way in pg10 also, but I bet lots of users would
see it that way.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-05-24 Thread Joe Conway
On 5/24/19 10:28 AM, Tom Lane wrote:
> Joe Conway  writes:
>> On 5/24/19 9:33 AM, David Rowley wrote:
>>> For it to have regressed it would have had to once have been better,
>>> but where was that mentioned?  The only thing I saw was
>>> non-partitioned tables compared to partitioned tables, but you can't
>>> really say it's a regression if you're comparing apples to oranges.
> 
>> I have very successfully used multiple hundreds and even low thousands
>> of partitions without running out of memory under the older inheritance
>> based "partitioning", and declarative partitioning is supposed to be
>> (and we have advertised it to be) better, not worse, isn't it?
> 
> Have you done the exact thing described in the test case?  I think
> that's going to be quite unpleasantly memory-intensive in any version.


Ok, fair point. Will test and report back.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


stawidth inconsistency with all NULL columns

2019-05-21 Thread Joe Conway
Consider:

CREATE TABLE testwid
(
  txtnotnull text,
  txtnull text,
  int8notnull int8,
  int8null int8
);
INSERT INTO testwid
SELECT 'a' || g.i,
   NULL,
   g.i,
   NULL
FROM generate_series(1,10000) AS g(i);
ANALYZE testwid;
SELECT attname, avg_width FROM pg_stats WHERE tablename = 'testwid';
   attname   | avg_width 
-------------+-----------
 txtnotnull  |         5
 txtnull     |         0
 int8notnull |         8
 int8null    |         8
(4 rows)


I see in analyze.c
8<-
/* We can only compute average width if we found some non-null values.*/
if (nonnull_cnt > 0)

  [snip]

else if (null_cnt > 0)
{
/* We found only nulls; assume the column is entirely null */
stats->stats_valid = true;
stats->stanullfrac = 1.0;
if (is_varwidth)
stats->stawidth = 0;/* "unknown" */
else
stats->stawidth = stats->attrtype->typlen;
stats->stadistinct = 0.0;   /* "unknown" */
}
8<-

So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.
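To make the inconsistency concrete, here is a standalone model of that branch (a toy struct, not the real VacAttrStats), with a flag showing what a uniformly "unknown" alternative would look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the pg_statistic fields involved; not the real structs. */
typedef struct
{
	bool		stats_valid;
	double		stanullfrac;
	int			stawidth;
	double		stadistinct;
} ToyStats;

/*
 * Mirror of the all-NULL branch in analyze.c.  Passing
 * force_unknown_width = true models the consistent alternative: report
 * width 0 ("unknown") even for fixed-width types.
 */
static void
compute_all_null_stats(ToyStats *stats, bool is_varwidth, int typlen,
					   bool force_unknown_width)
{
	stats->stats_valid = true;
	stats->stanullfrac = 1.0;
	if (is_varwidth || force_unknown_width)
		stats->stawidth = 0;	/* "unknown" */
	else
		stats->stawidth = typlen;	/* e.g. 8 for an all-NULL int8 column */
	stats->stadistinct = 0.0;	/* "unknown" */
}
```

With force_unknown_width = false this reproduces the behavior shown above: an all-NULL int8 column reports stawidth 8 while an all-NULL text column reports 0.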

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: stawidth inconsistency with all NULL columns

2019-05-21 Thread Joe Conway
On 5/21/19 3:55 PM, Tom Lane wrote:
> Joe Conway  writes:
>> else if (null_cnt > 0)
>> {
>> /* We found only nulls; assume the column is entirely null */
>> stats->stats_valid = true;
>> stats->stanullfrac = 1.0;
>> if (is_varwidth)
>> stats->stawidth = 0;/* "unknown" */
>> else
>> stats->stawidth = stats->attrtype->typlen;
>> stats->stadistinct = 0.0;   /* "unknown" */
>> }
>> 8<-
> 
>> So apparently intentional, but seems gratuitously inconsistent. Could
>> this cause any actual inconsistent behaviors? In any case that first
>> comment does not reflect the code.
> 
> Are you suggesting that we should set stawidth to zero even for a
> fixed-width datatype?  That seems pretty silly.  We know exactly what
> the value should be, and would be if we'd chanced to find even one
> non-null entry.

Well you could argue in similar fashion for variable width values -- if
we find even one of those, it will be at least 4 bytes. So why set those
to zero?

Not a big deal, but it struck me as odd when I was looking at the
current state of affairs.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: How to install login_hook in Postgres 10.5

2019-05-14 Thread Joe Conway
On 5/13/19 8:32 PM, Michael Paquier wrote:
> On Mon, May 13, 2019 at 01:06:10PM -0700, legrand legrand wrote:
>> that finished commited
>> "pgsql: Add hooks for session start and session end"
>> https://www.postgresql.org/message-id/flat/575d6fa2-78d0-4456-8600-302fc35b2591%40dunslane.net#0819e315c6e44c49a36c69080cab644d
>> 
>> but was finally rollbacked because it didn't pass installcheck test
>> ...
>> 
>> Maybe you are able to patch your pg installation, 
>> it would be a solution of choice (there is a nice exemple 
>> of extension included)
> 
> You will need to patch Postgres to add this hook, and you could
> basically reuse the patch which has been committed once.  I don't
> think that it would be that much amount of work to get it working
> correctly on the test side to be honest, so we may be able to get
> something into v13 at this stage.  This is mainly a matter of
> resources though, and of folks willing to actually push for it.


I am interested in this, so if Andrew wants to create a buildfarm module
I will either add it to rhinoceros or stand up another buildfarm animal
for it. I am also happy to help push it for v13.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-10 Thread Joe Conway
On 7/10/19 4:24 AM, Antonin Houska wrote:
> Joe Conway  wrote:
> 
>> On 7/8/19 6:04 PM, Stephen Frost wrote:
>> > * Bruce Momjian (br...@momjian.us) wrote:
>> >> Uh, well, renaming the user was a big problem, but that is the only case
>> >> I can think of.  I don't see that as an issue for block or WAL sequence
>> >> numbers.  If we want to use a different nonce, we have to find a way to
>> >> store it or look it up efficiently.  Considering the nonce size, I don't
>> >> see how that is possible.
>> > 
>> > No, this also meant that, as an attacker, I *knew* the salt ahead of
>> > time and therefore could build rainbow tables specifically for that
>> > salt.  I could also use those *same* tables for any system where that
>> > user had an account, even if they used different passwords on different
>> > systems...
>> > 
>> > I appreciate that *some* of this might not be completely relevant for
>> > the way a nonce is used in cryptography, but I'd be very surprised to
>> > have a cryptographer tell me that a deterministic nonce didn't have
>> > similar issues or didn't reduce the value of the nonce significantly.
>> 
>> I have worked side by side on projects with bona fide cryptographers and
>> I can assure you that they recommended random nonces. Granted, that was
>> in the early 2000s, but I don't think "modern cryptography" has changed
>> that any more than "web scale" has made Postgres irrelevant in the
>> intervening years.
> 
> I think that particular threats have to be considered.
> 
>> Related links:
> 
>> https://defuse.ca/cbcmodeiv.htm
>> https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
> 
> The first one looks more in-depth than the other one, so I focused on it:
> 
> * "Statistical Correlations between IV and Plaintext"
> 
> My understanding is that predictability of the IV (in our implementation of
> full-instance encryption [1] we derive the IV from RelFileNode combined with
> block number) can reveal information about the first encryption block (16
> bytes) of the page, i.e. part of the PageHeaderData structure. I don't think
> this leaks any valuable data. And starting the 2nd block, the IV is not
> predictable because it is cipher text of the previous block.
> 
> * "Chosen-Plaintext Attacks"
> 
> The question here is whether we expect the OS admin to have access to the
> database. In [1] we currently don't (cloud, where DBA has no control over the
> storage layer is the main use case), but if it appears to be the requirement,
> I believe CBC-ESSIV mode [2] can fix the problem.
> 
> Anyway, I'm not sure if this kind of attack can reveal more information than
> something about the first block of the page (the page header), since each of
> the following blocks uses ciphertext of the previous block as the IV.
> 
> * "Altering the IV Before Decryption"
> 
> I don't think this attack needs special attention - page checksums should
> reveal it.
> 
> 
> [1] https://commitfest.postgresql.org/23/2104/
> [2] 
> https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29
> 

Please see my other reply (and
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
appendix C as pointed out by Ryan downthread).

At least in my mind, I trust a published specification from the
nation-state level over random blogs or wikipedia. If we can find some
equivalent published standards that contradict NIST we should discuss
it, but for my money I would prefer to stick with the NIST recommended
method to produce the IVs.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-10 Thread Joe Conway
On 7/10/19 8:34 AM, Stephen Frost wrote:
> Greetings,
> 
> * Joe Conway (m...@joeconway.com) wrote:
>> On 7/9/19 7:28 PM, Stephen Frost wrote:
>> > * Joe Conway (m...@joeconway.com) wrote:
>> >> On 7/9/19 5:42 PM, Tomas Vondra wrote:
>> >> > There are two basic ways to construct nonces - CSPRNG and sequences, and
>> >> > then a combination of both, i.e. one part is generated from a sequence
>> >> > and one randomly.
>> >> > 
>> >> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
>> >> > inherently low entropy data - how often do you see databases with OIDs
>> >> > above 1M or so? Probably not very often, and in most cases those are
>> >> > databases where those OIDs are for OIDs and large objects, so irrelevant
>> >> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
>> >> > just 32bits of entrophy seems suspicious.
>> >> > 
>> >> > That does not mean we can't use the OIDs at all, but maybe hashing them
>> >> > into a single 4B value, and then picking the remaining 8B randomly.
>> >> > Also, we have a "natural" sequence in the database - LSNs, maybe that
>> >> > would be a good source of nonces too?
>> >> 
>> >> I think you missed the quoted part (upthread) from the NIST document:
>> >> 
>> >>   "There are two recommended methods for generating unpredictable IVs.
>> >>The first method is to apply the forward cipher  function, under the
>> >>same key that is used for the encryption of the plaintext, to a
>> >>nonce. The nonce must be a data block that is unique to each
>> >>execution of the encryption operation. For example, the nonce may be
>> >>a counter, as described in Appendix B, or a message number. The
>> >>second method is to generate a random data block using a
>> >>FIPS-approved random number generator."
>> >> 
>> >> That first method says a counter as input produces an acceptably
>> >> unpredictable IV as long as it is unique to each encryption operation.
>> >> If each page is going to be an "encryption operation", so as long as our
>> >> input nonce is unique for a given key, we should be ok. If the input
>> >> nonce is tableoid+pagenum and the key is different per database (at
>> >> least, hopefully different per tablespace too), we should be good to go,
>> >> at least from what I can see.
>> > 
>> > What I think Tomas is getting at here is that we don't write a page only
>> > once.
>> > 
>> > A nonce of tableoid+pagenum will only be unique the first time we write
>> > out that page.  Seems unlikely that we're only going to be writing these
>> > pages once though- what we need is a nonce that's unique for *every
>> > write* of the 8k page, isn't it?  As every write of the page is going to
>> > be encrypting something new.
>> 
>> Hmm, good point. I'm not entirely sure it would be required if the two
>> page versions don't exist at the same time, but I guess backups mean
>> that it would, so yeah.
> 
> Uh, or an attacker got a copy of the page and then just waited a few
> minutes for a new version to be written and then grabbed that...
> 
> Definitely not limited to just concerns about the fact that other
> versions would exist in backups too.

Agreed :-/

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-10 Thread Joe Conway
On 7/10/19 4:47 AM, Antonin Houska wrote:
> Tomas Vondra  wrote:
>> I don't think that works, because that'd mean we're encrypting the same
>> page with the same nonce over and over, which means reusing the reuse
>> (even if you hash/encrypt it). Or did I miss something?
> 
> I found out that it's wrong to use the same key (or (key, IV) pair) to encrypt
> different plain texts [1], however this is about *stream cipher*. There should
> be some evidence that *block cipher* has similar weakness before we accept
> another restriction on the IV setup.
> 
> [1] https://en.wikipedia.org/wiki/Stream_cipher_attacks#Reused_key_attack

There is plenty of guidance that specifies CBC requires unique,
unpredictable, but not necessarily secret IV. See for example Appendix C
here:

https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-10 Thread Joe Conway
On 7/10/19 2:45 AM, Masahiko Sawada wrote:
> On Wed, Jul 10, 2019 at 11:06 AM Stephen Frost  wrote:
>>
>> Greetings,
>>
>> * Ryan Lambert (r...@rustprooflabs.com) wrote:
>> > > What I think Tomas is getting at here is that we don't write a page only
>> > > once.
>> >
>> > > A nonce of tableoid+pagenum will only be unique the first time we write
>> > > out that page.  Seems unlikely that we're only going to be writing these
>> > > pages once though- what we need is a nonce that's unique for *every
>> > > write* of the 8k page, isn't it?  As every write of the page is going to
>> > >  be encrypting something new.
>> >
>> > > With sufficient randomness, we can at least be more likely to have a
>> > > unique nonce for each 8K write.  Including the LSN seems like it'd be a
>> > > possible alternative.
>> >
>> > Agreed.  I know little of the inner details about the LSN but what I read
>> > in [1] sounds encouraging in addition to tableoid + pagenum.
>> >
>> > [1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html
>>
>> Yes, but it's still something that we'd have to store somewhere- the
>> actual LSN of the page is going to be in the 8K block.
> 
> Can we use CBC-ESSIV[1] or XTS[2] instead? IIUC with these modes we
> can use table oid and page number for IV or tweak and we don't need to
> change them each time to encrypt pages.
> 
> [1] 
> https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_.28ESSIV.29
> [2] 
> https://en.wikipedia.org/wiki/Disk_encryption_theory#XEX-based_tweaked-codebook_mode_with_ciphertext_stealing_(XTS)


From what I can tell [1] is morally equivalent to the NIST method and
does nothing to change the fact that the input nonce needs to be unique
for each encryption operation. I have not had time to review [2] yet...

While it would be very tempting to convince ourselves that a unique
input nonce is not a requirement, I think we are better off being
conservative unless we find some extremely clear guidance that allows us
to draw that conclusion.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-10 Thread Joe Conway
On 7/10/19 3:53 PM, Alvaro Herrera wrote:
> On 2019-Jul-10, Bruce Momjian wrote:
> 
>> Good, so I think we all now agree we have to put the nonce
>> (pg_class.oid, LSN, page-number) though the cipher using the secret.

(been traveling -- just trying to get caught up on this thread)

> Actually, why do you need the page number in the nonce?  The LSN already
> distinguishes pages -- you can't have two pages with the same LSN, can
> you?  (I do think you can have multiple writes of the same page with
> different LSNs, if you change hint bits and don't write WAL about it,

Do you mean "multiple writes of the same page without..."?

> but maybe we should force CRC enabled in encrypted tables, which I think
> closes this hole?)

If we can use the LSN (perhaps with CRC) without the page number that
would seem to be a good idea.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-11 Thread Joe Conway
On 7/11/19 6:37 PM, Bruce Momjian wrote:
> On Wed, Jul 10, 2019 at 12:26:24PM -0400, Bruce Momjian wrote:
>> On Wed, Jul 10, 2019 at 08:31:17AM -0400, Joe Conway wrote:
>>> Please see my other reply (and
>>> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>>> appendix C as pointed out by Ryan downthread).
>>>
>>> At least in my mind, I trust a published specification from the
>>> nation-state level over random blogs or wikipedia. If we can find some
>>> equivalent published standards that contradict NIST we should discuss
>>> it, but for my money I would prefer to stick with the NIST recommended
>>> method to produce the IVs.
>>
>> So, we have had a flurry of activity on this thread in the past day, so
>> let me summarize:
> 
> Seems we have an updated approach:

I tried to keep up with this thread, and may have failed, but comments
inline...

> First, we need to store the symmetric encryption key in the data
> directory, like we do for SSL certificates and private keys.  (Crash
> recovery needs access to this key, so we can't easily store it in a
> database table.)  We will pattern it after the GUC
> ssl_passphrase_command.   We will need to decide on a format for the
> symmetric encryption key in the file so we can check that the supplied
> passphrase properly unlocks the key.
> 
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)

I still object strongly to using a single key for the entire database. I
think we can use a single key for WAL, but we need some way to split the
heap so that multiple keys are used. If not by tablespace, then some
other method.

Regardless of the method to split the heap into different keys, I think
there should be an option for some tables to not be encrypted. If we
decide it must be all or nothing for the first implementation I guess I
could live with it but would be very disappointed.

The keys themselves should be in a file which is encrypted by a master
key. Obtaining the master key should be patterned after the GUC
ssl_passphrase_command.

> We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
> 8k pages will use the LSN as a nonce, which will be encrypted to
> generate the initialization vector (IV).  We will not encrypt the first
> 16 bytes of each pages so the LSN can be used in this way.  The WAL will
> use the WAL file segment number as the nonce and the IV will be created
> in the same way.

I vote for AES 256 rather than 128.

Did we determine that we no longer need table oid because LSNs are
sufficiently unique?

> wal_log_hints will be enabled automatically in encryption mode, like we
> do for checksum mode, so we never encrypt different 8k pages with the
> same IV.

check

> There will need to be a pg_control field to indicate that encryption is
> in use.

I didn't see that discussed but it makes sense.

> Right now we don't support the online changing of a cluster's checksum
> mode, so I suggest we create a utility like pg_checksums --enable to
> allow offline key rotation.  Once we get online checksum mode changing
> ability, we can look into use that for encryption key rotation.

Master key rotation should be trivial if we do it the way I discussed
above. Rotating the individual heap and WAL keys would certainly be a
bigger problem.

Thinking out loud (and I believe somewhere in this massive thread
someone else already said this), if we had a way to flag "key version"
at the page level it seems like we could potentially rekey page-by-page
while online, locking only one page at a time. We really only need to
support 2 key versions and could ping-pong between them as they change.
Or maybe this is a crazy idea.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/11/19 9:05 PM, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>> > Our first implementation will encrypt the entire cluster.  We can later
>> > consider encryption per table or tablespace.  It is unclear if
>> > encrypting different parts of the system with different keys is useful
>> > or feasible.  (This is separate from key rotation.)
>> 
>> I still object strongly to using a single key for the entire database. I
>> think we can use a single key for WAL, but we need some way to split the
>> heap so that multiple keys are used. If not by tablespace, then some
>> other method.
> 
> What do you base this on?

I have stated this more than once, and you and Stephen discussed it
earlier as well, but will find all the links back into the thread and
references and address in a separate email.

>> Regardless of the method to split the heap into different keys, I think
>> there should be an option for some tables to not be encrypted. If we
>> decide it must be all or nothing for the first implementation I guess I
>> could live with it but would be very disappointed.
> 
> What does it mean you "could live this it"?  Why do you consider having
> some tables unencrypted important?


I think it is pretty obvious isn't it? You have talked about the
performance impact. Why would I want to encrypt, for example a lookup
table, if there is nothing in that table warranting encryption?

I think in many if not most applications the sensitive data is limited
to much less than all of the tables, and I'd rather not take the hit for
those tables.


>> The keys themselves should be in a file which is encrypted by a master
>> key. Obtaining the master key should be patterned after the GUC
>> ssl_passphrase_command.
>> 
>> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
>> > 8k pages will use the LSN as a nonce, which will be encrypted to
>> > generate the initialization vector (IV).  We will not encrypt the first
>> > 16 bytes of each pages so the LSN can be used in this way.  The WAL will
>> > use the WAL file segment number as the nonce and the IV will be created
>> > in the same way.
>> 
>> I vote for AES 256 rather than 128.
> 
> Why?  This page seems to think 128 is sufficient:


Addressed in the other email


>> Thinking out loud (and I believe somewhere in this massive thread
>> someone else already said this), if we had a way to flag "key version"
>> at the page level it seems like we could potentially rekey page-by-page
>> while online, locking only one page at a time. We really only need to
>> support 2 key versions and could ping-pong between them as they change.
>> Or maybe this is a crazy idea.
> 
> Yes, we did talk about this.  It is certainly possible, but we would
> still need a tool to guarantee all pages are using the new version, so I
> am not sure what per-page buys us except making the later check faster. 
> I don't see this as a version-1 feature, frankly.

If we allow for say, 2 versions of the key to exist at any given time,
and if we could store that key version information on each page, we
could change the key from old to new without locking the entire table at
once, just locking one page at a time. Or at least that was my thinking.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/12/19 2:45 PM, Bruce Momjian wrote:
> On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
>> >> I vote for AES 256 rather than 128.
>> >
>> > Why?  This page seems to think 128 is sufficient:
>> >
>> >         https://crypto.stackexchange.com/questions/20/
>> what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>> >
>> >         For practical purposes, 128-bit keys are sufficient to ensure
>> security.
>> >         The larger key sizes exist mostly to satisfy some US military
>> >         regulations which call for the existence of several distinct
>> "security
>> >         levels", regardless of whether breaking the lowest level is already
>> far
>> >         beyond existing technology.
>> 
>> After researching AES key sizes a bit more my vote is (surprisingly?) for
>> AES-128.  My reasoning is about security, I did not consider performance
>> impacts in my decision.
> 
> Thank you for this exhaustive research.

First of all, that is a mischaracterization of the issue. That paper
also states:

"we describe several key derivation attacks of practical complexity on
AES-256 when its number of rounds is reduced to approximately that of
AES-128. The best previously published attacks on such variants were far
from practical, requiring 4 related keys and 2^120 time to break a
9-round version of AES-256 [9], and 64 related keys and 2^172 time to
break a 10-round version of AES-256 ([9], see also [2]). In this paper
we describe an attack on 9-round AES-256 which can find its complete
256-bit key in 2^39 time by using only the simplest type of related
keys (in which the chosen plaintexts are encrypted under two keys whose
XOR difference can be chosen in many different ways). Our best attack
on 10-round AES-256 requires only two keys and 2^45 time, but it uses a
stronger type of related-subkey attack. These attacks can be extended
into a quasi-practical 2^70 attack on 11-round AES, and into a trivial
2^26 attack on 8-round AES."

Notice 2 key things about this:
1. The attacks described are against a reduced number of rounds. AES256
   is 14 rounds, not 9 or 10.
2, They are "related key" attacks, which can be avoided by not using
   related keys, and we certainly should not be doing that.

Also, please read the links that Sehrope sent earlier if you have not
done so. In particular this one:

https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

which says:

"Post-quantum cryptography is an area of cryptography in which systems
are studied under the security assumption that the attacker has a
quantum computer. This attack model is interesting because Shor has
shown a quantum algorithm that breaks RSA, ECC, and finite field
discrete logarithms in polynomial time. This means that in this model
all commonly used public-key systems are no longer secure. Symmetric
cryptography is also affected but significantly less. For systems that
do not rely on mathematical structures the main effect is that an
algorithm due to Grover halves the security level, i.e., breaking
AES-128 takes 2^64 quantum operations while current attacks take 2^128
steps. While this is a big change, it can be managed quite easily by
doubling the key sizes, e.g., by deploying AES-256. The operations
needed in Grover's algorithm are inherently sequential, which has led
some to doubt that even 2^64 quantum operations are feasible, but since
the remedy of changing to larger key sizes is very inexpensive it is
generally recommended to do so."

In addition, that first paper was written in 2010, yet in 2016 NSA
published one of the other documents referenced by Sehrope:

https://apps.nsa.gov/iaarchive/customcf/openAttachment.cfm?FilePath=/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/assets/public/upload/CNSA-Suite-and-Quantum-Computing-FAQ.pdf=aF6woL7fQp3dJiyWTFKrYn3ZZShmLnzECSjJhf

Which states:

"If you have already implemented Suite B algorithms using the larger
(for TOP SECRET) key sizes, you should continue to use those algorithms
and key sizes through this upcoming transition period. In many products
changing to a larger key size can be done via a configuration
change. Implementations using only the algorithms previously approved for
SECRET and below in Suite B should not be used in NSS. In more precise
terms this means that NSS (National Security Systems) should no longer use
 •ECDH and ECDSA with NIST P-256
 •SHA-256
 •AES-128
 •RSA with 2048-bit keys
 •Diffie-Hellman with 2048-bit keys"


I stand by my position. At a minimum, we need a choice of AES128 and AES256.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/13/19 9:38 AM, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>> > Our first implementation will encrypt the entire cluster.  We can later
>>> > consider encryption per table or tablespace.  It is unclear if
>>> > encrypting different parts of the system with different keys is useful
>>> > or feasible.  (This is separate from key rotation.)
>>> 
>>> I still object strongly to using a single key for the entire database. I
>>> think we can use a single key for WAL, but we need some way to split the
>>> heap so that multiple keys are used. If not by tablespace, then some
>>> other method.
>> 
>> What do you base this on?

Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
---
[2] also provides additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

AES-256-XTS for contents and AES-256-CTS-CBC for filenames
AES-128-CBC for contents and AES-128-CTS-CBC for filenames
Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of table being
the moral equivalent to a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2^(n/2) blocks, i.e. n*2^(n/2)
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."

But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.
---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per database.

But I did find that files in an encrypted file system are encrypted with
derived keys.

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-14 Thread Joe Conway
On 7/13/19 2:41 PM, Joe Conway wrote:
> [2]
> https://www.postgresql.org/message-id/20190708194733.cztnwhqge4acepzw%40development

BTW I managed to mess up this link. This is what I intended to link
there (from Tomas):

[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

I am sure I have confused the heck out of everyone reading what I wrote
by that error :-/

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 6:04 PM, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
>> Uh, well, renaming the user was a big problem, but that is the only case
>> I can think of.  I don't see that as an issue for block or WAL sequence
>> numbers.  If we want to use a different nonce, we have to find a way to
>> store it or look it up efficiently.  Considering the nonce size, I don't
>> see how that is possible.
> 
> No, this also meant that, as an attacker, I *knew* the salt ahead of
> time and therefore could build rainbow tables specifically for that
> salt.  I could also use those *same* tables for any system where that
> user had an account, even if they used different passwords on different
> systems...
> 
> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

I have worked side by side on projects with bona fide cryptographers and
I can assure you that they recommended random nonces. Granted, that was
in the early 2000s, but I don't think "modern cryptography" has changed
that any more than "web scale" has made Postgres irrelevant in the
intervening years.

Related links:

https://defuse.ca/cbcmodeiv.htm
https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 8:39 AM, Ryan Lambert wrote:
> Hi Thomas,
> 
>> CBC mode does require
>> random nonces, other modes may be fine with even sequences as long as
>> the values are not reused.   
> 
> I disagree that CBC mode requires random nonces, at least based on what
> NIST has published.  They only require that the IV (not the nonce) must
> be unpredictable per [1]:
> 
> " For the CBC and CFB modes, the IVs must be unpredictable."
> 
> The unpredictable IV can be generated from a non-random nonce including
> a counter:
> 
> "There are two recommended methods for generating unpredictable IVs. The
> first method is to apply the forward cipher function, under the same key
> that is used for the encryption of the plaintext, to a nonce. The nonce
> must be a data block that is unique to each execution of the encryption
> operation. For example, the nonce may be a counter, as described in
> Appendix B, or a message number."
> 
> [1] 
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf


The terms nonce and IV are often used more-or-less interchangeably, and
it is important to be clear when we are talking about an IV specifically
- an IV is a specific type of nonce. Nonce means "number used once",
i.e. unique, whereas an IV (for CBC use anyway) should be unique and
random but not necessarily kept secret. The NIST requirements that
Stephen referenced elsewhere on this thread are, as I understand it,
intended to ensure the random-but-unique property with high probability.
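The counter-based construction quoted from SP 800-38A above (forward cipher function applied to a unique counter block) can be sketched roughly like this, assuming the Python `cryptography` package:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cbc_iv_from_counter(key: bytes, counter: int) -> bytes:
    # SP 800-38A Appendix C method: apply the forward cipher function
    # (a single AES-ECB encryption) under the data-encryption key to a
    # unique counter block, yielding an unpredictable CBC IV.
    nonce_block = counter.to_bytes(16, "big")
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(nonce_block) + enc.finalize()

key = os.urandom(32)
iv1 = cbc_iv_from_counter(key, 1)
iv2 = cbc_iv_from_counter(key, 2)
assert len(iv1) == 16 and iv1 != iv2  # unpredictable, never repeated
```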

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 4:34 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
>>On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
>>> Yes, 'postgres' can be used to create a nice md5 rainbow table that
>>> works on many servers --- good point.  Are rainbow tables possible with
>>> something like AES?
>>>
>>> > I appreciate that *some* of this might not be completely relevant for
>>> > the way a nonce is used in cryptography, but I'd be very surprised to
>>> > have a cryptographer tell me that a deterministic nonce didn't have
>>> > similar issues or didn't reduce the value of the nonce significantly.
>>>
>>> This post:
>>>
>>> 
>>> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>>
>>> says:
>>>
>>> GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>>> of Counter Mode, it is essential  that the Nonce is not repeated with
>>> the same key.  Hence CTR mode  Nonces often include either a counter or
>>> a timer element: something that  is guaranteed not to repeat over the
>>> lifetime of the key.
>>>
>>> CTR is what we use for WAL.  For 8k pages we would use CBC, which
>>> needs a random nonce.  I need to dig deeper into the ECB mode attack.
>>
>>Looking here:
>>
>>  
>> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>
>>I think the issues is that we can't use a _counter_ for the nonce since
>>each page-0 of each table would use the same nonce, and each page-1,
>>etc.  I assume we would use the table oid and page number as the nonce.
>>We can't use the database oid since we copy the files from one database
>>to another via file system copy and not through the shared buffer cache
>>where they would be re encrypted.  Using relfilenode seems dangerous.
>>For WAL I think it would be the WAL segment number.  It would be nice
>>to mix that with the "Database system identifier:", but are these the
>>same on primary and replicas?
>>
> 
> Can't we just store the nonce somewhere? What if for encrypted pages we
> only use/encrypt (8kB - X bytes), where X bytes is just enough to store
> the nonce and maybe some other encryption metadata (key ID?).
> 
> This would be similar to the "special" area on a page, except that that
> relies on page header which is encrypted (and thus not accessible before
> decrypting the page).
> 
> So encryption would:
> 
> 1) encrypt the (8kB - X bytes) with nonce suitable for the used
>encryption mode (sequence, random, ...)
> 
> 2) store the nonce / key ID etc. to the reserved space
> 
> and decryption would
> 
> 1) look at the encryption metadata at the end (nonce, key ID, ...)
> 
> 2) decrypt the page using that info

That is pretty much what I had been envisioning.

> Or maybe we could define a new relation fork for encrypted relations,
> storing all this metadata (not sure if we need that just for the main
> fork or for all forks including vm, fsm ...)?

I like the idea of a fork if it is workable.
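A rough sketch of the reserved-trailer layout Tomas describes (sizes and layout are hypothetical, not a settled design; assuming the Python `cryptography` package and a random per-page IV stored in the reserved space):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

PAGE_SIZE = 8192
RESERVED = 16  # trailing bytes reserved for the per-page IV (hypothetical)

def encrypt_page(key: bytes, page: bytes) -> bytes:
    # Encrypt only (8kB - RESERVED) bytes; the trailer stays plaintext so
    # it can be read before the page is decrypted.
    assert len(page) == PAGE_SIZE - RESERVED
    iv = os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(page) + enc.finalize() + iv  # ciphertext || stored IV

def decrypt_page(key: bytes, stored: bytes) -> bytes:
    ct, iv = stored[:-RESERVED], stored[-RESERVED:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return dec.update(ct) + dec.finalize()

key = os.urandom(32)
page = os.urandom(PAGE_SIZE - RESERVED)  # 8176 bytes, a multiple of the
                                         # 16-byte block, so no padding
assert decrypt_page(key, encrypt_page(key, page)) == page
```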

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 4:23 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:24:40PM -0400, Joe Conway wrote:
>>On 7/8/19 6:04 PM, Stephen Frost wrote:
>>> * Bruce Momjian (br...@momjian.us) wrote:
>>>> Uh, well, renaming the user was a big problem, but that is the only case
>>>> I can think of.  I don't see that as an issue for block or WAL sequence
>>>> numbers.  If we want to use a different nonce, we have to find a way to
>>>> store it or look it up efficiently.  Considering the nonce size, I don't
>>>> see how that is possible.
>>>
>>> No, this also meant that, as an attacker, I *knew* the salt ahead of
>>> time and therefore could build rainbow tables specifically for that
>>> salt.  I could also use those *same* tables for any system where that
>>> user had an account, even if they used different passwords on different
>>> systems...
>>>
>>> I appreciate that *some* of this might not be completely relevant for
>>> the way a nonce is used in cryptography, but I'd be very surprised to
>>> have a cryptographer tell me that a deterministic nonce didn't have
>>> similar issues or didn't reduce the value of the nonce significantly.
>>
>>I have worked side by side on projects with bona fide cryptographers and
>>I can assure you that they recommended random nonces. Granted, that was
>>in the early 2000s, but I don't think "modern cryptography" has changed
>>that any more than "web scale" has made Postgres irrelevant in the
>>intervening years.
>>
>>Related links:
>>
>>https://defuse.ca/cbcmodeiv.htm
>>https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
>>
> 
> AFAIK it very much depends on the encryption mode. CBC mode does require
> random nonces, other modes may be fine with even sequences as long as
> the values are not reused. In which case we might even use the LSN, for
> example. And I wonder if sha2(LSN) could be considered "random", but
> maybe that's entirely silly idea ...


Yeah, we worked mostly with CBC, so that could be the case in terms of
what is required. But I don't think a random nonce is ever a bad idea.

But as Stephen pointed out elsewhere on this thread, I think we should
be getting our guidance from places like NIST, which has actual experts
in this stuff.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> On 2019-07-08 18:09, Joe Conway wrote:
>> In my mind, and in practice to a
>> large extent, a postgres tablespace == a unique mount point.
> 
> But a critical difference is that in file systems, a separate mount
> point has its own journal.

While it would be ideal to have separate WAL, and even separate shared
buffer pools, per tablespace, I think that is too much complexity for
the first implementation and we could have a single separate key for all
WAL for now. The main thing I don't think we want is e.g. a 50TB
database with everything encrypted with a single key -- for the reasons
previously stated.

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 11:56 AM, Peter Eisentraut wrote:
> On 2019-07-08 17:47, Stephen Frost wrote:
>> Of course, we can discuss if what websites do with over-the-wire
>> encryption is sensible to compare to what we want to do in PG for
>> data-at-rest, but then we shouldn't be talking about what websites do,
>> it'd make more sense to look at other data-at-rest encryption systems
>> and consider what they're doing.
> 
> So, how do encrypted file systems do it?  Are there any encrypted file
> systems in general use that allow encrypting only some files or
> encrypting different parts of the file system with different keys, or
> any of those other granular approaches being discussed?

Well it is fairly common, for good reason IMHO, to encrypt some mount
points and not others on a system. In my mind, and in practice to a
large extent, a postgres tablespace == a unique mount point.

There is a description here:

  https://wiki.archlinux.org/index.php/Disk_encryption

A pertinent quote:

After it has been derived, the master key is securely stored in memory
(e.g. in a kernel keyring), for as long as the encrypted block device or
folder is mounted.

It is usually not used for de/encrypting the disk data directly, though.
For example, in the case of stacked filesystem encryption, each file can
be automatically assigned its own encryption key. Whenever the file is
to be read/modified, this file key first needs to be decrypted using the
main key, before it can itself be used to de/encrypt the file contents:

                              ( master key )
                                    |
   file on disk:                    v
  [ encrypted file key ] ---(decryption)---> ( file key )
                                                   |
                                                   v
  [ encrypted file contents ] <--(de/encryption)--> ( readable file contents )

In a similar manner, a separate key (e.g. one per folder) may be used
for the encryption of file names in the case of stacked filesystem
encryption.
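The per-file-key scheme described in the quote can be sketched with a KDF, deriving each file key from the master key plus a per-file nonce (illustrative only; assuming the Python `cryptography` package, with HKDF standing in for whatever derivation a real stacked filesystem uses):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def file_key(master_key: bytes, file_nonce: bytes) -> bytes:
    # Derive a unique per-file key from the master key and a nonce stored
    # alongside the file; the master key itself never touches file data.
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=file_nonce).derive(master_key)

master_key = os.urandom(32)
k1 = file_key(master_key, os.urandom(16))
k2 = file_key(master_key, os.urandom(16))
assert len(k1) == 32 and k1 != k2  # distinct nonce => distinct file key
```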


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 10:19 AM, Bruce Momjian wrote:
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> datbase, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.

That is not true. Having multiple keys also allows you to reduce the
amount of data encrypted with a single key, which is desirable because:

1. It makes cryptanalysis more difficult
2. It puts less data at risk if someone gets "lucky" doing brute force
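As illustrative arithmetic for point 2 (my numbers, applying the 64GB-per-key guidance quoted earlier in the thread to the 50TB example mentioned elsewhere):

```python
# One key for a whole 50 TiB cluster versus limiting each key to the
# 64 GiB per-key guidance quoted earlier in the thread.
TiB = 2**40
total_bytes = 50 * TiB
per_key_limit = 64 * 2**30                      # 64 GiB per key
keys_needed = -(-total_bytes // per_key_limit)  # ceiling division
assert keys_needed == 800                       # 800 keys instead of 1
```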


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




