Re: warning to publication created and wal_level is not set to logical

2019-07-08 Thread Lucas Viecelli
Here is the correct file; I attached the wrong patch in the previous email.


> Attached is the rebased
>


thanks a lot

*Lucas Viecelli*
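
For anyone hitting the new warning, the hint boils down to something like the
following (wal_level is a restart-only setting, so a server restart is needed):

    ALTER SYSTEM SET wal_level = logical;
    -- then restart the server for the setting to take effect
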
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 1ac1a71bd9..902180cedc 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -232,6 +232,14 @@ CreatePublication(CreatePublicationStmt *stmt)
 
 	InvokeObjectPostCreateHook(PublicationRelationId, puboid, 0);
 
+	if (wal_level != WAL_LEVEL_LOGICAL)
+	{
+		ereport(WARNING,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			errmsg("insufficient wal_level to publish logical changes"),
+			errhint("Set wal_level to logical before creating subscriptions")));
+	}
+
 	return myself;
 }
 
@@ -763,4 +771,4 @@ AlterPublicationOwner_oid(Oid subid, Oid newOwnerId)
 	heap_freetuple(tup);
 
 	table_close(rel, RowExclusiveLock);
-}
+}
\ No newline at end of file
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index cdbde29502..18daf0d7b6 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -43,6 +43,8 @@ CREATE TRANSFORM FOR int LANGUAGE SQL (
 	FROM SQL WITH FUNCTION prsd_lextype(internal),
 	TO SQL WITH FUNCTION int4recv(internal));
 CREATE PUBLICATION addr_pub FOR TABLE addr_nsp.gentable;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 CREATE SUBSCRIPTION regress_addr_sub CONNECTION '' PUBLICATION bar WITH (connect = false, slot_name = NONE);
 WARNING:  tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
 CREATE STATISTICS addr_nsp.gentable_stat ON a, b FROM addr_nsp.gentable;
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 0e5e8f2b92..70d7119f21 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -5,7 +5,15 @@ CREATE ROLE regress_publication_user LOGIN SUPERUSER;
 CREATE ROLE regress_publication_user2;
 CREATE ROLE regress_publication_user_dummy LOGIN NOSUPERUSER;
 SET SESSION AUTHORIZATION 'regress_publication_user';
+SHOW wal_level;
+ wal_level 
+-----------
+ replica
+(1 row)
+
 CREATE PUBLICATION testpub_default;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 COMMENT ON PUBLICATION testpub_default IS 'test publication';
 SELECT obj_description(p.oid, 'pg_publication') FROM pg_publication p;
  obj_description  
@@ -14,6 +22,8 @@ SELECT obj_description(p.oid, 'pg_publication') FROM pg_publication p;
 (1 row)
 
 CREATE PUBLICATION testpib_ins_trunct WITH (publish = insert);
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 ALTER PUBLICATION testpub_default SET (publish = update);
 -- error cases
 CREATE PUBLICATION testpub_xxx WITH (foo);
@@ -44,6 +54,8 @@ CREATE TABLE pub_test.testpub_nopk (foo int, bar int);
 CREATE VIEW testpub_view AS SELECT 1;
 CREATE TABLE testpub_parted (a int) PARTITION BY LIST (a);
 CREATE PUBLICATION testpub_foralltables FOR ALL TABLES WITH (publish = 'insert');
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 ALTER PUBLICATION testpub_foralltables SET (publish = 'insert, update');
 CREATE TABLE testpub_tbl2 (id serial primary key, data text);
 -- fail - can't add to for all tables publication
@@ -87,7 +99,11 @@ DROP PUBLICATION testpub_foralltables;
 CREATE TABLE testpub_tbl3 (a int);
 CREATE TABLE testpub_tbl3a (b text) INHERITS (testpub_tbl3);
 CREATE PUBLICATION testpub3 FOR TABLE testpub_tbl3;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 CREATE PUBLICATION testpub4 FOR TABLE ONLY testpub_tbl3;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 \dRp+ testpub3
   Publication testpub3
   Owner   | All tables | Inserts | Updates | Deletes | Truncates 
@@ -112,6 +128,8 @@ CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_view;
 ERROR:  "testpub_view" is not a table
 DETAIL:  Only tables can be added to publications.
 CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_tbl1, pub_test.testpub_nopk;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 -- fail - already added
 ALTER PUBLICATION testpub_fortbl ADD TABLE testpub_tbl1;
 ERROR:  relation "testpub_tbl1" is already member of publication "testpub_fortbl"
@@ -197,6 +215,8 @@ SET ROLE regress_publication_user;
 GRANT CREATE ON DATABASE regression TO regress_publication_user2;
 SET ROLE 

Re: range_agg

2019-07-08 Thread Pavel Stehule
Hi

On Mon, Jul 8, 2019 at 18:47 Paul A Jungwirth <
p...@illuminatedcomputing.com> wrote:

> On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis  wrote:
> >
> > On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > > user-defined range types. So how about I start on it and see how it
> > > goes? I expect I can follow the existing code for range types pretty
> > > closely, so maybe it won't be too hard.
> >
> > That would be great to at least take a look. If it starts to look like
> > a bad idea, then we can re-evaluate and see if it's better to just
> > return arrays.
>
> I made some progress over the weekend. I don't have a patch yet but I
> thought I'd ask for opinions on the approach I'm taking:
>
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore
>

I am not against a multirange type, but I am missing an explanation of why you
introduce a new kind of type rather than just using an array of ranges.

Introducing a new kind of type is not the same as introducing a new type.

Regards

Pavel


> - I don't need separate commands to add/drop multirange types. You get
> one when you define a range type, and if you drop a range type it gets
> dropped automatically.
> - I'm adding a new typtype for multiranges. ('m' in pg_type).
> - I'm just adding a mltrngtypeid column to pg_range. I don't think I
> need a new pg_multirange table.
> - You can have a multirange[].
> - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
> - I'll add an anymultirange pseudotype. When it's the return type of a
> function that has an "anyrange" parameter, it will use the same base
> element type. (So basically anymultirange : anyrange :: anyarray ::
> anyelement.)
> - You can cast from a multirange to an array. The individual ranges
> are always sorted in the result array.
> - You can cast from an array to a multirange but it will error if
> there are overlaps (or not?). The array's ranges don't have to be
> sorted but they will be after a "round trip".
> - Interesting functions:
>   - multirange_length
>   - range_agg (range_union_agg if you like)
>   - range_intersection_agg
> - You can subscript a multirange like you do an array (? This could be
> a function instead.)
> - operators:
>   - union (++) and intersection (*):
> - We already have + for range union but it raises an error if
> there is a gap, so ++ is the same but with no errors.
> - r ++ r = mr (commutative, associative)
> - mr ++ r = mr
> - r ++ mr = mr
> - r * r = r (unchanged)
> - mr * r = r
> - r * mr = r
> - mr - r = mr
> - r - mr = mr
> - mr - mr = mr
>   - comparison
> - mr = mr
> - mr @> x
> - mr @> r
> - mr @> mr
> - x <@ mr
> - r <@ mr
> - mr <@ mr
> - mr << mr (strictly left of)
> - mr >> mr (strictly right of)
> - mr &< mr (does not extend to the right of)
> - mr &> mr (does not extend to the left of)
>   - inverse operator?:
> - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
> - not sure we want this or what the symbol should be. I don't like
> -mr as an inverse because then mr - mr != mr ++ -mr.
>
> Anything in there you think should be different?
>
> Thanks,
> Paul
>
>
>
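
A sketch of what the proposal above might look like in SQL -- entirely
hypothetical, since none of this exists yet and the type and function names
here are assumptions:

    -- a multirange literal, using the proposed in/out format
    SELECT '{"[1,3)", "[5,6)"}'::int4multirange;

    -- aggregating ranges into a multirange; overlapping inputs get merged
    SELECT range_agg(r)
    FROM (VALUES ('[1,3)'::int4range), ('[2,5)'), ('[8,9)')) AS t(r);
    -- might yield {"[1,5)", "[8,9)"}

    -- the proposed ++ union operator, which unlike + does not error on gaps
    SELECT '[1,3)'::int4range ++ '[5,6)'::int4range;   -- {"[1,3)", "[5,6)"}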


Re: tableam vs. TOAST

2019-07-08 Thread Prabhat Sahu
On Mon, Jul 8, 2019 at 9:06 PM Robert Haas  wrote:

> On Tue, Jun 25, 2019 at 2:19 AM Prabhat Sahu
>  wrote:
> > I have tested the TOAST patches(v3) with different storage options
> like(MAIN, EXTERNAL, EXTENDED, etc.), and
> > combinations of compression and out-of-line storage options.
> > I have used a few dummy tables with various tuple count say 10k, 20k,
> 40k, etc. with different column lengths.
> > Used manual CHECKPOINT option with (checkpoint_timeout = 1d,
> max_wal_size = 10GB) before the test to avoid performance fluctuations,
> > and calculated the results as a median value of a few consecutive test
> executions.
>
> Thanks for testing.
>
> > All the observations look good to me,
> > except that in "Test1", for SCC UPDATE with tuple count (10K/20K) and for SCC
> > INSERT with tuple count (40K), there was a slight increase in the time taken
> > in the "with patch" case. For a better observation, I also ran
> > the same "Test 1" for a higher tuple count (i.e. 80K), and it also looks fine.
>
> Did you run each test just once?  How stable are the results?
>
No, I have executed the test multiple times (7 times each) and calculated the
result as the median among those,
and the result looks stable (with the v3 patches).

-- 

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company


Re: [PATCH v4] Add \warn to psql

2019-07-08 Thread David G. Johnston
On Mon, Jul 8, 2019 at 8:35 PM Bruce Momjian  wrote:

> On Mon, Jul  8, 2019 at 11:29:00PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Fri, Jul  5, 2019 at 11:29:03PM +0200, David Fetter wrote:
> > >>> I fixed that, but I'm wondering if we should back-patch that fix
> > >>> or leave the back branches alone.
> >
> > >> +0.5 for back-patching.
> >
> > > Uh, if this was done in a major release I am thinking we have to
> mention
> > > this as an incompatibility, which means we should probably not
> backpatch
> > > it.
> >
> > How is "clearly doesn't match the documentation" not a bug?
>
> Uh, it is a bug, but people might be expecting the existing behavior
> without consulting the documentation, and we don't expect people to be
> testing minor releases.
>
> Anyway, it seems to have been applied only to head so far.
>

I would leave it at that: Won't Fix for released versions (neither code nor
documentation).  The documentation describes the intended usage, so people
will most likely do the right thing anyway (though something like "\echo
:content_to_echo -n" wouldn't surprise me), and those who learned through
trial and error would only experience a behavior change on a major release,
as they would expect.  This doesn't seem important enough to warrant breaking
the general rule.  Though I'd give a +1 to fixing it in v12; at least for me,
beta is generally fair game.

David J.


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Ryan Lambert
Hey everyone,

Here is my input regarding nonces and randomness.

> As I understand it, the NIST recommendation is a 96-bit *random* nonce,

I could not find that exact requirement in the NIST documents, though given
the volume of that library it would be easy to miss something.  The
recommendation I repeatedly saw for the nonce was uniqueness.  There is also an
important distinction: the nonce is not the Initialization Vector (IV),
but it can be used as part of the IV; more on that later.

The most succinct definition for nonce I found was in SP-800-38A [1] page
4:  "A value that is only used once."
SP-800-90A [2] (page 6) expands on the definition: "A time-varying value
that has at most a negligible chance of repeating, e.g., a random value
that is generated anew for each use, a timestamp, a sequence number, or
some combination of these."

The second definition references randomness but does not require it.  [1]
(pg 19) reinforces the importance of uniqueness:  "A procedure should be
established to ensure the uniqueness of the message nonces"


> That's certainly interesting, but such a brute-force with every key
> would allow it, where, if you use a random nonce, then such an attack
> would have to start working only after having access to the data, and
> not be something that could be pre-computed
> to talk about IV's not being secret

An unpredictable IV can be generated from a non-random nonce, such as a
counter, per [1] (pg 20):

"The first method is to apply the forward cipher function, under the same
key that is used for the encryption of the
plaintext, to a nonce. The nonce must be a data block that is unique to
each execution of the
encryption operation. For example, the nonce may be a counter, as described
in Appendix B, or
a message number. The second method is to generate a random data block
using a FIPS approved
random number generator."

A unique nonce gets passed through the cipher with the key; the uniqueness
of the nonce is the strength of this method, and the key plus the cipher
provide the unpredictability of the IV.  The second method listed above does
require a random number generator, and if it is chosen the generator must
conform to [2].
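
To make the first method concrete, here is a tiny illustration using
pgcrypto (the key and nonce values are made up; this only demonstrates
"IV = forward cipher applied to the nonce under the encryption key" and is
not meant to reflect any proposed implementation):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;
    -- a 128-bit counter-style nonce run through AES under the encryption key;
    -- the 16-byte result can serve as an unpredictable IV for CBC.
    SELECT encrypt('\x00000000000000000000000000000001'::bytea,  -- nonce
                   '\x000102030405060708090a0b0c0d0e0f'::bytea,  -- AES-128 key
                   'aes-ecb/pad:none') AS iv;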

> I'm not a fan of the idea of using something which is predictable as a
> nonce.  Using the username as the salt for our md5 password mechanism
> was, all around, a bad idea.  This seems like it's repeating that
> mistake.

Yeah, that MD5 stuff wasn't the greatest.  With MD5 and the username as a
salt, the salt is known and you only need to work out the password.  In
reality, you only need to find a collision with that password, and the high
collision rate of MD5 (2^64) [3] made things really bad.  Collisions are not
a significant problem today with AES, to the best of my knowledge.

Further, knowing the nonce gets you nowhere; it isn't the salt until it is
run through the forward cipher with the encryption key.  Even with the
nonce, the attacker doesn't know the salt unless they steal the key, and
that's a bigger problem.

The strictest definition of nonce I found was in [2] (pg 19) defining
nonces to use in the process of random generation:

"The nonce shall be either:
a. A value with at least (security_strength/2) bits of entropy, or
b. A value that is expected to repeat no more often than a
(security_strength/2)-bit random
string would be expected to repeat."

Even there it is randomness (a) or uniqueness (b).

[1]
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
[2]
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90Ar1.pdf
[3]
https://stackoverflow.com/questions/8852668/what-is-the-clash-rate-for-md5

Thanks,

Ryan Lambert
RustProof Labs


Re: warning to publication created and wal_level is not set to logical

2019-07-08 Thread Lucas Viecelli
Hi Thomas.

Attached is the rebased


> The July Commitfest has started.  This patch is in "Needs review"
> status, but it doesn't apply.  If I read the above discussion
> correctly, it seems there is agreement that a warning here is a good
> idea to commit this patch.  Could you please post a rebased patch?
>
>
I followed your suggestion, changed the message, and added the HINT. I hope
everything is agreed now.


> I wonder if it would be more typical project style to put the clue on
> what to do into an "errhint" message, something like this:
>
> WARNING: insufficient wal_level to publish logical changes
> HINT:  Set wal_level to logical before creating subscriptions.
>

-- 

*Lucas Viecelli*



diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 4d48be0b92..5d18146c25 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -232,6 +232,14 @@ CreatePublication(CreatePublicationStmt *stmt)
 
 	InvokeObjectPostCreateHook(PublicationRelationId, puboid, 0);
 
+	if (wal_level != WAL_LEVEL_LOGICAL)
+	{
+		ereport(WARNING,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			errmsg("insufficient wal_level to publish logical changes"),
+			errhint("Set wal_level to logical before creating subscriptions")));
+	}
+
 	return myself;
 }
 
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index 02070fd8af..e4d1a63e3f 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -43,6 +43,8 @@ CREATE TRANSFORM FOR int LANGUAGE SQL (
 	FROM SQL WITH FUNCTION prsd_lextype(internal),
 	TO SQL WITH FUNCTION int4recv(internal));
 CREATE PUBLICATION addr_pub FOR TABLE addr_nsp.gentable;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 CREATE SUBSCRIPTION addr_sub CONNECTION '' PUBLICATION bar WITH (connect = false, slot_name = NONE);
 WARNING:  tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
 CREATE STATISTICS addr_nsp.gentable_stat ON a, b FROM addr_nsp.gentable;
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index afbbdd543d..a2d85a3f7f 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -5,7 +5,15 @@ CREATE ROLE regress_publication_user LOGIN SUPERUSER;
 CREATE ROLE regress_publication_user2;
 CREATE ROLE regress_publication_user_dummy LOGIN NOSUPERUSER;
 SET SESSION AUTHORIZATION 'regress_publication_user';
+SHOW wal_level;
+ wal_level 
+-----------
+ replica
+(1 row)
+
 CREATE PUBLICATION testpub_default;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 COMMENT ON PUBLICATION testpub_default IS 'test publication';
 SELECT obj_description(p.oid, 'pg_publication') FROM pg_publication p;
  obj_description  
@@ -14,6 +22,8 @@ SELECT obj_description(p.oid, 'pg_publication') FROM pg_publication p;
 (1 row)
 
 CREATE PUBLICATION testpib_ins_trunct WITH (publish = insert);
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 ALTER PUBLICATION testpub_default SET (publish = update);
 -- error cases
 CREATE PUBLICATION testpub_xxx WITH (foo);
@@ -44,6 +54,8 @@ CREATE TABLE pub_test.testpub_nopk (foo int, bar int);
 CREATE VIEW testpub_view AS SELECT 1;
 CREATE TABLE testpub_parted (a int) PARTITION BY LIST (a);
 CREATE PUBLICATION testpub_foralltables FOR ALL TABLES WITH (publish = 'insert');
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 ALTER PUBLICATION testpub_foralltables SET (publish = 'insert, update');
 CREATE TABLE testpub_tbl2 (id serial primary key, data text);
 -- fail - can't add to for all tables publication
@@ -87,7 +99,11 @@ DROP PUBLICATION testpub_foralltables;
 CREATE TABLE testpub_tbl3 (a int);
 CREATE TABLE testpub_tbl3a (b text) INHERITS (testpub_tbl3);
 CREATE PUBLICATION testpub3 FOR TABLE testpub_tbl3;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 CREATE PUBLICATION testpub4 FOR TABLE ONLY testpub_tbl3;
+WARNING:  insufficient wal_level to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions
 \dRp+ testpub3
   Publication testpub3
   Owner   | All tables | Inserts | Updates | Deletes | Truncates 
@@ -112,6 +128,8 @@ CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_view;
 ERROR:  "testpub_view" is not a table
 DETAIL:  Only tables can be added to publications.
 CREATE PUBLICATION testpub_fortbl FOR TABLE testpub_tbl1, pub_test.testpub_nopk;

Re: [PATCH v4] Add \warn to psql

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 11:29:00PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Jul  5, 2019 at 11:29:03PM +0200, David Fetter wrote:
> >>> I fixed that, but I'm wondering if we should back-patch that fix
> >>> or leave the back branches alone.
> 
> >> +0.5 for back-patching.
> 
> > Uh, if this was done in a major release I am thinking we have to mention
> > this as an incompatibility, which means we should probably not backpatch
> > it.
> 
> How is "clearly doesn't match the documentation" not a bug?

Uh, it is a bug, but people might be expecting the existing behavior
without consulting the documentation, and we don't expect people to be
testing minor releases.

Anyway, it seems to have been applied only to head so far.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [RFC] [PATCH] Flexible "partition pruning" hook

2019-07-08 Thread Mike Palmiotto
On Mon, Jul 8, 2019 at 10:59 AM Mike Palmiotto
 wrote:
>
> On Sun, Jul 7, 2019 at 9:46 PM Thomas Munro  wrote:
>>
>> On Sat, Apr 6, 2019 at 3:06 PM Andres Freund  wrote:
>> > I've moved this to the next CF, and marked it as targeting v13.
>>
>> Hi Mike,
>>
>> Commifest 1 for PostgreSQL 13 is here.  I was just going through the
>> automated build results for the 'fest and noticed that your patch
>> causes a segfault in the regression tests (possibly due to other
>> changes that have been made in master since February).  You can see
>> the complete backtrace on the second link below, but it looks like
>> this is happening every time, so hopefully not hard to track down
>> locally.
>>
>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46412
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/555380617

Attached you will find a patch (rebased on master) that passes all
tests on my local CentOS 7 box. Thanks again for the catch!

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
From 76be9aa755c0733852074c84e3e09102d1768427 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Mon, 8 Jul 2019 12:46:21 +
Subject: [PATCH] Flexible partition pruning hook

This hook allows partition pruning to be performed for entire child relations
which are deemed inaccessible by RLS policy prior to those relations being
opened on disk.
---
 src/backend/catalog/pg_inherits.c |  6 ++
 src/backend/optimizer/path/allpaths.c |  7 +++
 src/include/partitioning/partprune.h  | 18 ++
 3 files changed, 31 insertions(+)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 00f7957..c1fa8b0 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -24,7 +24,9 @@
 #include "access/table.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_inherits.h"
+#include "optimizer/cost.h"
 #include "parser/parse_type.h"
+#include "partitioning/partprune.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -92,6 +94,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
 	while ((inheritsTuple = systable_getnext(scan)) != NULL)
 	{
 		inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+
+		if (!InvokePartitionChildAccessHook(inhrelid))
+			continue;
+
 		if (numoids >= maxoids)
 		{
 			maxoids *= 2;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b772348..bb81b1b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1038,6 +1038,13 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			continue;
 		}
 
+		if (!InvokePartitionChildAccessHook(childRTE->relid))
+		{
+			/* Hook pruned this child rel; mark it as dummy and skip it. */
+			set_dummy_rel_pathlist(childrel);
+			continue;
+		}
+
 		/*
 		 * Constraint exclusion failed, so copy the parent's join quals and
 		 * targetlist to the child, with appropriate variable substitutions.
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index b3e9268..3138ff9 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -15,6 +15,7 @@
 #define PARTPRUNE_H
 
 #include "nodes/execnodes.h"
+#include "optimizer/cost.h"
 #include "partitioning/partdefs.h"
 
 struct PlannerInfo;/* avoid including pathnodes.h here */
@@ -68,6 +69,23 @@ typedef struct PartitionPruneContext
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
 	((partnatts) * (step_id) + (keyno))
 
+/* Custom partition child access hook. Provides further partition pruning given
+ * child OID.
+ */
+typedef bool (*partitionChildAccess_hook_type) (Oid childOID);
+PGDLLIMPORT partitionChildAccess_hook_type partitionChildAccess_hook;
+
+/* Macro to use partitionChildAccess_hook. Handles NULL-checking. */
+static inline bool InvokePartitionChildAccessHook (Oid childOID)
+{
+	if (partitionChildAccess_hook && enable_partition_pruning && childOID)
+	{
+		return (*partitionChildAccess_hook) (childOID);
+	}
+
+	return true;
+}
+
 extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
 	struct RelOptInfo *parentrel,
 	List *subpaths,
-- 
1.8.3.1



Re: [PATCH v4] Add \warn to psql

2019-07-08 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Jul  5, 2019 at 11:29:03PM +0200, David Fetter wrote:
>>> I fixed that, but I'm wondering if we should back-patch that fix
>>> or leave the back branches alone.

>> +0.5 for back-patching.

> Uh, if this was done in a major release I am thinking we have to mention
> this as an incompatibility, which means we should probably not backpatch
> it.

How is "clearly doesn't match the documentation" not a bug?

regards, tom lane




Re: pgbench - add \aset to store results of a combined query

2019-07-08 Thread Ibrar Ahmed
>  SELECT 1 AS one \;
>  SELECT 2 AS two UNION SELECT 2 \;
>  SELECT 3 AS three \aset
>
> will set both "one" and "three", while "two" is not set because there were 
> two rows. It is a kind of more permissive \gset.

Are you sure two is not set :)? 
SELECT 2 AS two UNION SELECT 2;   -- only returns one row.
but
SELECT 2 AS two UNION SELECT 10;  -- returns two rows.


Is this the expected behavior with \aset? In my opinion, throwing a proper error
like "client 0 script 0 command 0 query 0: expected one row, got 2" makes more
sense.


 - With \gset 

SELECT 2 AS two UNION SELECT 10 \gset
INSERT INTO test VALUES(:two,0,0);

$ pgbench postgres -f pgbench_aset.sql -T 1 -j 1 -c 1 -s 10
starting vacuum...end.
client 0 script 0 command 0 query 0: expected one row, got 2
transaction type: pgbench_aset.sql
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
duration: 1 s
number of transactions actually processed: 0
Run was aborted; the above results are incomplete.


- With \aset

SELECT 2 AS two UNION SELECT 10 \aset
INSERT INTO test VALUES(:two,0,0);

vagrant@vagrant:~/percona/postgresql$ pgbench postgres -f pgbench_aset.sql -T 1 
-j 1 -c 1 -s 10
starting vacuum...end.
client 0 script 0 aborted in command 1 query 0: ERROR:  syntax error at or near 
":"
LINE 1: INSERT INTO test VALUES(:two,0,0);
^
transaction type: pgbench_aset.sql
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
duration: 1 s
number of transactions actually processed: 0
Run was aborted; the above results are incomplete.

The new status of this patch is: Waiting on Author


Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Michael Paquier
On Tue, Jul 09, 2019 at 03:04:27PM +1200, Thomas Munro wrote:
> It's my mistake.  I'll fix it later today.

Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-08 Thread David G. Johnston
On Mon, Jul 8, 2019 at 7:59 PM Michael Paquier  wrote:

> Attached is an idea of patch for the documentation, using this
> wording:
> + 
> +  
> +   When defining a primary key on a partitioned table, the primary
> +   key column must be included in the partition key.
> +  
> + 
> If somebody has any better idea for that paragraph, please feel free.
>

Reads a bit backward.  How about:

"As uniqueness can only be enforced within an individual partition, when
defining a primary key on a partitioned table, all columns present in the
partition key must also exist in the primary key."

David J.


Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-08 Thread Tom Lane
Michael Paquier  writes:
> Attached is an idea of patch for the documentation, using this
> wording:
> + 
> +  
> +   When defining a primary key on a partitioned table, the primary
> +   key column must be included in the partition key.
> +  
> + 

Isn't it the other way around, that the partition key column(s) must be
included in the primary key?  Maybe I'm confused, but it seems like
we couldn't enforce PK uniqueness otherwise.

regards, tom lane




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Thomas Munro
On Tue, Jul 9, 2019 at 2:45 PM Michael Paquier  wrote:
> On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
> > Yeah.  I had obviously never noticed that test script.  +1 for just
>> enabling hash joins at the top of join_hash.sql in 12+, and the
> > equivalent section in 11's join.sql (which is luckily at the end of
> > the file).
>
> Right, I did not pay much attention to REL_11_STABLE.  In this case
> the test begins around line 2030 and reaches the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid any tests to
> be impacted, as usually bug-fix tests are just appended.  Thomas,
> perhaps you would prefer fixing it yourself?  Or should I?

It's my mistake.  I'll fix it later today.

-- 
Thomas Munro
https://enterprisedb.com




Re: doc: improve PG 12 to_timestamp()/to_date() wording

2019-07-08 Thread Bruce Momjian
On Sat, Jul  6, 2019 at 03:24:25PM -0500, Justin Pryzby wrote:
> On Tue, Apr 30, 2019 at 07:14:04PM -0500, Justin Pryzby wrote:
> > On Tue, Apr 30, 2019 at 09:48:14PM +0300, Alexander Korotkov wrote:
> > > I'd like to add couple of comments from my side.
> > 
> > > > -   returns an error because the second template string space is 
> > > > consumed
> > > > -   by the letter J in the input string.
> > > > +   returns an error because the second space in the template 
> > > > string consumes
> > > > +   the letter M from the input string.
> > > 
> > > Why M?  There is no letter "M" in the input string.
> > > The issue here is that we already consumed "J" from "JUN" and are trying
> > > to match "UN" to "MON".  So, I think we should leave
> > > J here.  The rest of this change looks good.
> > 
> > Seems like I confused myself while resolving rebase conflict.
> > 
> > Thanks for checking.
> 
> Find attached updated patch, which seems to still be needed.
> 
> This was subsumed and now extracted from a larger patch, from which Michael at
> one point applied a few hunks.
> I have some minor updates based on review from Andres, but there didn't seem 
> to
> be much interest so I haven't pursued it.
> https://www.postgresql.org/message-id/20190520182001.GA25675%40telsasoft.com

Patch applied back through PG 12.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-08 Thread Michael Paquier
On Mon, Jul 08, 2019 at 10:37:37PM -0400, Bruce Momjian wrote:
> On Fri, Jul  5, 2019 at 09:20:07PM +, PG Doc comments form wrote:
>> In the documentation for Postgres 11 table partitioning, there is no mention
>> of the requirement that the Primary Key of a partitioned table must contain
>> the partition key.
>> In fact the documentation on primary keys is so light that I am not even
>> 100% sure the above is correct.  If the following table is not possible in
>> Postgres 11, the documentation should find some way to make that clear.  
>> 
>> I believe this should be documented in section "5.10.2.3. Limitations"
> 
> Can someone comment on this?  CC to hackers.

Yep, that's the case:
=# CREATE TABLE parent_tab (id int, id2 int primary key)
 PARTITION BY RANGE (id);
ERROR:  0A000: insufficient columns in PRIMARY KEY constraint
definition
DETAIL:  PRIMARY KEY constraint on table "parent_tab" lacks column
"id" which is part of the partition key.
LOCATION:  DefineIndex, indexcmds.c:894
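
For comparison, the same definition is accepted once the partition key
column is included in the primary key (just an illustration):
=# CREATE TABLE parent_tab (id int, id2 int, PRIMARY KEY (id, id2))
 PARTITION BY RANGE (id);
CREATE TABLE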

I agree with the report here that adding one sentence to 5.10.2.3,
which covers the limitations of declarative partitioning, would be a
good idea.  We don't mention the limitation in CREATE TABLE either
(which seems rather incorrect to me).

Attached is an idea of patch for the documentation, using this
wording:
+ 
+  
+   When defining a primary key on a partitioned table, the primary
+   key column must be included in the partition key.
+  
+ 
If somebody has any better idea for that paragraph, please feel free.
--
Michael


signature.asc
Description: PGP signature


Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
>> Yeah.  I had obviously never noticed that test script.  +1 for just
>> enabling hash joins at the top of join_hash.sql in 12+, and the
>> equivalent section in 11's join.sql (which is luckily at the end of
>> the file).

> Right, I did not pay much attention to REL_11_STABLE.  In this case
> the test begins around line 2030 and reaches the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid impacting any
> later tests, as bug-fix tests are usually just appended.

Agreed that the scope should be limited.  But in 12/HEAD, I think the
relevant tests are all wrapped into one transaction block, so that
using SET LOCAL would be enough.  Not sure if 11 is the same.

regards, tom lane




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Michael Paquier
On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
> Yeah.  I had obviously never noticed that test script.  +1 for just
> enabling hash joins at the top of join_hash.sql in 12+, and the
> equivalent section in 11's join.sql (which is luckily at the end of
> the file).

Right, I did not pay much attention to REL_11_STABLE.  In this case
the test begins around line 2030 and reaches the bottom of the file.
I would actually add a RESET at the bottom of it to avoid impacting any
later tests, as bug-fix tests are usually just appended.  Thomas,
perhaps you would prefer fixing it yourself?  Or should I?
--
Michael


signature.asc
Description: PGP signature


Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-08 Thread Bruce Momjian
On Fri, Jul  5, 2019 at 09:20:07PM +, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/11/ddl-partitioning.html
> Description:
> 
> In the documentation for Postgres 11 table partitioning, there is no mention
> of the requirement that the Primary Key of a partitioned table must contain
> the partition key.
> In fact the documentation on primary keys is so light that I am not even
> 100% sure the above is correct.  If the following table is not possible in
> Postgres 11, the documentation should find some way to make that clear.  
> 
> -- Create partitioned table with partition key not in primary key 
> create table events (
> id bigint not null default nextval('events_id_seq'),
> created_date timestamp not null,
> constraint events_pk primary key (id)
> ) partition by range (created_date);
> -- end create table
> 
> I believe this should be documented in section "5.10.2.3. Limitations"

Can someone comment on this?  CC to hackers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Thomas Munro
On Tue, Jul 9, 2019 at 2:22 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Based on a suggestion from Andres (if I recall correctly), I wrapped
> > each individual test in savepoint/rollback, and then set just the GUCs
> > needed to get the plan shape and execution code path I wanted to
> > exercise, and I guess I found that I only needed to disable merge
> > joins for some of them.  The idea was that the individual tests could
> > be understood independently.
>
> But per this discussion, they can only be "understood independently"
> if you make some assumptions about the prevailing values of the
> planner GUCs.

Yeah.  I had obviously never noticed that test script.  +1 for just
enabling hash joins at the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

-- 
Thomas Munro
https://enterprisedb.com




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Jul 9, 2019 at 2:19 AM Tom Lane  wrote:
>> Given the purposes of this test, I think it'd be reasonable to force
>> both enable_hashjoin = on and enable_mergejoin = off at the very top
>> of the join_hash script, or the corresponding place in join.sql in
>> v11.  Thomas, was there a specific reason for forcing enable_mergejoin
>> = off for only some of these tests?

> Based on a suggestion from Andres (if I recall correctly), I wrapped
> each individual test in savepoint/rollback, and then set just the GUCs
> needed to get the plan shape and execution code path I wanted to
> exercise, and I guess I found that I only needed to disable merge
> joins for some of them.  The idea was that the individual tests could
> be understood independently.

But per this discussion, they can only be "understood independently"
if you make some assumptions about the prevailing values of the
planner GUCs.

regards, tom lane




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
>> Having said that, join_hash.sql in particular seems to have zero
>> value if it's not testing hash joins, so I think it'd be reasonable
>> for it to override a global enable_hashjoin = off setting.  None of
>> the other regression test scripts seem to take nearly as much of a
>> performance hit from globally forcing poor plans.

> I am a bit confused here.  Don't you mean to have enable_hashjoin =
*on* at the top of join_hash.sql instead, like in the attached?

Right, overriding any enable_hashjoin = off that might've come from
PGOPTIONS or wherever.

regards, tom lane




RE: [PATCH] Speedup truncates of relation forks

2019-07-08 Thread Jamison, Kirk
Hi Thomas,

Thanks for checking.

> On Fri, Jul 5, 2019 at 3:03 PM Jamison, Kirk  wrote:
> > I updated the patch which is similar to V3 of the patch, but
> > addressing my problem in (5) in the previous email regarding
> FreeSpaceMapVacuumRange.
> > It seems to pass the regression test now. Kindly check for validation.
> 
> Hi Kirk,
> 
> FYI there are a couple of compiler errors reported:

Attached is the updated patch (V5) fixing the compiler errors.

Comments and reviews about the patch/tests are very much welcome.

Regards,
Kirk Jamison


v5-0001-Speedup-truncates-of-relation-forks.patch
Description: v5-0001-Speedup-truncates-of-relation-forks.patch


Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Thomas Munro
On Tue, Jul 9, 2019 at 2:19 AM Tom Lane  wrote:
> Given the purposes of this test, I think it'd be reasonable to force
> both enable_hashjoin = on and enable_mergejoin = off at the very top
> of the join_hash script, or the corresponding place in join.sql in
> v11.  Thomas, was there a specific reason for forcing enable_mergejoin
> = off for only some of these tests?

Based on a suggestion from Andres (if I recall correctly), I wrapped
each individual test in savepoint/rollback, and then set just the GUCs
needed to get the plan shape and execution code path I wanted to
exercise, and I guess I found that I only needed to disable merge
joins for some of them.  The idea was that the individual tests could
be understood independently.

-- 
Thomas Munro
https://enterprisedb.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-08 Thread James Coleman
Note: As I was writing this, I saw a new email come in from Tomas with
a new patch series, and some similar observations. I'll look at that
patch series more, but I think it's likely far more complete, so will
end up going with that. I wanted to send this email anyway to at least
capture the debugging process for reference.

On Mon, Jul 8, 2019 at 12:07 PM James Coleman  wrote:
>
> On Mon, Jul 8, 2019 at 10:58 AM Tomas Vondra
>  wrote:
> >
> > On Mon, Jul 08, 2019 at 10:32:18AM -0400, James Coleman wrote:
> > >On Mon, Jul 8, 2019 at 9:59 AM Tomas Vondra
> > > wrote:
> > >>
> > >> On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
> > >> >On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
> > >> > wrote:
> > >> >> We're running query like this:
> > >> >>
> > >> >>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING 
> > >> >> avg(b) < 3 ORDER BY 1, 2, 3
> > >> >>
> > >> >> but we're trying to add the incremental sort *before* the aggregation,
> > >> >> because the optimizer also considers group aggregate with a sorted
> > >> >> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
> > >> >> actually can do this, but clearly that's nonsense, because we can't
> > >> >> possibly know the aggregates yet. Hence the error.
> > >> >>
> > >> >> If this is the actual issue, we need to ensure we actually can 
> > >> >> evaluate
> > >> >> all the pathkeys. I don't know how to do that yet. I thought that 
> > >> >> maybe
> > >> >> we should modify pathkeys_common_contained_in() to set presorted_keys 
> > >> >> to
> > >> >> 0 in this case.
> > >> >>
> > >> >> But then I started wondering why we don't see this issue even for
> > >> >> regular (non-incremental-sort) paths built in create_ordered_paths().
> > >> >> How come we don't see these failures there? I've modified costing to
> > >> >> make all incremental sort paths very cheap, and still nothing.
> > >> >
> > >> >I assume you mean you modified costing to make regular sort paths very 
> > >> >cheap?
> > >> >
> > >>
> > >> No, I mean costing of incremental sort paths, so that they end up being
> > >> the cheapest ones. If some other path is cheaper, we won't see the error
> > >> because it only happens when building plan from the cheapest path.
> > >
> > >Ah, I misunderstood as you trying to figure out a way to try to cause
> > >the same problem with a regular sort.
> > >
> > >> >> So presumably there's a check elsewhere (either implicit or explicit),
> > >> >> because create_ordered_paths() uses pathkeys_common_contained_in() and
> > >> >> does not have the same issue.
> > >> >
> > >> >Given this comment in create_ordered_paths():
> > >> >
> > >> >  generate_gather_paths() will have already generated a simple Gather
> > >> >  path for the best parallel path, if any, and the loop above will have
> > >> >  considered sorting it.  Similarly, generate_gather_paths() will also
> > >> >  have generated order-preserving Gather Merge plans which can be used
> > >> >  without sorting if they happen to match the sort_pathkeys, and the 
> > >> > loop
> > >> >  above will have handled those as well.  However, there's one more
> > >> >  possibility: it may make sense to sort the cheapest partial path
> > >> >  according to the required output order and then use Gather Merge.
> > >> >
> > >> >my understanding is that generate_gather_paths() only considers paths
> > >> >that already happen to be sorted (not explicit sorts), so I'm
> > >> >wondering if it would make more sense for the incremental sort path
> > >> >creation for this case to live alongside the explicit ordered path
> > >> >creation in create_ordered_paths() rather than in
> > >> >generate_gather_paths().
> > >> >
> > >>
> > >> How would that solve the issue? Also, we're building a gather path, so
> > >> I think generate_gather_paths() is the right place where to do it. And
> > >> we're not changing the semantics of generate_gather_paths() - the result
> > >> path should be sorted "correctly" with respect to sort_pathkeys.
> > >
> > >Does that imply what the explicit sort in create_ordered_paths() is in
> > >the wrong spot?
> > >
> >
> > I think those are essentially the right places where to do this sort of
> > stuff. Maybe there's a better place, but I don't think those places are
> > somehow wrong.
> >
> > >Or, to put it another way, do you think that both kinds of sorts
> > >should be added in the same place? It seems confusing to me that
> > >they'd be split between the two methods (unless I'm completely
> > >misunderstanding how the two work).
> > >
> >
> > The paths built in those two places were very different in one aspect:
> >
> > 1) generate_gather_paths only ever looked at pathkeys for the subpath, it
> > never even looked at ordering expected by paths above it (or the query as
> > a whole). Plain Gather ignores pathkeys entirely, Gather Merge only aims
> > to maintain ordering of the different subpaths.
> >
>> 2) create_ordered_paths is meant to enforce ordering needed by 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-08 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:

On Mon, Jul 8, 2019 at 10:58 AM Tomas Vondra
 wrote:


On Mon, Jul 08, 2019 at 10:32:18AM -0400, James Coleman wrote:
>On Mon, Jul 8, 2019 at 9:59 AM Tomas Vondra
> wrote:
>>
>> On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
>> >On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
>> > wrote:
>> >> We're running query like this:
>> >>
>> >>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 
3 ORDER BY 1, 2, 3
>> >>
>> >> but we're trying to add the incremental sort *before* the aggregation,
>> >> because the optimizer also considers group aggregate with a sorted
>> >> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
>> >> actually can do this, but clearly that's nonsense, because we can't
>> >> possibly know the aggregates yet. Hence the error.
>> >>
>> >> If this is the actual issue, we need to ensure we actually can evaluate
>> >> all the pathkeys. I don't know how to do that yet. I thought that maybe
>> >> we should modify pathkeys_common_contained_in() to set presorted_keys to
>> >> 0 in this case.
>> >>
>> >> But then I started wondering why we don't see this issue even for
>> >> regular (non-incremental-sort) paths built in create_ordered_paths().
>> >> How come we don't see these failures there? I've modified costing to
>> >> make all incremental sort paths very cheap, and still nothing.
>> >
>> >I assume you mean you modified costing to make regular sort paths very 
cheap?
>> >
>>
>> No, I mean costing of incremental sort paths, so that they end up being
>> the cheapest ones. If some other path is cheaper, we won't see the error
>> because it only happens when building plan from the cheapest path.
>
>Ah, I misunderstood as you trying to figure out a way to try to cause
>the same problem with a regular sort.
>
>> >> So presumably there's a check elsewhere (either implicit or explicit),
>> >> because create_ordered_paths() uses pathkeys_common_contained_in() and
>> >> does not have the same issue.
>> >
>> >Given this comment in create_ordered_paths():
>> >
>> >  generate_gather_paths() will have already generated a simple Gather
>> >  path for the best parallel path, if any, and the loop above will have
>> >  considered sorting it.  Similarly, generate_gather_paths() will also
>> >  have generated order-preserving Gather Merge plans which can be used
>> >  without sorting if they happen to match the sort_pathkeys, and the loop
>> >  above will have handled those as well.  However, there's one more
>> >  possibility: it may make sense to sort the cheapest partial path
>> >  according to the required output order and then use Gather Merge.
>> >
>> >my understanding is that generate_gather_paths() only considers paths
>> >that already happen to be sorted (not explicit sorts), so I'm
>> >wondering if it would make more sense for the incremental sort path
>> >creation for this case to live alongside the explicit ordered path
>> >creation in create_ordered_paths() rather than in
>> >generate_gather_paths().
>> >
>>
>> How would that solve the issue? Also, we're building a gather path, so
>> I think generate_gather_paths() is the right place where to do it. And
>> we're not changing the semantics of generate_gather_paths() - the result
>> path should be sorted "correctly" with respect to sort_pathkeys.
>
>Does that imply what the explicit sort in create_ordered_paths() is in
>the wrong spot?
>

I think those are essentially the right places where to do this sort of
stuff. Maybe there's a better place, but I don't think those places are
somehow wrong.

>Or, to put it another way, do you think that both kinds of sorts
>should be added in the same place? It seems confusing to me that
>they'd be split between the two methods (unless I'm completely
>misunderstanding how the two work).
>

The paths built in those two places were very different in one aspect:

1) generate_gather_paths only ever looked at pathkeys for the subpath, it
never even looked at ordering expected by paths above it (or the query as
a whole). Plain Gather ignores pathkeys entirely, Gather Merge only aims
to maintain ordering of the different subpaths.

2) create_ordered_paths is meant to enforce ordering needed by upper
parts of the plan - either by using a properly sorted path, or adding an
explicit sort.


We want to extend (1) to also look at ordering expected by the upper parts
of the plan, and consider incremental sort if applicable. (2) already does
that, and it already has the correct pathkeys to enforce.


I guess I'm still not following. If (2) is responsible (currently) for
adding an explicit sort, why wouldn't adding an incremental sort be an
example of that responsibility? The subpath that either a Sort or
IncrementalSort is being added on top of (to then feed into the
GatherMerge) is the same in both cases right?

Unless you're saying that the difference is that since we have a
partial ordering already for incremental 

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Michael Paquier
On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
> Having said that, join_hash.sql in particular seems to have zero
> value if it's not testing hash joins, so I think it'd be reasonable
> for it to override a global enable_hashjoin = off setting.  None of
> the other regression test scripts seem to take nearly as much of a
> performance hit from globally forcing poor plans.

I am a bit confused here.  Don't you mean to have enable_hashjoin =
*on* at the top of hash_join.sql instead like in the attached?
--
Michael
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 9eee39bdd3..0890608acf 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -1,6 +1,9 @@
 --
 -- exercises for the hash join code
 --
+-- Enforce the use of hash joins in this test, which could get disabled
+-- at the system level.
+set enable_hashjoin = on;
 begin;
 set local min_parallel_table_scan_size = 0;
 set local parallel_setup_cost = 0;
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index ae352e9b0b..b38081d1d5 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -2,6 +2,10 @@
 -- exercises for the hash join code
 --
 
+-- Enforce the use of hash joins in this test, which could get disabled
+-- at the system level.
+set enable_hashjoin = on;
+
 begin;
 
 set local min_parallel_table_scan_size = 0;


signature.asc
Description: PGP signature


Re: [PATCH v4] Add \warn to psql

2019-07-08 Thread Bruce Momjian
On Fri, Jul  5, 2019 at 11:29:03PM +0200, David Fetter wrote:
> > While I was fooling with it I noticed that the existing code for -n
> > is buggy.  The documentation says clearly that only the first
> > argument is a candidate to be -n:
> > 
> > If the first argument is an unquoted -n the 
> > trailing
> > newline is not written.
> > 
> > but the actual implementation allows any argument to be recognized as
> > -n:
> > 
> > regression=# \echo this -n should not be -n like this
> > this should not be like thisregression=# 
> > 
> > I fixed that, but I'm wondering if we should back-patch that fix
> > or leave the back branches alone.
> 
> +0.5 for back-patching.

Uh, if this was done in a major release I am thinking we have to mention
this as an incompatibility, which means we should probably not backpatch
it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Improve search for missing parent downlinks in amcheck

2019-07-08 Thread Peter Geoghegan
On Sun, Jul 7, 2019 at 7:53 PM Thomas Munro  wrote:
> On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan  wrote:
> > I will think about a simple fix, but after the upcoming point release.
> > There is no hurry.
>
> A bureaucratic question:  What should the status be for this CF entry?

I have plans to use RelationGetNumberOfBlocks() to size amcheck's
downlink Bloom filter. I think that that will solve the problems with
unreliable estimates of index size. which seemed to be the basis of
Alexander's complaint.

--
Peter Geoghegan




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Bruce Momjian
On Mon, Jul  1, 2019 at 09:51:12AM -0400, Alvaro Herrera wrote:
> Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to
> gain more popularity.
> 
> Please don't reuse a file name as generic as "parallel.c" -- it's
> annoying when navigating source.  Maybe conn_parallel.c multiconn.c
> connscripts.c admconnection.c ...?
> 
> If your server crashes or is stopped midway during the reindex, you
> would have to start again from scratch, and it's tedious (if it's
> possible at all) to determine which indexes were missed.  I think it
> would be useful to have a two-phase mode: in the initial phase reindexdb
> computes the list of indexes to be reindexed and saves them into a work
> table somewhere.  In the second phase, it reads indexes from that table
> and processes them, marking them as done in the work table.  If the
> second phase crashes or is stopped, it can be restarted and consults the
> work table.  I would keep the work table, as it provides a bit of an
> audit trail.  It may be important to be able to run even if unable to
> create such a work table (because of the numerous users that
> DROP DATABASE postgres).
> 
> Maybe we'd have two flags in the work table for each index:
> "reindex requested", "reindex done".

I think we have a similar issue with adding checksums, so let's address it
with a generic framework and use it for all cases, like vacuumdb too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
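
To illustrate the two-phase idea Alvaro describes above, the work table could
look roughly like this (purely a sketch; nothing like it exists in the patch,
and all names here are made up):

    CREATE TABLE reindex_work (
        idx       regclass NOT NULL,               -- index to process
        requested boolean  NOT NULL DEFAULT true,  -- "reindex requested"
        done      boolean  NOT NULL DEFAULT false  -- "reindex done"
    );
    -- phase 1: record everything that needs doing
    INSERT INTO reindex_work (idx)
        SELECT indexrelid::regclass FROM pg_index;  -- filtered as appropriate
    -- phase 2: reindex each entry, marking it done so a restart can resume
    UPDATE reindex_work SET done = true WHERE idx = 'some_index'::regclass;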




Re: Increasing default value for effective_io_concurrency?

2019-07-08 Thread Bruce Momjian
On Wed, Jul  3, 2019 at 11:42:49AM -0400, Robert Haas wrote:
> On Wed, Jul 3, 2019 at 11:24 AM Tomas Vondra
>  wrote:
> > Maybe. And it would probably work for the systems I used for benchmarks.
> >
> > It however assumes two things: (a) the storage system actually has
> > spindles and (b) you know how many spindles there are. Which is becoming
> > less and less safe these days - flash storage becomes pretty common, and
> > even when there are spindles they are often hidden behind the veil of
> > virtualization in a SAN, or something.
> 
> Yeah, that's true.
> 
> > I wonder if we might provide something like pg_test_prefetch which would
> > measure performance with different values, similarly to pg_test_fsync.
> 
> That's not a bad idea, but I'm not sure if the results that we got in
> a synthetic test - presumably unloaded - would be a good guide to what
> to use in a production situation.  Maybe it would; I'm just not sure.

I think it would be better than what we have now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 07:27:12PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Operationally, how would that work?  We unlock them all on boot but
> > somehow make them inaccessible to some backends after that?
> 
> That could work and doesn't seem like an insurmountable challenge.  The
> way that's been discussed, at least somewhere in the past, is leveraging
> the exec backend framework to have the user-connected backends work in
> an independent space from the processes launched at startup.

Just do it in another cluster --- why bother with all that?

> > > > > > > amount of data you transmit over a given TLS connection increases
> > > > > > > though, the risk increases and it would be better to re-key.  How 
> > > > > > > much
> > > > > > > better?  That depends a great deal on if someone is trying to 
> > > > > > > mount an
> > > > > > > attack or not.
> > > > > > 
> > > > > > Yep, we need to allow rekey.
> > > > > 
> > > > > Supporting a way to rekey is definitely a good idea.
> > > > 
> > > > It is a requirement, I think.  We might have problem tracking exactly
> > > > what key _version_ each table (or 8k block), or WAL file are.  :-( 
> > > > Ideally we would allow only two active keys, and somehow mark each page
> > > > as using the odd or even key at a given time, or something strange. 
> > > > (Yeah, hand waving here.)
> > > 
> > > Well, that wouldn't be the ideal since it would limit us to some small
> > > number of GBs of data written, based on the earlier discussion, right?
> > 
> > No, it is GB per secret-nonce combination.
> 
> Hrmpf.  I'm trying to follow the logic that draws this conclusion.
> 
> As I understand it, the NIST recommendation is a 96-bit *random* nonce,
> and then there's also a recommendation to not encrypt more than 2^32
> messages- much less than the 96-bit random nonce, at least potentially
> because that limits the repeat-nonce risk to a very low probability.
> 
> If the amount-you-can-encrypt is really per secret+nonce combination,
> then how do those recommendations make sense..?  This is where I really
> think we should be reading through and understanding exactly what the
> NIST recommendations are and not just trying to follow through things on
> stackoverflow.

Yes, it needs more research.

> > > I'm not sure that I can see through to a system where we are rewriting
> > > tables that are out on disk every time we hit 60GB of data written.
> > > 
> > > Or maybe I'm misunderstanding what you're suggesting here..?
> > 
> > See above.
> 
> How long would these keys be active for then in the system..?  How much
> data would they potentially be used to encrypt?  Strikes me as likely to
> be an awful lot...

I think we need to look at CTR vs GCM.

> > Uh, well, you would think so, but for some reason AES just doesn't allow
> > that kind of attack, unless you brute force it trying every key.  The
> > nonce is only to prevent someone from detecting that two output
> > encryption pages contain the same contents originally.
> 
> That's certainly interesting, but such a brute-force with every key
> would allow it, where, if you use a random nonce, then such an attack
> would have to start working only after having access to the data, and
> not be something that could be pre-computed.

Uh, the nonce is going to have to be unencrypted so it can be fed into
the crypto method, so it will be visible.

> > > and a recommendation by NIST certainly holds a lot of water, at least
> > > for me.  They also have a recommendation regarding the amount of data to
> > 
> > Agreed.
> 
> This is just it though, at least from my perspective- we are saying "ok,
> well, we know people recommend using a random nonce, but that's hard, so
> we aren't going to do that because we don't think it's important for our
> application", but we aren't cryptographers.  I liken this to whatever
> discussion lead to using the username as the salt for our md5
> authentication method- great intentions, but not complete understanding,
> leading to a less-than-desirable result.
> 
> When it comes to this stuff, I don't think we really get to pick and
> choose what we follow and what we don't.  If the recommendation from an
> authority says we should use random nonces, then we *really* need to
> listen and do that, because that authority is a bunch of cryptographers
> with a lot more experience and who have definitely spent a great deal
> more time thinking about this than we have.
> 
> If there's a recommendation from such an authority that says we *don't*
> need to use a random nonce, great, I'm happy to go review that and agree
> with it, but discussions on stackoverflow or similar don't hold the same
> weight that a recommendation from NIST does.

Yes, we need to get some experts involved.

> > > > Well, in many modes the nonce is just a counter, but as stated above,
> > > > not all modes.  I need to pull out my security books to remember for
> > > > which ones it is safe.  

Re: OpenSSL specific value under USE_SSL instead of USE_OPENSSL

2019-07-08 Thread Bruce Momjian
On Thu, Jun 27, 2019 at 04:02:45PM +0200, Daniel Gustafsson wrote:
> The ssl_ciphers GUC for configuring cipher suites sets the default value based
> on USE_SSL but it’s actually an OpenSSL specific value.  As with other such
> changes, it works fine now but will become an issue should we support other 
> TLS
> backends.  Attached patch fixes this.

Thanks, patch applied.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Extra quote_all_identifiers in _dumpOptions

2019-07-08 Thread Bruce Momjian
On Thu, Jun 27, 2019 at 12:12:15PM +0300, Arthur Zakirov wrote:
> Hello hackers,
> 
> While working on pg_dump I noticed the extra quote_all_identifiers in
> _dumpOptions struct. I attached the patch.
> 
> It came from refactoring by 0eea8047bf. There is also a discussion:
> https://www.postgresql.org/message-id/CACw0%2B13ZUcXbj9GKJNGZTkym1SXhwRu28nxHoJMoX5Qwmbg_%2Bw%40mail.gmail.com
> 
> Initially the patch proposed to use quote_all_identifiers of _dumpOptions.
> But then everyone came to a decision to use global quote_all_identifiers
> from string_utils.c, because fmtId() uses it.
> 
> -- 
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company

> diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
> index db30b54a92..8c0cedcd98 100644
> --- a/src/bin/pg_dump/pg_backup.h
> +++ b/src/bin/pg_dump/pg_backup.h
> @@ -153,7 +153,6 @@ typedef struct _dumpOptions
>   int no_synchronized_snapshots;
>   int no_unlogged_table_data;
>   int serializable_deferrable;
> - int quote_all_identifiers;
>   int disable_triggers;
>   int outputNoTablespaces;
>   int use_setsessauth;

Wow, good catch.  I thought C compilers would have reported this issue,
but obviously not.  Patch applied to head.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:43:31PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > > > Well, if it was a necessary features, I assume TLS 1.3 would have 
> > > > > > > found
> > > > > > > a way to make it secure, no?  Certainly they are not shipping TLS 
> > > > > > > 1.3
> > > > > > > with a known weakness.
> > > > > > 
> > > > > > As discussed below- this is about moving goalposts, and that's, in 
> > > > > > part
> > > > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > > > 
> > > > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > > > the server at the same time.  The open question is whether encrypting
> > > > > different data with different keys and different unlock controls is
> > > > > possible or useful.
> > > > 
> > > > I'm not sure if there's really a question about if it's *possible*?  As
> > > > for if it's useful, I agree there's some debate.
> > > 
> > > Right, it is easily possible to keep all keys unlocked, but the value is
> > > minimal, and the complexity will have a cost, which is my point.
> > 
> > Having them all unlocked but only accessible to certain privileged
> > processes if very different from having them unlocked and available to
> > every backend process.
> 
> Operationally, how would that work?  We unlock them all on boot but
> somehow make them inaccessible to some backends after that?

That could work and doesn't seem like an insurmountable challenge.  The
way that's been discussed, at least somewhere in the past, is leveraging
the exec backend framework to have the user-connected backends work in
an independent space from the processes launched at startup.

> > > > > > amount of data you transmit over a given TLS connection increases
> > > > > > though, the risk increases and it would be better to re-key.  How 
> > > > > > much
> > > > > > better?  That depends a great deal on if someone is trying to mount 
> > > > > > an
> > > > > > attack or not.
> > > > > 
> > > > > Yep, we need to allow rekey.
> > > > 
> > > > Supporting a way to rekey is definitely a good idea.
> > > 
> > > It is a requirement, I think.  We might have problem tracking exactly
> > > what key _version_ each table (or 8k block), or WAL file are.  :-( 
> > > Ideally we would allow only two active keys, and somehow mark each page
> > > as using the odd or even key at a given time, or something strange. 
> > > (Yeah, hand waving here.)
> > 
> > Well, that wouldn't be the ideal since it would limit us to some small
> > number of GBs of data written, based on the earlier discussion, right?
> 
> No, it is GB per secret-nonce combination.

Hrmpf.  I'm trying to follow the logic that draws this conclusion.

As I understand it, the NIST recommendation is a 96-bit *random* nonce,
and then there's also a recommendation to not encrypt more than 2^32
messages- much less than the 96-bit random nonce, at least potentially
because that limits the repeat-nonce risk to a very low probability.

If the amount-you-can-encrypt is really per secret+nonce combination,
then how do those recommendations make sense..?  This is where I really
think we should be reading through and understanding exactly what the
NIST recommendations are and not just trying to follow through things on
stackoverflow.

> > I'm not sure that I can see through to a system where we are rewriting
> > tables that are out on disk every time we hit 60GB of data written.
> > 
> > Or maybe I'm misunderstanding what you're suggesting here..?
> 
> See above.

How long would these keys be active for then in the system..?  How much
data would they potentially be used to encrypt?  Strikes me as likely to
be an awful lot...

> > > > > Uh, well, renaming the user was a big problem, but that is the only 
> > > > > case
> > > > > I can think of.  I don't see that as an issue for block or WAL 
> > > > > sequence
> > > > > numbers.  If we want to use a different nonce, we have to find a way 
> > > > > to
> > > > > store it or look it up efficiently.  Considering the nonce size, I 
> > > > > don't
> > > > > see how that is possible.
> > > > 
> > > > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > > > time and therefore could build rainbow tables specifically for that
> > > > salt.  I could also use those *same* tables for any system where that
> > > > user had an account, even if they used different passwords on different
> > > > systems...
> > > 
> > > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > > works on many servers --- good point.  Are rainbow tables possible with
> > > something like AES?
> > 
> > I'm not a cryptographer, just 

Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY

2019-07-08 Thread Thomas Munro
On Tue, Jul 9, 2019 at 10:33 AM Goel, Dhruv  wrote:
> I have attached the revised patch. I ran check-world multiple times on my 
> machine and it seems to succeed now. Do you mind kicking-off the CI build 
> with the latest patch?

Thanks.

It's triggered automatically when you post patches to the thread and
also once a day, though it took ~35 minutes to get around to noticing
your new version due to other activity in other threads, and general
lack of horsepower.  I'm planning to fix that with more horses.

It passed on both OSes.  See here:

http://cfbot.cputube.org/dhruv-goel.html

-- 
Thomas Munro
https://enterprisedb.com




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-07-08 Thread Thomas Munro
On Tue, Jul 2, 2019 at 5:46 PM Paul Guo  wrote:
> Rebased, aligned with recent changes in pg_rewind/pg_basebackup and then 
> retested. Thanks.

Hi Paul,

A minor build problem on Windows:

src/bin/pg_rewind/pg_rewind.c(32): fatal error C1083: Cannot open
include file: 'backup_common.h': No such file or directory
[C:\projects\postgresql\pg_rewind.vcxproj]

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46422
http://cfbot.cputube.org/paul-guo.html

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 06:43:31PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > > Well, if it was a necessary features, I assume TLS 1.3 would have 
> > > > > > found
> > > > > > a way to make it secure, no?  Certainly they are not shipping TLS 
> > > > > > 1.3
> > > > > > with a known weakness.
> > > > > 
> > > > > As discussed below- this is about moving goalposts, and that's, in 
> > > > > part
> > > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > > 
> > > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > > the server at the same time.  The open question is whether encrypting
> > > > different data with different keys and different unlock controls is
> > > > possible or useful.
> > > 
> > > I'm not sure if there's really a question about if it's *possible*?  As
> > > for if it's useful, I agree there's some debate.
> > 
> > Right, it is easily possible to keep all keys unlocked, but the value is
> > minimal, and the complexity will have a cost, which is my point.
> 
> Having them all unlocked but only accessible to certain privileged
> processes if very different from having them unlocked and available to
> every backend process.

Operationally, how would that work?  We unlock them all on boot but
somehow make them inaccessible to some backends after that?

> > > > > amount of data you transmit over a given TLS connection increases
> > > > > though, the risk increases and it would be better to re-key.  How much
> > > > > better?  That depends a great deal on if someone is trying to mount an
> > > > > attack or not.
> > > > 
> > > > Yep, we need to allow rekey.
> > > 
> > > Supporting a way to rekey is definitely a good idea.
> > 
> > It is a requirement, I think.  We might have problem tracking exactly
> > what key _version_ each table (or 8k block), or WAL file are.  :-( 
> > Ideally we would allow only two active keys, and somehow mark each page
> > as using the odd or even key at a given time, or something strange. 
> > (Yeah, hand waving here.)
> 
> Well, that wouldn't be the ideal since it would limit us to some small
> number of GBs of data written, based on the earlier discussion, right?

No, it is GB per secret-nonce combination.

> I'm not sure that I can see through to a system where we are rewriting
> tables that are out on disk every time we hit 60GB of data written.
> 
> Or maybe I'm misunderstanding what you're suggesting here..?

See above.

> > > > Uh, well, renaming the user was a big problem, but that is the only case
> > > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > > numbers.  If we want to use a different nonce, we have to find a way to
> > > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > > see how that is possible.
> > > 
> > > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > > time and therefore could build rainbow tables specifically for that
> > > salt.  I could also use those *same* tables for any system where that
> > > user had an account, even if they used different passwords on different
> > > systems...
> > 
> > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > works on many servers --- good point.  Are rainbow tables possible with
> > something like AES?
> 
> I'm not a cryptographer, just to be clear...  but it sure seems like if
> you know what the nonce is, and a strong idea about at least what some
> of the contents are, then you could work to pre-calculate a portian of
> the encrypted data and be able to determine the key based on that.

Uh, well, you would think so, but for some reason AES just doesn't allow
that kind of attack, unless you brute force it trying every key.  The
nonce is only to prevent someone from detecting that two output
encryption pages contain the same contents originally.

> > > I appreciate that *some* of this might not be completely relevant for
> > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > have a cryptographer tell me that a deterministic nonce didn't have
> > > similar issues or didn't reduce the value of the nonce significantly.
> > 
> > This post:
> > 
> > 
> > https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > 
> > says:
> > 
> > GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> > of Counter Mode, it is essential  that the Nonce is not repeated with
> > the same key.  Hence CTR mode  Nonces often include either a counter or
> > a timer element: something that  is guaranteed not to repeat over the
> > lifetime of the key.
> > 
> > CTR is what we use 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
> Yes, 'postgres' can be used to create a nice md5 rainbow table that
> works on many servers --- good point.  Are rainbow tables possible with
> something like AES?
> 
> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
> 
> This post:
> 
>   
> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> 
> says:
> 
>   GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>   of Counter Mode, it is essential  that the Nonce is not repeated with
>   the same key.  Hence CTR mode  Nonces often include either a counter or
>   a timer element: something that  is guaranteed not to repeat over the
>   lifetime of the key.
> 
> CTR is what we use for WAL.  8k pages, we would use CBC, which says we
> need a random nonce.  I need to dig deeper into ECB mode attack.

Looking here:


https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

I think the issues is that we can't use a _counter_ for the nonce since
each page-0 of each table would use the same nonce, and each page-1,
etc.  I assume we would use the table oid and page number as the nonce. 
We can't use the database oid since we copy the files from one database
to another via file system copy and not through the shared buffer cache
where they would be re-encrypted.  Using relfilenode seems dangerous. 
For WAL I think it would be the WAL segment number.  It would be nice
to mix that with the "Database system identifier:", but are these the
same on primary and replicas?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > > Well, if it was a necessary features, I assume TLS 1.3 would have 
> > > > > found
> > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > with a known weakness.
> > > > 
> > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > 
> > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > the server at the same time.  The open question is whether encrypting
> > > different data with different keys and different unlock controls is
> > > possible or useful.
> > 
> > I'm not sure if there's really a question about if it's *possible*?  As
> > for if it's useful, I agree there's some debate.
> 
> Right, it is easily possible to keep all keys unlocked, but the value is
> minimal, and the complexity will have a cost, which is my point.

Having them all unlocked but only accessible to certain privileged
processes is very different from having them unlocked and available to
every backend process.

> > > > amount of data you transmit over a given TLS connection increases
> > > > though, the risk increases and it would be better to re-key.  How much
> > > > better?  That depends a great deal on if someone is trying to mount an
> > > > attack or not.
> > > 
> > > Yep, we need to allow rekey.
> > 
> > Supporting a way to rekey is definitely a good idea.
> 
> It is a requirement, I think.  We might have problem tracking exactly
> what key _version_ each table (or 8k block), or WAL file are.  :-( 
> Ideally we would allow only two active keys, and somehow mark each page
> as using the odd or even key at a given time, or something strange. 
> (Yeah, hand waving here.)

Well, that wouldn't be the ideal since it would limit us to some small
number of GBs of data written, based on the earlier discussion, right?

I'm not sure that I can see through to a system where we are rewriting
tables that are out on disk every time we hit 60GB of data written.

Or maybe I'm misunderstanding what you're suggesting here..?

> > > Uh, well, renaming the user was a big problem, but that is the only case
> > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > numbers.  If we want to use a different nonce, we have to find a way to
> > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > see how that is possible.
> > 
> > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > time and therefore could build rainbow tables specifically for that
> > salt.  I could also use those *same* tables for any system where that
> > user had an account, even if they used different passwords on different
> > systems...
> 
> Yes, 'postgres' can be used to create a nice md5 rainbow table that
> works on many servers --- good point.  Are rainbow tables possible with
> something like AES?

I'm not a cryptographer, just to be clear...  but it sure seems like if
you know what the nonce is, and a strong idea about at least what some
of the contents are, then you could work to pre-calculate a portion of
the encrypted data and be able to determine the key based on that.

> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
> 
> This post:
> 
>   
> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> 
> says:
> 
>   GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>   of Counter Mode, it is essential  that the Nonce is not repeated with
>   the same key.  Hence CTR mode  Nonces often include either a counter or
>   a timer element: something that  is guaranteed not to repeat over the
>   lifetime of the key.
> 
> CTR is what we use for WAL.  8k pages, we would use CBC, which says we
> need a random nonce.  I need to dig deeper into ECB mode attack.

That page also says:

  Using a random IV / nonce for GCM has been specified as an official
  recommendation by - for instance - NIST. If anybody suggests differently
  then that's up to them.

and a recommendation by NIST certainly holds a lot of water, at least
for me.  They also have a recommendation regarding the amount of data to
encrypt with the same key, and that number is much lower than the 96-bit
randomness of the nonce, with a recommendation to use a
cryptographically sound random, meaning that the chances of a duplicate
are extremely low.

> > > Uh, well, you 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-08 Thread Alexander Korotkov
On Thu, Jul 4, 2019 at 4:25 PM James Coleman  wrote:
> Process questions:
> - Do I need to explicitly move the patch somehow to the next CF?

We didn't manage to register it in the current (July) commitfest.  So,
please register it in the next (September) commitfest.

> - Since I've basically taken over patch ownership, should I move my
> name from reviewer to author in the CF app? And can there be two
> authors listed there?

Sure, you're a co-author of this patch.  Two or more authors can be
listed in the CF app; you can find a lot of examples on the list.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-08 Thread Alexander Korotkov
On Mon, Jun 3, 2019 at 12:18 AM Tomas Vondra
 wrote:
> For a moment I thought we could/should look at the histogram, because that
> could tell us if there are groups "before" the first MCV one, but I don't
> think we should do that, for two reasons. Firstly, rare values may not get
> to the histogram anyway, which makes this rather unreliable and might
> introduce sudden plan changes, because the cost would vary wildly
> depending on whether we happened to sample the rare row or not. And
> secondly, the rare row may be easily filtered out by a WHERE condition or
> something, at which point we'll have to deal with the large group anyway.

If the first MCV is in the middle of the first histogram bin, then it's
reasonable to think that it would fit into the first group.  But if the first
MCV is in the middle of the histogram, such an assumption would be
ridiculous.  Also, I'd like to note that with our
default_statistics_target == 100, non-MCV values are not so rare.  So,
I'm +1 for taking the histogram into account.
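
For reference, both the MCV list and the histogram the planner would
consult here are visible in pg_stats; a sketch with made-up table and
column names:

SELECT most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';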

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: FETCH FIRST clause PERCENT option

2019-07-08 Thread Ryan Lambert
Hi Surafel,

The v5 patch [1] applies cleanly and passes make installcheck-world.

My concern with this is its performance.  As a user I would expect using
this feature to enable queries to run faster than they would simply pulling
the full table.  I tested on tables ranging from 10k rows up to 10 million
with the same basic result that using FETCH FIRST N PERCENT is slower than
using the full table.  At best it ran slightly slower than querying the
full table; at worst it increased execution times by 1400% when using a
high percentage (95%).

Testing was on a DigitalOcean 2 CPU, 2GB RAM droplet.  All queries were
executed multiple times (6+) with EXPLAIN (ANALYZE, COSTS) and \timing
enabled, query plans provided are from the faster end of each query.

Create the test table:

DROP TABLE IF EXISTS r10mwide;
CREATE TABLE r10mwide AS
SELECT generate_series(1, 10000000)::INT AS id,
random() AS v1,
random() AS v2,
random() AS v3,
random() AS v4,
random() AS v5,
random() AS v6,
random() AS v7,
random() AS v8,
random() AS v9,
random() AS v10
;
VACUUM ANALYZE;


Start with a baseline query from the full table; the limiting will be done
within the CTE as I would typically do in a production query.  The outer
query does basic aggregates.  Below I provide each full query for easy
copy/paste but the only real change is how the inner results are limited.
This runs in 2.2s off the full table.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;

                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=227192.07..227192.08 rows=1 width=24) (actual
time=2230.419..2230.419 rows=1 loops=1)
   ->  Gather  (cost=227191.84..227192.05 rows=2 width=72) (actual
time=2228.321..2231.225 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=226191.84..226191.85 rows=1 width=72)
(actual time=2218.754..2218.754 rows=1 loops=3)
   ->  Parallel Seq Scan on r10mwide  (cost=0.00..184524.92
rows=4166692 width=16) (actual time=0.032..934.652 rows=333 loops=3)
 Planning Time: 0.116 ms
 Execution Time: 2231.935 ms
(8 rows)

Time: 2232.459 ms (00:02.232)


It did use parallel workers; since the FETCH FIRST queries apparently won't,
I turned that off and ran it again.

SET max_parallel_workers_per_gather = 0;

New query plan for full table, no parallel, just over 4 seconds.

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=342859.21..342859.22 rows=1 width=24) (actual
time=4253.861..4253.862 rows=1 loops=1)
   ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=1060 width=16)
(actual time=0.062..1728.346 rows=1000 loops=1)
 Planning Time: 0.082 ms
 Execution Time: 4253.918 ms
(4 rows)


The following uses the explicit row count to pull 100k rows (1%) and gives
a massive improvement in speed; this query ranged from 55-120ms.  This is
the type of speedup I would expect, and it beats the full-table query hands
down, regardless of parallel query.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
FETCH FIRST 100000 ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4428.58..4428.59 rows=1 width=24) (actual
time=55.963..55.963 rows=1 loops=1)
   ->  Limit  (cost=0.00..2428.57 rows=10 width=20) (actual
time=0.041..35.725 rows=10 loops=1)
 ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=1060
width=20) (actual time=0.039..24.796 rows=10 loops=1)
 Planning Time: 0.134 ms
 Execution Time: 56.032 ms
(5 rows)

Time: 57.262 ms



Using FETCH FIRST 1 PERCENT it takes over 7 seconds, 71% slower than
querying the full table.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
FETCH FIRST 1 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6857.22..6857.23 rows=1 width=24) (actual
time=7112.205..7112.206 rows=1 loops=1)
   ->  Limit  (cost=2428.60..4857.19 rows=11 width=20) (actual
time=0.036..7047.394 rows=10 loops=1)
 ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=1060
width=20) (actual time=0.033..3072.881 rows=1000 loops=1)
 Planning Time: 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 6:04 PM, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
>> Uh, well, renaming the user was a big problem, but that is the only case
>> I can think of.  I don't see that as an issue for block or WAL sequence
>> numbers.  If we want to use a different nonce, we have to find a way to
>> store it or look it up efficiently.  Considering the nonce size, I don't
>> see how that is possible.
> 
> No, this also meant that, as an attacker, I *knew* the salt ahead of
> time and therefore could build rainbow tables specifically for that
> salt.  I could also use those *same* tables for any system where that
> user had an account, even if they used different passwords on different
> systems...
> 
> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

I have worked side by side on projects with bona fide cryptographers and
I can assure you that they recommended random nonces. Granted, that was
in the early 2000s, but I don't think "modern cryptography" has changed
that any more than "web scale" has made Postgres irrelevant in the
intervening years.

Related links:

https://defuse.ca/cbcmodeiv.htm
https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > with a known weakness.
> > > 
> > > As discussed below- this is about moving goalposts, and that's, in part
> > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > 
> > I agree we have to allow rekeying and allow multiple unlocked keys in
> > the server at the same time.  The open question is whether encrypting
> > different data with different keys and different unlock controls is
> > possible or useful.
> 
> I'm not sure if there's really a question about if it's *possible*?  As
> for if it's useful, I agree there's some debate.

Right, it is easily possible to keep all keys unlocked, but the value is
minimal, and the complexity will have a cost, which is my point.

> > > amount of data you transmit over a given TLS connection increases
> > > though, the risk increases and it would be better to re-key.  How much
> > > better?  That depends a great deal on if someone is trying to mount an
> > > attack or not.
> > 
> > Yep, we need to allow rekey.
> 
> Supporting a way to rekey is definitely a good idea.

It is a requirement, I think.  We might have a problem tracking exactly
what key _version_ each table (or 8k block) or WAL file uses.  :-( 
Ideally we would allow only two active keys, and somehow mark each page
as using the odd or even key at a given time, or something strange. 
(Yeah, hand waving here.)

> > Uh, well, renaming the user was a big problem, but that is the only case
> > I can think of.  I don't see that as an issue for block or WAL sequence
> > numbers.  If we want to use a different nonce, we have to find a way to
> > store it or look it up efficiently.  Considering the nonce size, I don't
> > see how that is possible.
> 
> No, this also meant that, as an attacker, I *knew* the salt ahead of
> time and therefore could build rainbow tables specifically for that
> salt.  I could also use those *same* tables for any system where that
> user had an account, even if they used different passwords on different
> systems...

Yes, 'postgres' can be used to create a nice md5 rainbow table that
works on many servers --- good point.  Are rainbow tables possible with
something like AES?

> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

This post:


https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

says:

GCM is a variation on Counter Mode (CTR).  As you say, with any variant
of Counter Mode, it is essential  that the Nonce is not repeated with
the same key.  Hence CTR mode  Nonces often include either a counter or
a timer element: something that  is guaranteed not to repeat over the
lifetime of the key.

CTR is what we use for WAL.  For 8k pages, we would use CBC, which says we
need a random nonce.  I need to dig deeper into the ECB mode attack.

> > Uh, well, you are much less likely to get duplicate nonce values by
> > using block number or WAL sequence number.  If you look at the
> > implementations, few compute random nonce values.
> 
> Which implementations..?  Where do their nonce values come from?  I can
> see how a nonce might have to be naturally and deterministically random,
> if the source for it is sufficiently varied across the key space, but
> starting at '1' and going up with the same key seems like it's just
> giving a potential attacker more information about what the encrypted
> data contains...

Well, in many modes the nonce is just a counter, but as stated above,
not all modes.  I need to pull out my security books to remember for
which ones it is safe.  (Frankly, it is a lot easier to use a random
nonce for WAL than for 8k pages.)

> > And you base the random goal on what?  Nonce is number used only once,
> > and randomness is not a requirement.  You can say you prefer it, but
> > why, because most implementations don't use random nonce.
> 
> The encryption schemes I've worked with in the past have used a random
> nonce, so I'm wondering where the disconnect is between us on that.

OK.

> > > > > When it comes to concerns about autovacuum or other system processes,
> > > > > those don't have any direct user connections or interactions, so 
> > > > > having
> > > > > them be more privileged and having access to more is reasonable.
> > > > 
> > > > Well, I am trying to understand the value of having some keys accessible
> > > > by some parts of the 

Detailed questions about pg_xact_commit_timestamp

2019-07-08 Thread Morris de Oryx
I have some specific questions about pg_xact_commit_timestamp, and am
hoping that this is the right place to ask. I read a lot of the commentary
about the original patch, and the contributors seem to be here. If I'm
asking in the wrong place, just let me know.

I'm working on a design for a concurrency-safe incremental aggregate rollup
system, and pg_xact_commit_timestamp sounds perfect. But I've found very
little commentary on it generally, and couldn't figure out how it works in
detail from the source code.

Hopefully, someone knows the answers to a few questions:

* Is it possible for pg_xact_commit_timestamp to produce times out of
order? What I'm after is a way to identify records that have been changed
since a specific time so that I can get any later changes for processing. I
don't need them in commit order, so overlapping timestamps aren't a
problem.

* How many bytes are added to each row in the final implementation? The
discussions I saw seemed to be ranging from 12-24 bytes. There was
discussion of adding in extra bytes for "just in case." This is pre 9.5, so
a world ago.

* Are the timestamps indexed internally? With a B-tree? I ask for
capacity-planning reasons.

* I've seen on StackOverflow and the design discussions that the timestamps
are not kept indefinitely, but can't find the details on exactly how long
they are stored.

* Any rules of thumb on the performance impact of enabling
pg_xact_commit_timestamp? I don't need the data on all tables but, where I
do, it sounds like it might work perfectly.

Many thanks for any assistance!
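
A minimal sketch of the intended usage (assuming track_commit_timestamp = on
has been set in postgresql.conf and the server restarted; table and column
names are made up):

SELECT id, xmin, pg_xact_commit_timestamp(xmin) AS committed_at
FROM my_table
WHERE pg_xact_commit_timestamp(xmin) > now() - interval '5 minutes';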


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > with a known weakness.
> > 
> > As discussed below- this is about moving goalposts, and that's, in part
> > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> 
> I agree we have to allow rekeying and allow multiple unlocked keys in
> the server at the same time.  The open question is whether encrypting
> different data with different keys and different unlock controls is
> possible or useful.

I'm not sure if there's really a question about if it's *possible*?  As
for if it's useful, I agree there's some debate.

> > amount of data you transmit over a given TLS connection increases
> > though, the risk increases and it would be better to re-key.  How much
> > better?  That depends a great deal on if someone is trying to mount an
> > attack or not.
> 
> Yep, we need to allow rekey.

Supporting a way to rekey is definitely a good idea.

> > > > > Of course, a database is going to process even more data so if the
> > > > > amount of data encrypted is a problem, we might have a problem too in
> > > > > using a single key.  This is not related to whether we use one key for
> > > > > the entire cluster or multiple keys per tablespace --- the problem is
> > > > > the same.  I guess we could create 1024 keys and use the bottom bits 
> > > > > of
> > > > > the block number to decide what key to use.  However, that still only
> > > > > pushes the goalposts farther.
> > > > 
> > > > All of this is about pushing the goalposts farther away, as I see it.
> > > > There's going to be trade-offs here and there isn't going to be any "one
> > > > right answer" when it comes to this space.  That's why I'm inclined to
> > > > argue that we should try to come up with a relatively *good* solution
> > > > that doesn't create a huge amount of work for us, and then build on
> > > > that.  To that end, leveraging metadata that we already have outside of
> > > > the catalogs (databases, tablespaces, potentially other information that
> > > > we store, essentially, in the filesystem metadata already) to decide on
> > > > what key to use, and how many we can support, strikes me as a good
> > > > initial target.
> > > 
> > > Yes, we will need that for a usable nonce that we don't need to store in
> > > the blocks and WAL files.
> > 
> > I'm not a fan of the idea of using something which is predictable as a
> > nonce.  Using the username as the salt for our md5 password mechanism
> > was, all around, a bad idea.  This seems like it's repeating that
> > mistake.
> 
> Uh, well, renaming the user was a big problem, but that is the only case
> I can think of.  I don't see that as an issue for block or WAL sequence
> numbers.  If we want to use a different nonce, we have to find a way to
> store it or look it up efficiently.  Considering the nonce size, I don't
> see how that is possible.

No, this also meant that, as an attacker, I *knew* the salt ahead of
time and therefore could build rainbow tables specifically for that
salt.  I could also use those *same* tables for any system where that
user had an account, even if they used different passwords on different
systems...

I appreciate that *some* of this might not be completely relevant for
the way a nonce is used in cryptography, but I'd be very surprised to
have a cryptographer tell me that a deterministic nonce didn't have
similar issues or didn't reduce the value of the nonce significantly.

> > > > > Anyway, I will to research the reasonable data size that can be 
> > > > > secured
> > > > > with a single key via AES.  I will look at how PGP encrypts large 
> > > > > files
> > > > > too.
> > > > 
> > > > This seems unlikely to lead to a definitive result, but it would be
> > > > interesting to hear if there have been studies around that and what
> > > > their conclusions were.
> > > 
> > > I found this:
> > > 
> > >   
> > > https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
> > >   
> > > https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
> > > 
> > > The numbers listed are:
> > > 
> > >   Maximum Encrypted Plaintext Size:  68GB
> > >   Maximum Processed Additional Authenticated Data: 2 x 10^18
> > 
> > These are specific to AES, from a quick reading of those pages, right?
> 
> Yes, AES with GCM, which has authentication parts we would not use, so
> we would use CBC and CTR, which I think has the same or larger spaces.
> > 
> > > The 68GB value is "the maximum bits that can be processed with a single
> > > key/IV(nonce) pair."  We would 8k of data for each 8k page.  If we
> > > assume a unique nonce per page that 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Well, if it was a necessary features, I assume TLS 1.3 would have found
> > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > with a known weakness.
> 
> As discussed below- this is about moving goalposts, and that's, in part
> at least, why re-keying isn't a *necessary* feature of TLS.  As the

I agree we have to allow rekeying and allow multiple unlocked keys in
the server at the same time.  The open question is whether encrypting
different data with different keys and different unlock controls is
possible or useful.

> amount of data you transmit over a given TLS connection increases
> though, the risk increases and it would be better to re-key.  How much
> better?  That depends a great deal on if someone is trying to mount an
> attack or not.

Yep, we need to allow rekey.

> > > > Of course, a database is going to process even more data so if the
> > > > amount of data encrypted is a problem, we might have a problem too in
> > > > using a single key.  This is not related to whether we use one key for
> > > > the entire cluster or multiple keys per tablespace --- the problem is
> > > > the same.  I guess we could create 1024 keys and use the bottom bits of
> > > > the block number to decide what key to use.  However, that still only
> > > > pushes the goalposts farther.
> > > 
> > > All of this is about pushing the goalposts farther away, as I see it.
> > > There's going to be trade-offs here and there isn't going to be any "one
> > > right answer" when it comes to this space.  That's why I'm inclined to
> > > argue that we should try to come up with a relatively *good* solution
> > > that doesn't create a huge amount of work for us, and then build on
> > > that.  To that end, leveraging metadata that we already have outside of
> > > the catalogs (databases, tablespaces, potentially other information that
> > > we store, essentially, in the filesystem metadata already) to decide on
> > > what key to use, and how many we can support, strikes me as a good
> > > initial target.
> > 
> > Yes, we will need that for a usable nonce that we don't need to store in
> > the blocks and WAL files.
> 
> I'm not a fan of the idea of using something which is predictable as a
> nonce.  Using the username as the salt for our md5 password mechanism
> was, all around, a bad idea.  This seems like it's repeating that
> mistake.

Uh, well, renaming the user was a big problem, but that is the only case
I can think of.  I don't see that as an issue for block or WAL sequence
numbers.  If we want to use a different nonce, we have to find a way to
store it or look it up efficiently.  Considering the nonce size, I don't
see how that is possible.

> > > > Anyway, I will to research the reasonable data size that can be secured
> > > > with a single key via AES.  I will look at how PGP encrypts large files
> > > > too.
> > > 
> > > This seems unlikely to lead to a definitive result, but it would be
> > > interesting to hear if there have been studies around that and what
> > > their conclusions were.
> > 
> > I found this:
> > 
> > 
> > https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
> > 
> > https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
> > 
> > The numbers listed are:
> > 
> > Maximum Encrypted Plaintext Size:  68GB
> > Maximum Processed Additional Authenticated Data: 2 x 10^18
> 
> These are specific to AES, from a quick reading of those pages, right?

Yes, AES with GCM, which has authentication parts we would not use, so
we would use CBC and CTR, which I think have the same or larger spaces.
> 
> > The 68GB value is "the maximum bits that can be processed with a single
> > key/IV(nonce) pair."  We would 8k of data for each 8k page.  If we
> > assume a unique nonce per page that is 10^32 bytes.
> 
> A unique nonce per page strikes me as excessive...  but then, I think we
> should have an actually random nonce rather than something calculated
> from the metadata.

Uh, well, you are much less likely to get duplicate nonce values by
using block number or WAL sequence number.  If you look at the
implementations, few compute random nonce values.

> > For the WAL we would probably use a different nonce for each 16MB page,
> > so we would be OK there too, since that is 10 ^ 36 bytes.
> > 
> > gives us 10^36 bytes before the segment number causes the nonce to
> > repeat.
> 
> This presumes that the segment number is involved in the nonce
> selection, which again strikes me as a bad idea.  Maybe it could be
> involved in some way, but we should have a properly random nonce.

And you base the random goal on what?  A nonce is a number used only once,
and randomness is not a requirement.  You can say you prefer it, but
why, because most implementations don't 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
> > > > Of course, we can discuss if what websites do with over-the-wire
> > > > encryption is sensible to compare to what we want to do in PG for
> > > > data-at-rest, but then we shouldn't be talking about what websites do,
> > > > it'd make more sense to look at other data-at-rest encryption systems
> > > > and consider what they're doing.
> > > 
> > > (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> > > used only for authentication, and Diffie–Hellman is used for key
> > > exchange:
> > > 
> > >   https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
> > 
> > Right, and the key that's figured out for each connection is at least
> > specific to the server AND client keys/certificates, thus meaning that
> > they're changed at least as frequently as those change (and clients end
> > up creating ones on the fly randomly if they don't have one, iirc).
> > 
> > > So, the question is whether you can pass so much data in TLS that using
> > > the same key for the entire session is a security issue.  TLS originally
> > > had key renegotiation, but that was removed in TLS 1.3:
> > > 
> > >   
> > > https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> > >   To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
> > 
> > It was removed due to attacks targeting the renegotiation, not because
> > doing re-keying by itself was a bad idea, or because using multiple keys
> > was seen as a bad idea.
> 
> Well, if it was a necessary features, I assume TLS 1.3 would have found
> a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> with a known weakness.

As discussed below- this is about moving goalposts, and that's, in part
at least, why re-keying isn't a *necessary* feature of TLS.  As the
amount of data you transmit over a given TLS connection increases
though, the risk increases and it would be better to re-key.  How much
better?  That depends a great deal on if someone is trying to mount an
attack or not.

> > > Of course, a database is going to process even more data so if the
> > > amount of data encrypted is a problem, we might have a problem too in
> > > using a single key.  This is not related to whether we use one key for
> > > the entire cluster or multiple keys per tablespace --- the problem is
> > > the same.  I guess we could create 1024 keys and use the bottom bits of
> > > the block number to decide what key to use.  However, that still only
> > > pushes the goalposts farther.
> > 
> > All of this is about pushing the goalposts farther away, as I see it.
> > There's going to be trade-offs here and there isn't going to be any "one
> > right answer" when it comes to this space.  That's why I'm inclined to
> > argue that we should try to come up with a relatively *good* solution
> > that doesn't create a huge amount of work for us, and then build on
> > that.  To that end, leveraging metadata that we already have outside of
> > the catalogs (databases, tablespaces, potentially other information that
> > we store, essentially, in the filesystem metadata already) to decide on
> > what key to use, and how many we can support, strikes me as a good
> > initial target.
> 
> Yes, we will need that for a usable nonce that we don't need to store in
> the blocks and WAL files.

I'm not a fan of the idea of using something which is predictable as a
nonce.  Using the username as the salt for our md5 password mechanism
was, all around, a bad idea.  This seems like it's repeating that
mistake.

> > > Anyway, I will to research the reasonable data size that can be secured
> > > with a single key via AES.  I will look at how PGP encrypts large files
> > > too.
> > 
> > This seems unlikely to lead to a definitive result, but it would be
> > interesting to hear if there have been studies around that and what
> > their conclusions were.
> 
> I found this:
> 
>   
> https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
>   
> https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340
> 
> The numbers listed are:
> 
>   Maximum Encrypted Plaintext Size:  68GB
>   Maximum Processed Additional Authenticated Data: 2 x 10^18

These are specific to AES, from a quick reading of those pages, right?

> The 68GB value is "the maximum bits that can be processed with a single
> key/IV(nonce) pair."  We would 8k of data for each 8k page.  If we
> assume a unique nonce per page that is 10^32 bytes.

A unique nonce per page strikes me as excessive...  but then, I think we
should have an actually random nonce rather than something calculated
from the metadata.

> For the WAL we would probably use a different nonce for each 16MB page,
> so we would be OK there too, since 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 09:30:03PM +0200, Tomas Vondra wrote:
> I think Bruce's proposal was to minimize the time the key is "unlocked"
> in memory by only unlocking them when the user connects and supplies
> some sort of secret (passphrase), and remove them from memory when the
> user disconnects. So there's no way for the auxiliary processes to gain
> access to those keys, because only the user knows the secret.

I mentioned that because I thought that was the security value that
people wanted.  While I can see the value, I don't see how it can be
cleanly accomplished.  Keeping the keys unlocked at all times seems to
be possible, but of much smaller value.

Part of my goal in this discussion is to reverse the rush to implement
and pick apart exactly what is possible, and desirable.

> FWIW I have doubts this scheme actually measurably improves privacy in
> practice, because most busy applications will end up having the keys in
> the memory all the time anyway.

Yep.

> It also assumes memory is unsafe, i.e. bad actors can read it, and
> that's probably a valid concern (root access, vulnerabilities etc.). But
> in that case we already have plenty of issues with data in flight
> anyway, and I doubt TDE is an answer to that.

Agreed.

> > Ideally, all of this would leverage a vaulting system or other mechanism
> > which manages access to the keys and allows their usage to be limited.
> > That's been generally accepted as a good way to bridge the gap between
> > having to ask users every time for a key and having keys stored
> > long-term in memory.
> 
> Right. I agree with this.
> 
> > Having *only* the keys for the data which the
> > currently connected user is allowed to access would certainly be a great
> > initial capability, even if system processes (including potentially WAL
> > replay) have to have access to all of the keys.  And yes, shared buffers
> > being unencrypted and accessible by every backend continues to be an
> > issue- it'd be great to improve on that situation too.  I don't think
> > having everything encrypted in shared buffers is likely the solution,
> > rather, segregating it up might make more sense, again, along similar
> > lines to keys and using metadata that's outside of the catalogs, which
> > has been discussed previously, though I don't think anyone's actively
> > working on it.
> > 
> 
> I very much doubt TDE is a solution to this. Essentially, TDE is a good
> data-at-rest solution, but this seems more like protecting data during
> execution. And in that case I think we may need an entirely different
> encryption scheme.

I thought client-level encryption or pgcrypto-style encryption fits that
need better.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-07-08 Thread Tom Lane
Amit Langote  writes:
> [ parse-plan-memcxt_v2.patch ]

I got around to looking at this finally.  I'm not at all happy with
the fact that it's added a plantree copy step to the only execution
path through exec_simple_query.  That's a very significant overhead,
in many use-cases, to solve something that nobody had complained
about for a couple of decades before now.  I don't see the need for
any added copy step anyway.  The only reason you're doing it AFAICS
is so you can release the per-statement context a bit earlier, which
is a completely unnecessary optimization.  Just wait to release it
till the bottom of the loop.

Also, creating/deleting the sub-context is in itself an added overhead
that accomplishes exactly nothing in the typical case where there's
not multiple statements.  I thought the idea was to do that only if
there was more than one raw parsetree (or, maybe better, do it for
all but the last parsetree).

To show that this isn't an empty concern, I did a quick pgbench
test.  Using a single-client select-only test ("pgbench -S -T 60"
in an -s 10 database), I got these numbers in three trials with HEAD:

tps = 9593.818478 (excluding connections establishing)
tps = 9570.189163 (excluding connections establishing)
tps = 9596.579038 (excluding connections establishing)

and these numbers after applying the patch:

tps = 9411.918165 (excluding connections establishing)
tps = 9389.279079 (excluding connections establishing)
tps = 9409.350175 (excluding connections establishing)

That's about a 2% dropoff.  Now it's possible that that can be
explained away as random variations from a slightly different layout
of critical loops vs cacheline boundaries ... but I don't believe it.
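
Just to spell out the arithmetic behind that figure (this is only a
check of the numbers quoted above):

    SELECT round(avg(h), 1) AS head_tps,
           round(avg(p), 1) AS patched_tps,
           round(100 * (avg(p) - avg(h)) / avg(h), 2) AS pct_change
    FROM (VALUES (9593.818478, 9411.918165),
                 (9570.189163, 9389.279079),
                 (9596.579038, 9409.350175)) AS t(h, p);
    -- head_tps ~ 9586.9, patched_tps ~ 9403.5, pct_change ~ -1.91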

regards, tom lane




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
> > > Of course, we can discuss if what websites do with over-the-wire
> > > encryption is sensible to compare to what we want to do in PG for
> > > data-at-rest, but then we shouldn't be talking about what websites do,
> > > it'd make more sense to look at other data-at-rest encryption systems
> > > and consider what they're doing.
> > 
> > (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> > used only for authentication, and Diffie–Hellman is used for key
> > exchange:
> > 
> > https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
> 
> Right, and the key that's figured out for each connection is at least
> specific to the server AND client keys/certificates, thus meaning that
> they're changed at least as frequently as those change (and clients end
> up creating ones on the fly randomly if they don't have one, iirc).
> 
> > So, the question is whether you can pass so much data in TLS that using
> > the same key for the entire session is a security issue.  TLS originally
> > had key renegotiation, but that was removed in TLS 1.3:
> > 
> > 
> > https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> > To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
> 
> It was removed due to attacks targeting the renegotiation, not because
> doing re-keying by itself was a bad idea, or because using multiple keys
> was seen as a bad idea.

Well, if it was a necessary feature, I assume TLS 1.3 would have found
a way to make it secure, no?  Certainly they are not shipping TLS 1.3
with a known weakness.

> > Of course, a database is going to process even more data so if the
> > amount of data encrypted is a problem, we might have a problem too in
> > using a single key.  This is not related to whether we use one key for
> > the entire cluster or multiple keys per tablespace --- the problem is
> > the same.  I guess we could create 1024 keys and use the bottom bits of
> > the block number to decide what key to use.  However, that still only
> > pushes the goalposts farther.
> 
> All of this is about pushing the goalposts farther away, as I see it.
> There's going to be trade-offs here and there isn't going to be any "one
> right answer" when it comes to this space.  That's why I'm inclined to
> argue that we should try to come up with a relatively *good* solution
> that doesn't create a huge amount of work for us, and then build on
> that.  To that end, leveraging metadata that we already have outside of
> the catalogs (databases, tablespaces, potentially other information that
> we store, essentially, in the filesystem metadata already) to decide on
> what key to use, and how many we can support, strikes me as a good
> initial target.

Yes, we will need that for a usable nonce that we don't need to store in
the blocks and WAL files.

> > Anyway, I will research the reasonable data size that can be secured
> > with a single key via AES.  I will look at how PGP encrypts large files
> > too.
> 
> This seems unlikely to lead to a definitive result, but it would be
> interesting to hear if there have been studies around that and what
> their conclusions were.

I found this:


https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes

https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340

The numbers listed are:

Maximum Encrypted Plaintext Size:  68GB
Maximum Processed Additional Authenticated Data: 2 x 10^18

The 68GB value is "the maximum bits that can be processed with a single
key/IV(nonce) pair."  We would use 8k of data for each 8k page.  If we
assume a unique nonce per page, that is 10^32 bytes.

For the WAL we would probably use a different nonce for each 16MB page,
so we would be OK there too, since that gives us 10^36 bytes before the
segment number causes the nonce to repeat.
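
To make that arithmetic easy to check, here is the same back-of-the-envelope
calculation written out.  The 8k page size, 32-bit block numbers and 16MB
WAL segments are the only assumptions; this is illustrative, not a design:

    SELECT pg_size_pretty(8192::bigint)              AS per_key_nonce_heap,
           pg_size_pretty(16::bigint * 1024 * 1024)  AS per_key_nonce_wal,
           pg_size_pretty(8192::bigint * 4294967296) AS per_key_if_nonce_is_blkno;
    -- 8192 bytes | 16 MB | 32 TB

Each (key, nonce) pair then covers only one page or one WAL segment, far
below the 68GB limit; the last column is how much data a single key covers
if the nonce is derived from a 32-bit block number.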

> When it comes to concerns about autovacuum or other system processes,
> those don't have any direct user connections or interactions, so having
> them be more privileged and having access to more is reasonable.

Well, I am trying to understand the value of having some keys accessible
by some parts of the system, and some not.  I am unclear what security
value that has.

> Ideally, all of this would leverage a vaulting system or other mechanism
> which manages access to the keys and allows their usage to be limited.
> That's been generally accepted as a good way to bridge the gap between
> having to ask users every time for a key and having keys stored
> long-term in memory.  Having *only* the keys for the data which the
> currently connected user is allowed to access would certainly be a great
> initial capability, even if system processes (including potentially WAL
> replay) have to have access 

Re: Add missing operator <->(box, point)

2019-07-08 Thread Alexander Korotkov
On Mon, Jul 8, 2019 at 11:39 PM Nikita Glukhov  wrote:
> On 08.07.2019 18:22, Alexander Korotkov wrote:
> For me it doesn't look worth having two distinct functions
> gist_box_distance_helper() and gist_bbox_distance().  What about
> having just one and leave responsibility for recheck flag to the
> caller?
>
> gist_bbox_distance() was removed.

OK.

> But maybe it would be better to replace two identical functions
> gist_circle_distance() and gist_poly_distance() with the single
> gist_bbox_distance()?

Sounds reasonable to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Julien Rouhaud
On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud  wrote:
>
> I'll resubmit the parallel patch using per-table only
> approach

Attached.


0001-Export-vacuumdb-s-parallel-infrastructure.patch
Description: Binary data


0002-Add-parallel-processing-to-reindexdb.patch
Description: Binary data


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 12:09:58PM -0400, Joe Conway wrote:
> On 7/8/19 11:56 AM, Peter Eisentraut wrote:
> > On 2019-07-08 17:47, Stephen Frost wrote:
> >> Of course, we can discuss if what websites do with over-the-wire
> >> encryption is sensible to compare to what we want to do in PG for
> >> data-at-rest, but then we shouldn't be talking about what websites do,
> >> it'd make more sense to look at other data-at-rest encryption systems
> >> and consider what they're doing.
> > 
> > So, how do encrypted file systems do it?  Are there any encrypted file
> > systems in general use that allow encrypting only some files or
> > encrypting different parts of the file system with different keys, or
> > any of those other granular approaches being discussed?
> 
> Well it is fairly common, for good reason IMHO, to encrypt some mount
> points and not others on a system. In my mind, and in practice to a
> large extent, a postgres tablespace == a unique mount point.

Yes, that is a normal partition point for key use because one file
system is independent of others.  You could use different keys for
different directories in the same file system, but internally it all
uses the same storage, and storage theft would potentially happen at the
file system level.

For Postgres, tablespaces are not independent of the database system,
though media theft would still match.  Of course, in the case of a
tablespace media theft, Postgres would be quite confused, though you
could still start the database system:

SELECT * FROM test;
ERROR:  could not open file
"pg_tblspc/16385/PG_13_201907054/16384/16386": No such file or directory

but the data would be gone.  What you can't do with Postgres is to have
the tablespace be inaccessible and then later reappear.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Add missing operator <->(box, point)

2019-07-08 Thread Nikita Glukhov

Attached 3rd version of the patches.

On 02.07.2019 21:55, Alexander Korotkov wrote:


On Tue, Jul 2, 2019 at 9:19 PM Nikita Glukhov  wrote:

We could use commuted "const <-> var" operators for kNN searches, but the
current implementation requires the existence of "var <-> const" operators, and
order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op()
at /src/backend/optimizer/path/indxpath.c).

But probably it's still worth to just add commutator for every <->
operator and close this question.  Otherwise, it may arise again once
we want to add some more kNN support to opclasses or something.  On
the other hand, are we already going to limit oid consumption?


All missing distance operators were added to the first patch.



On 08.07.2019 18:22, Alexander Korotkov wrote:


On Mon, Mar 11, 2019 at 2:49 AM Nikita Glukhov  wrote:

2. Add <-> to GiST box_ops.
Extracted gist_box_distance_helper() common for gist_box_distance() and
gist_bbox_distance().

For me it doesn't look worth having two distinct functions
gist_box_distance_helper() and gist_bbox_distance().  What about
having just one and leave responsibility for recheck flag to the
caller?


gist_bbox_distance() was removed.

But maybe it would be better to replace two identical functions
gist_circle_distance() and gist_poly_distance() with the single
gist_bbox_distance()?



3. Add <-> to SP-GiST.
Changed only catalog and tests.  Box case is already checked in
spg_box_quad_leaf_consistent():
  out->recheckDistances = distfnoid == F_DIST_POLYP;

So, it seems to be fix of oversight in 2a6368343ff4.  But assuming
fixing this requires catalog changes, we shouldn't backpatch this.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 14a0e22e1b68b084dd75a7f660955b52b6aaae79 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Mon, 8 Jul 2019 22:55:00 +0300
Subject: [PATCH 1/3] Add missing distance operators

---
 src/backend/utils/adt/geo_ops.c| 136 -
 src/include/catalog/pg_operator.dat|  42 +-
 src/include/catalog/pg_proc.dat|  25 +
 src/test/regress/expected/geometry.out | 986 +
 src/test/regress/sql/geometry.sql  |  15 +-
 5 files changed, 687 insertions(+), 517 deletions(-)

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 373784f..6558ea3 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2348,6 +2348,17 @@ dist_pl(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT8(line_closept_point(NULL, line, pt));
 }
 
+/*
+ * Distance from a line to a point
+ */
+Datum
+dist_lp(PG_FUNCTION_ARGS)
+{
+	LINE	   *line = PG_GETARG_LINE_P(0);
+	Point	   *pt = PG_GETARG_POINT_P(1);
+
+	PG_RETURN_FLOAT8(line_closept_point(NULL, line, pt));
+}
 
 /*
  * Distance from a point to a lseg
@@ -2362,13 +2373,20 @@ dist_ps(PG_FUNCTION_ARGS)
 }
 
 /*
- * Distance from a point to a path
+ * Distance from a lseg to a point
  */
 Datum
-dist_ppath(PG_FUNCTION_ARGS)
+dist_sp(PG_FUNCTION_ARGS)
+{
+	LSEG	   *lseg = PG_GETARG_LSEG_P(0);
+	Point	   *pt = PG_GETARG_POINT_P(1);
+
+	PG_RETURN_FLOAT8(lseg_closept_point(NULL, lseg, pt));
+}
+
+static float8
+dist_ppath_internal(Point *pt, PATH *path)
 {
-	Point	   *pt = PG_GETARG_POINT_P(0);
-	PATH	   *path = PG_GETARG_PATH_P(1);
 	float8		result = 0.0;	/* keep compiler quiet */
 	bool		have_min = false;
 	float8		tmp;
@@ -2403,7 +2421,31 @@ dist_ppath(PG_FUNCTION_ARGS)
 		}
 	}
 
-	PG_RETURN_FLOAT8(result);
+	return result;
+}
+
+/*
+ * Distance from a point to a path
+ */
+Datum
+dist_ppath(PG_FUNCTION_ARGS)
+{
+	Point	   *pt = PG_GETARG_POINT_P(0);
+	PATH	   *path = PG_GETARG_PATH_P(1);
+
+	PG_RETURN_FLOAT8(dist_ppath_internal(pt, path));
+}
+
+/*
+ * Distance from a path to a point
+ */
+Datum
+dist_pathp(PG_FUNCTION_ARGS)
+{
+	PATH	   *path = PG_GETARG_PATH_P(0);
+	Point	   *pt = PG_GETARG_POINT_P(1);
+
+	PG_RETURN_FLOAT8(dist_ppath_internal(pt, path));
 }
 
 /*
@@ -2419,6 +2461,18 @@ dist_pb(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Distance from a box to a point
+ */
+Datum
+dist_bp(PG_FUNCTION_ARGS)
+{
+	BOX		   *box = PG_GETARG_BOX_P(0);
+	Point	   *pt = PG_GETARG_POINT_P(1);
+
+	PG_RETURN_FLOAT8(box_closept_point(NULL, box, pt));
+}
+
+/*
  * Distance from a lseg to a line
  */
 Datum
@@ -2431,6 +2485,18 @@ dist_sl(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Distance from a line to a lseg
+ */
+Datum
+dist_ls(PG_FUNCTION_ARGS)
+{
+	LINE	   *line = PG_GETARG_LINE_P(0);
+	LSEG	   *lseg = PG_GETARG_LSEG_P(1);
+
+	PG_RETURN_FLOAT8(lseg_closept_line(NULL, lseg, line));
+}
+
+/*
  * Distance from a lseg to a box
  */
 Datum
@@ -2443,6 +2509,18 @@ dist_sb(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Distance from a box to a lseg
+ */
+Datum
+dist_bs(PG_FUNCTION_ARGS)
+{
+	BOX		   *box = PG_GETARG_BOX_P(0);
+	LSEG	   *lseg = PG_GETARG_LSEG_P(1);
+
+	PG_RETURN_FLOAT8(box_closept_lseg(NULL, box, lseg));
+}
+
+/*
  * Distance from a line to a box
  */
 Datum
@@ -2462,13 

Re: Ltree syntax improvement

2019-07-08 Thread Alvaro Herrera
On 2019-Jul-08, Dmitry Belyavsky wrote:

> I did not introduce any functions. I've just changed the parser.

I mean the C-level functions -- count_parts_ors() and so on.

> I'm not sure that it makes sense to remove any tests as most of them were
> written to catch bugs that actually happened during the implementation.

Well, I don't mean to decrease the coverage, only to condense a lot of
little tests in a small powerful test.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Built-in connection pooler

2019-07-08 Thread Konstantin Knizhnik



On 08.07.2019 3:37, Thomas Munro wrote:

On Tue, Jul 2, 2019 at 3:11 AM Konstantin Knizhnik
 wrote:

On 01.07.2019 12:57, Thomas Munro wrote:

Interesting work.  No longer applies -- please rebase.


Rebased version of the patch is attached.
Also, this version of the built-in proxy can be found in the conn_proxy
branch of https://github.com/postgrespro/postgresql.builtin_pool.git

Thanks Konstantin.  I haven't looked at the code, but I can't help
noticing that this CF entry and the autoprepare one are both features
that come up again an again on feature request lists I've seen.
That's very cool.  They also both need architectural-level review.
With my Commitfest manager hat on: reviewing other stuff would help
with that; if you're looking for something of similar complexity and
also the same level of
everyone-knows-we-need-to-fix-this-!@#$-we-just-don't-know-exactly-how-yet
factor, I hope you get time to provide some more feedback on Takeshi
Ideriha's work on shared caches, which doesn't seem a million miles
from some of the things you're working on.


Thank you, I will look at Takeshi Ideriha's patch.


Could you please fix these compiler warnings so we can see this
running check-world on CI?

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324
https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678

Sorry, I do not have access to a Windows host, so I cannot check the Win32
build myself.
I have fixed all the reported warnings but cannot verify that the Win32
build is now OK.




diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..9398e561e8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -719,6 +719,123 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is switched on.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached, new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  A backend is considered tainted if the client updates GUCs, creates a temporary table or prepared statements.
+  A tainted backend can serve only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database,
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to the main "port" will be assigned dedicated backends,
+  while clients connected to the proxy port will be connected to backends through the proxy, which
+  performs transaction-level scheduling.
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  Sets the number of connection proxies.
+  Postmaster spawns a separate worker process for each proxy and scatters connections between proxies using one of the scheduling policies (round-robin, random, load-balancing).
+  Each proxy launches its own subset of backends, so the maximal number of non-tainted backends is
+  session_pool_size * connection_proxies * databases * roles.
+   
+
+  The default value is 0, so session pooling is disabled.
+
+  
+ 
+
+ 
+  session_schedule (enum)
+  
+   session_schedule configuration parameter
+  
+  
+  
+
+  Specifies the scheduling policy for assigning sessions to proxies when
+  connection pooling is enabled. The default policy is round-robin.
+
+
+  With the round-robin policy, postmaster cyclically scatters sessions between proxies.
+
+
+  With the random policy, postmaster randomly chooses a proxy for each new session.
+
+
+  With the load-balancing policy, postmaster chooses the proxy with the lowest load average.
+  The load average of a proxy is estimated by the number of client connections assigned to it, with extra weight for SSL connections.
+   
+  
+ 
+
+ 
+  

Re: Ltree syntax improvement

2019-07-08 Thread Dmitry Belyavsky
Dear Alvaro,

On Mon, Jul 8, 2019 at 11:16 PM Alvaro Herrera 
wrote:

> On 2019-Jul-08, Dmitry Belyavsky wrote:
>
> > Dear Thomas,
> >
> > Thank you for your proofreading!
> >
> > Please find the updated patch attached. It also contains the missing
> > escaping.
>
> I think all these functions you're adding should have a short sentence
> explaining what it does.
>
> I'm not really convinced that we need this much testing.  It seems a bit
> excessive.  Many of these very focused test SQL lines could be removed
> with no loss of coverage, and many of the others could be grouped into
> one.
>

I did not introduce any functions. I've just changed the parser.
I'm not sure that it makes sense to remove any tests as most of them were
written to catch bugs that actually happened during the implementation.

-- 
SY, Dmitry Belyavsky


Re: Ltree syntax improvement

2019-07-08 Thread Alvaro Herrera
On 2019-Jul-08, Dmitry Belyavsky wrote:

> Dear Thomas,
> 
> Thank you for your proofreading!
> 
> Please find the updated patch attached. It also contains the missing
> escaping.

I think all these functions you're adding should have a short sentence
explaining what it does.

I'm not really convinced that we need this much testing.  It seems a bit
excessive.  Many of these very focused test SQL lines could be removed
with no loss of coverage, and many of the others could be grouped into
one.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 12:09:58PM -0400, Joe Conway wrote:

On 7/8/19 11:56 AM, Peter Eisentraut wrote:

On 2019-07-08 17:47, Stephen Frost wrote:

Of course, we can discuss if what websites do with over-the-wire
encryption is sensible to compare to what we want to do in PG for
data-at-rest, but then we shouldn't be talking about what websites do,
it'd make more sense to look at other data-at-rest encryption systems
and consider what they're doing.


So, how do encrypted file systems do it?  Are there any encrypted file
systems in general use that allow encrypting only some files or
encrypting different parts of the file system with different keys, or
any of those other granular approaches being discussed?


Well it is fairly common, for good reason IMHO, to encrypt some mount
points and not others on a system. In my mind, and in practice to a
large extent, a postgres tablespace == a unique mount point.

There is a description here:

 https://wiki.archlinux.org/index.php/Disk_encryption



That link is a bit overwhelming, as it explains how various encrypted
filesystems do things. There's now official support for this in the
Linux kernel (encryption at the filesystem level, not block device) in
the form of fscrypt, see

 https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

It's a bit different because that's not a stacked encryption, it's
integrated directly into filesystems (like ext4, at the moment) and it
leverages other kernel facilities (like keyring).

The link also discusses the threat model, which is particularly
interesting for this discussion, IMO.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 12:16:04PM -0400, Bruce Momjian wrote:


...

Anyway, I will research the reasonable data size that can be secured
with a single key via AES.  I will look at how PGP encrypts large files
too.



IMO there are various recommendations about this, for example from NIST.
But it varies on the exact encryption mode (say, GCM, XTS, ...) and the
recommendations are not "per key" but "per key + nonce" etc.

IANAC but my understanding is if we use e.g. "OID + blocknum" as nonce,
then we should be pretty safe.
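
Purely as an illustration of what such a metadata-derived nonce looks like
(the field choice and byte layout here are assumptions, not a proposal):

    -- 16 bytes from tablespace OID, database OID, relfilenode, block number
    SELECT encode(int4send(1663) || int4send(16384) ||
                  int4send(16385) || int4send(42), 'hex') AS page_nonce;
    -- 0000067f00004000000040010000002a

Every page gets a distinct value, but it is entirely predictable from
metadata, which is the trade-off being discussed in this thread.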


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Assertion for logically decoding multi inserts into the catalog

2019-07-08 Thread Daniel Gustafsson
My patch for using heap_multi_insert in the catalog [1] failed the logical
decoding part of test/recovery [2].

The assertion it failed on seems to not take multi inserts into the catalog
into consideration, while the main logic does.  This assertion hasn't tripped
since there are no multi inserts into the catalog, but if we introduce them it
will, so I’m raising it in a separate thread as it is sort of unrelated to the
patch in question.

The attached patch fixes my test failure and makes sense to me, but this code
is far from my neck of the tree, so I’m really not sure this is the best way to
express the assertion.

cheers ./daniel

[1] https://commitfest.postgresql.org/23/2125/
[2] 
https://postgr.es/m/CA+hUKGLg1vFiXnkxjp_bea5+VP8D=vhrwsdvj7rbikr_u4x...@mail.gmail.com



logdec_assert.patch
Description: Binary data


Re: Declared but no defined functions

2019-07-08 Thread Tom Lane
Ashwin Agrawal  writes:
> Do we wish to make this a tool and have it in src/tools, either as part of
> the find_static tool (after renaming that one to a more generic name) or as an
> independent script?

Well, the scripts described so far are little more than jury-rigged
hacks, with lots of room for false positives *and* false negatives.
I wouldn't want to institutionalize any of them as the right way to
check for such problems.  If somebody made the effort to create a
tool that was actually trustworthy, perhaps that'd be a different
story.

(Personally I was wondering whether pgindent could be hacked up to
emit things it thought were declarations of function names.  I'm
not sure that I'd trust that 100% either, but at least it would have
a better shot than the grep hacks we've discussed so far.  Note in
particular that pgindent would see things inside #ifdef blocks,
whether or not your local build ever sees those declarations.)

regards, tom lane




Re: [HACKERS] Cached plans and statement generalization

2019-07-08 Thread Konstantin Knizhnik




On 08.07.2019 2:23, Thomas Munro wrote:

On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik
 wrote:

Attached please find rebased version of the patch.
Also this version can be found in autoprepare branch of this repository
https://github.com/postgrespro/postgresql.builtin_pool.git
on github.

Thanks.  I haven't looked at the code but this seems like interesting
work and I hope it will get some review.  I guess this is bound to use
a lot of memory.  I guess we'd eventually want to figure out how to
share the autoprepared plan cache between sessions, which is obviously
a whole can of worms.

A couple of trivial comments with my CF manager hat on:

1.  Can you please fix the documentation?  It doesn't build.
Obviously reviewing the goals, design and implementation are more
important than the documentation at this point, but if that is fixed
then the CF bot will be able to run check-world every day and we might
learn something about the code.
2.  Accidental editor junk included:  src/include/catalog/pg_proc.dat.~1~



Sorry, are you testing autoprepare-16.patch, which I sent in the last e-mail?
I can not reproduce the problem with building documentation:

knizhnik@xps:~/postgresql/doc$ make
make -C ../src/backend generated-headers
make[1]: Entering directory '/home/knizhnik/postgresql/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/catalog'
make -C utils distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/utils'
make[1]: Leaving directory '/home/knizhnik/postgresql/src/backend'
make -C src all
make[1]: Entering directory '/home/knizhnik/postgresql/doc/src'
make -C sgml all
make[2]: Entering directory '/home/knizhnik/postgresql/doc/src/sgml'
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel' 
stylesheet.xsl postgres.sgml

cp ./stylesheet.css html/
touch html-stamp
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel' 
stylesheet-man.xsl postgres.sgml

touch man-stamp
make[2]: Leaving directory '/home/knizhnik/postgresql/doc/src/sgml'
make[1]: Leaving directory '/home/knizhnik/postgresql/doc/src'


Also, autoprepare-16.patch doesn't include any junk such as

src/include/catalog/pg_proc.dat.~1~






Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 02:39:44PM -0400, Stephen Frost wrote:

Greetings,

* Bruce Momjian (br...@momjian.us) wrote:

On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > When people are asking for multiple keys (not just for key rotation),
> > > > they are asking to have multiple keys that can be supplied by users only
> > > > when they need to access the data.  Yes, the keys are always in the
> > > > database, but the feature request is that they are only unlocked when the
> > > > user needs to access the data.  Obviously, that will not work for
> > > > autovacuum when the encryption is at the block level.
> > >
> > > > If the key is always unlocked, there is questionable security value of
> > > > having multiple keys, beyond key rotation.
> > >
> > > That is not true. Having multiple keys also allows you to reduce the
> > > amount of data encrypted with a single key, which is desirable because:
> > >
> > > 1. It makes cryptanalysis more difficult
> > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> >
> > What systems use multiple keys like that?  I know of no website that
> > does that.  Your arguments seem hypothetical.  What is your goal here?
>
> Not sure what the reference to 'website' is here, but one doesn't get
> certificates for TLS/SSL usage that aren't time-bounded, and when it
> comes to the actual on-the-wire encryption that's used, that's a
> symmetric key that's generated on-the-fly for every connection.
>
> Wouldn't the fact that they generate a different key for every
> connection be a pretty clear indication that it's a good idea to use
> multiple keys and not use the same key over and over..?
>
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

(I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
used only for authentication, and Diffie–Hellman is used for key
exchange:

https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange


Right, and the key that's figured out for each connection is at least
specific to the server AND client keys/certificates, thus meaning that
they're changed at least as frequently as those change (and clients end
up creating ones on the fly randomly if they don't have one, iirc).


So, the question is whether you can pass so much data in TLS that using
the same key for the entire session is a security issue.  TLS originally
had key renegotiation, but that was removed in TLS 1.3:


https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
To mitigate these types of attacks, TLS 1.3 disallows renegotiation.


It was removed due to attacks targeting the renegotiation, not because
doing re-keying by itself was a bad idea, or because using multiple keys
was seen as a bad idea.


Of course, a database is going to process even more data so if the
amount of data encrypted is a problem, we might have a problem too in
using a single key.  This is not related to whether we use one key for
the entire cluster or multiple keys per tablespace --- the problem is
the same.  I guess we could create 1024 keys and use the bottom bits of
the block number to decide what key to use.  However, that still only
pushes the goalposts farther.


All of this is about pushing the goalposts farther away, as I see it.
There's going to be trade-offs here and there isn't going to be any "one
right answer" when it comes to this space.  That's why I'm inclined to
argue that we should try to come up with a relatively *good* solution
that doesn't create a huge amount of work for us, and then build on
that.  To that end, leveraging metadata that we already have outside of
the catalogs (databases, tablespaces, potentially other information that
we store, essentially, in the filesystem metadata already) to decide on
what key to use, and how many we can support, strikes me as a good
initial target.


Anyway, I will research the reasonable data size that can be secured
with a single key via AES.  I will look at how PGP encrypts large files
too.


This seems unlikely to lead to a definitive result, but it would be
interesting to hear if there have been studies around that and what
their conclusions were.

When it comes to concerns about autovacuum or other system processes,
those don't have any direct user connections or interactions, so having
them be more privileged and having access to more is reasonable.



I think Bruce's proposal was to minimize the time the key is "unlocked"
in memory by only unlocking them when the user connects and supplies

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Tom Lane
Michael Paquier  writes:
> Well, another thing I'd like to think about is if there is any point
> to keep regressplans.sh and the relevant options in postgres at this
> stage.  From the top of the file one can read that:

The point of regressplans.sh is to see if anything goes seriously
wrong when forcing non-default plan choices --- seriously wrong being
defined as crashes or semantically wrong answers.  It's not expected
that the regression tests will automatically pass when you do that,
because of their dependencies on output row ordering, not to mention
all the EXPLAINs.  I'm not for removing it --- the fact that its
results require manual evaluation doesn't make it useless.

Having said that, join_hash.sql in particular seems to have zero
value if it's not testing hash joins, so I think it'd be reasonable
for it to override a global enable_hashjoin = off setting.  None of
the other regression test scripts seem to take nearly as much of a
performance hit from globally forcing poor plans.
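
(Sketch only, not the actual test content: the script could simply force
the setting it depends on at the top and restore it afterwards, overriding
whatever the session was started with.)

    SET enable_hashjoin = on;
    -- ... hash join test queries ...
    RESET enable_hashjoin;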

regards, tom lane




Re: tableam: abstracting relation sizing code

2019-07-08 Thread Robert Haas
On Wed, Jun 12, 2019 at 9:14 AM Robert Haas  wrote:
> On Tue, Jun 11, 2019 at 7:17 PM Andres Freund  wrote:
> > Just to understand: What version are you targeting? It seems pretty
> > clearly v13 material to me?
>
> My current plan is to commit this to v13 as soon as the tree opens.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Ltree syntax improvement

2019-07-08 Thread Dmitry Belyavsky
Dear Thomas,

Thank you for your proofreading!

Please find the updated patch attached. It also contains the missing
escaping.

On Mon, Jul 8, 2019 at 10:39 AM Thomas Munro  wrote:

> On Wed, Apr 17, 2019 at 5:29 AM Dmitry Belyavsky 
> wrote:
> > I've applied your patch.
> > From my point of view, there is no major difference between case and
> chain if here.
> > Neither case nor ifs allow extracting the common code to separate
> function - just because there seem to be no identical pieces of code.
>
> Hi Dmitry,
>
> The documentation doesn't build[1], due to invalid XML.  Since I'm
> here, here is some proof-reading of the English in the documentation:
>
>
> -   A label is a sequence of alphanumeric characters
> -   and underscores (for example, in C locale the characters
> -   A-Za-z0-9_ are allowed).  Labels must be less
> than 256 bytes
> -   long.
> +   A label is a sequence of characters. Labels
> must be
> +   less than 256 symbols long. Label may contain any character
> supported by Postgres
>
> "fewer than 256 characters in length", and
> "PostgreSQL"
>
> +   except \0. If label contains spaces, dots,
> lquery modifiers,
>
> "spaces, dots or lquery modifiers,"
>
> +   they may be escaped. Escaping can be done
> either by preceeding
> +   backslash (\\) symbol, or by wrapping the label
> in whole in double
> +   quotes ("). Initial and final unescaped
> whitespace is stripped.
>
> "Escaping can be done with either a preceding backslash [...] or by
> wrapping the whole label in double quotes [...]."
>
>
>
> +During converting text into internal representations, wrapping
> double quotes
>
> "During conversion to internal representation, "
>
> +and escaping backslashes are removed. During converting internal
> +representations into text, if the label does not contain any special
>
> "During conversion from internal representation to text, "
>
> +symbols, it is printed as is. Otherwise, it is wrapped in quotes and,
> if
> +there are internal quotes, they are escaped with backslash. The
> list of special
>
> "escaped with backslashes."
>
> +  
> +Examples: 42, "\\42",
> +\\4\\2,  42  and  "42"
> + will have the similar internal representation and, being
>
> "will all have the same internal representation and,"
>
> +converted from internal representation, will become
> 42.
> +Literal abc def will turn into "abc
> +def".
>
>
> [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/71856
>
> --
> Thomas Munro
> https://enterprisedb.com
>


-- 
SY, Dmitry Belyavsky
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 8226930905..5f45726229 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION ltree;
+SET standard_conforming_strings=on;
 -- Check whether any of our opclasses fail amvalidate
 SELECT amname, opcname
 FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod
@@ -7679,3 +7680,1587 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ;
 15
 (1 row)
 
+-- Extended syntax, escaping, quoting etc
+-- success
+SELECT E'\\.'::ltree;
+ ltree 
+---
+ "."
+(1 row)
+
+SELECT E'\\ '::ltree;
+ ltree 
+---
+ " "
+(1 row)
+
+SELECT E''::ltree;
+ ltree 
+---
+ "\"
+(1 row)
+
+SELECT E'\\a'::ltree;
+ ltree 
+---
+ a
+(1 row)
+
+SELECT E'\\n'::ltree;
+ ltree 
+---
+ n
+(1 row)
+
+SELECT E'x'::ltree;
+ ltree 
+---
+ "x\"
+(1 row)
+
+SELECT E'x\\ '::ltree;
+ ltree 
+---
+ "x "
+(1 row)
+
+SELECT E'x\\.'::ltree;
+ ltree 
+---
+ "x."
+(1 row)
+
+SELECT E'x\\a'::ltree;
+ ltree 
+---
+ xa
+(1 row)
+
+SELECT E'x\\n'::ltree;
+ ltree 
+---
+ xn
+(1 row)
+
+SELECT 'a b.с d'::ltree;
+ltree
+-
+ "a b"."с d"
+(1 row)
+
+SELECT ' e . f '::ltree;
+ ltree 
+---
+ e.f
+(1 row)
+
+SELECT ' '::ltree;
+ ltree 
+---
+ 
+(1 row)
+
+SELECT E'\\ g  . h\\ '::ltree;
+   ltree   
+---
+ " g"."h "
+(1 row)
+
+SELECT E'\\ g'::ltree;
+ ltree 
+---
+ " g"
+(1 row)
+
+SELECT E' h\\ '::ltree;
+ ltree 
+---
+ "h "
+(1 row)
+
+SELECT '" g  "." h "'::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT '" g  " '::ltree;
+ ltree  
+
+ " g  "
+(1 row)
+
+SELECT '" g  "   ." h "  '::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT nlevel(E'Bottom\\.Test'::ltree);
+ nlevel 
+
+  1
+(1 row)
+
+SELECT subpath(E'Bottom\\.'::ltree, 0, 1);
+  subpath  
+---
+ "Bottom."
+(1 row)
+
+SELECT subpath(E'a\\.b', 0, 1);
+ subpath 
+-
+ "a.b"
+(1 row)
+
+SELECT subpath(E'a\\..b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a\\..\\b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a b.с d'::ltree, 1, 1);
+ subpath 
+-
+ "с d"
+(1 row)
+
+SELECT(
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||

Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-08 Thread Julien Rouhaud
On Mon, Jul 8, 2019 at 9:57 AM Michael Paquier  wrote:
>
> On Fri, Jul 05, 2019 at 07:25:41PM +0200, Julien Rouhaud wrote:
> > On Fri, Jul 5, 2019 at 6:16 PM Peter Eisentraut 
> >  wrote:
> >> Isn't that also the case for your proposal?  We are not going to release
> >> a new reindexdb before a new REINDEX.
> >
> > Sure, but my point was that once the new reindexdb is released (or if
> > you're so desperate, using a nightly build or compiling your own), it
> > can be used against any previous major version.  There is probably a
> > large fraction of users who don't perform a postgres upgrade when they
> > upgrade their OS, so that's IMHO also something to consider.
>
> I think that we need to think long-term here and be confident in the
> fact we will still see breakages with collations and glibc, using a
> solution that we think is the right API.  Peter's idea to make the
> backend-aware command of the filtering is cool.  On top of that, there
> is no need to add any conflict logic in reindexdb and we can live with
> restricting --jobs support for non-index objects.

Don't get me wrong, I do agree that implementing filtering in the
backend is a better design.  What's bothering me is that I also agree
that there will be more glibc breakage, and if that happens within a
few years, a lot of people will still be using pg12 or older versions, and they
still won't have an efficient way to rebuild their indexes.  Now, it'd
be easy to publish an external tool that does a simple
parallel-and-glibc-filtering reindex, which will serve that purpose
for the few years it'll be needed, so everyone can be happy.

For now, I'll resubmit the parallel patch using per-table only
approach, and will submit the filtering in the backend using a new
REINDEX option in a different thread.




Re: Declared but no defined functions

2019-07-08 Thread Ashwin Agrawal
On Sat, Jul 6, 2019 at 4:32 PM Masahiko Sawada 
wrote:

> Indeed. I've tried to search again with the following script and got
> more such functions.
>
> for func in `git ls-files | egrep "\w+\.h$" | xargs cat | egrep -v
> "(^typedef)|(DECLARE)|(BKI)" | egrep "^(extern )*[\_0-9A-Za-z]+
> [\_\*0-9a-zA-Z]+ ?\(.+\);$" | sed -e "s/\(^extern \)*[\_0-9A-Za-z]\+
> \([\_0-9A-Za-z\*]\+\) \{0,1\}(.*);$/\2(/g" | sed -e "s/\*//g"`
> do
> if [ "`git grep "$func" -- "*.c" | wc -l`" -lt 1 ];then
> echo $func
> fi
> done
>
>
Do we wish to make this a tool and have it in src/tools, either as part of
the find_static tool (after renaming that one to a more generic name) or as an
independent script?


Re: progress report for ANALYZE

2019-07-08 Thread Julien Rouhaud
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas  wrote:
>
> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  
> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
>
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?
>
> I propose that once a field is set, we should leave it set until the end.

+1

Note that this patch is already behaving like that if the table only
contains dead rows.




Re: progress report for ANALYZE

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  wrote:
> Yeah, I got the impression that that was determined to be the desirable
> behavior, so I made it do that, but I'm not really happy about it
> either.  We're not too late to change the CREATE INDEX behavior, but
> let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > > When people are asking for multiple keys (not just for key rotation),
> > > > > they are asking to have multiple keys that can be supplied by users 
> > > > > only
> > > > > when they need to access the data.  Yes, the keys are always in the
> > > > > database, but the feature request is that they are only unlocked when
> > > > > the
> > > > > user needs to access the data.  Obviously, that will not work for
> > > > > autovacuum when the encryption is at the block level.
> > > > 
> > > > > If the key is always unlocked, there is questionable security value of
> > > > > having multiple keys, beyond key rotation.
> > > > 
> > > > That is not true. Having multiple keys also allows you to reduce the
> > > > amount of data encrypted with a single key, which is desirable because:
> > > > 
> > > > 1. It makes cryptanalysis more difficult
> > > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> > > 
> > > What systems use multiple keys like that?  I know of no website that
> > > does that.  Your arguments seem hypothetical.  What is your goal here?
> > 
> > Not sure what the reference to 'website' is here, but one doesn't get
> > certificates for TLS/SSL usage that aren't time-bounded, and when it
> > comes to the actual on-the-wire encryption that's used, that's a
> > symmetric key that's generated on-the-fly for every connection.
> > 
> > Wouldn't the fact that they generate a different key for every
> > connection be a pretty clear indication that it's a good idea to use
> > multiple keys and not use the same key over and over..?
> > 
> > Of course, we can discuss if what websites do with over-the-wire
> > encryption is sensible to compare to what we want to do in PG for
> > data-at-rest, but then we shouldn't be talking about what websites do,
> > it'd make more sense to look at other data-at-rest encryption systems
> > and consider what they're doing.
> 
> (I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
> used only for authentication, and Diffie–Hellman is used for key
> exchange:
> 
>   https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

Right, and the key that's figured out for each connection is at least
specific to the server AND client keys/certificates, thus meaning that
they're changed at least as frequently as those change (and clients end
up creating ones on the fly randomly if they don't have one, iirc).

> So, the question is whether you can pass so much data in TLS that using
> the same key for the entire session is a security issue.  TLS originally
> had key renegotiation, but that was removed in TLS 1.3:
> 
>   
> https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
>   To mitigate these types of attacks, TLS 1.3 disallows renegotiation.

It was removed due to attacks targeting the renegotiation, not because
doing re-keying by itself was a bad idea, or because using multiple keys
was seen as a bad idea.

> Of course, a database is going to process even more data so if the
> amount of data encrypted is a problem, we might have a problem too in
> using a single key.  This is not related to whether we use one key for
> the entire cluster or multiple keys per tablespace --- the problem is
> the same.  I guess we could create 1024 keys and use the bottom bits of
> the block number to decide what key to use.  However, that still only
> pushes the goalposts farther.

All of this is about pushing the goalposts farther away, as I see it.
There's going to be trade-offs here and there isn't going to be any "one
right answer" when it comes to this space.  That's why I'm inclined to
argue that we should try to come up with a relatively *good* solution
that doesn't create a huge amount of work for us, and then build on
that.  To that end, leveraging metadata that we already have outside of
the catalogs (databases, tablespaces, potentially other information that
we store, essentially, in the filesystem metadata already) to decide on
what key to use, and how many we can support, strikes me as a good
initial target.

> Anyway, I will research the reasonable data size that can be secured
> with a single key via AES.  I will look at how PGP encrypts large files
> too.

This seems unlikely to lead to a definitive result, but it would be
interesting to hear if there have been studies around that and what
their conclusions were.

When it comes to concerns about autovacuum or other system processes,
those don't have any direct user connections or interactions, so having
them be more privileged and having access to more is reasonable.

Ideally, all of this would leverage a vaulting system or other mechanism

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 11:25:10AM -0400, Bruce Momjian wrote:

On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:

On 7/8/19 10:19 AM, Bruce Momjian wrote:
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> database, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.

That is not true. Having multiple keys also allows you to reduce the
amount of data encrypted with a single key, which is desirable because:

1. It makes cryptanalysis more difficult
2. Puts less data at risk if someone gets "lucky" in doing brute force


What systems use multiple keys like that?  I know of no website that
does that.  Your arguments seem hypothetical.  What is your goal here?



I might ask the same question about databases - which databases use an
encryption scheme where the database does not have access to the keys?

Actually, I've already asked this question before ...

The databases I'm familiar with do store all the keys in a vault that's
unlocked during startup, and then users may get keys from it (including
maintenance processes). We could still control access to those keys in
various ways (ACL or whatever), of course.

BTW how do you know this is what users want? Maybe they do, but then
again - maybe they just see it as magic and don't realize the extra
complexity (not just at the database level). In my experience users
generally want more abstract things, like "Ensure data privacy in case of
media theft," or "protection against evil DBA".


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Multivariate MCV list vs. statistics target

2019-07-08 Thread Tomas Vondra

On Sun, Jul 07, 2019 at 12:02:38AM +0200, Tomas Vondra wrote:

Hi,

apparently v1 of the ALTER STATISTICS patch was a bit confused about
length of the VacAttrStats array passed to statext_compute_stattarget,
causing segfaults. Attached v2 patch fixes that, and it also makes sure
we print warnings about ignored statistics only once.



v3 of the patch, adding pg_dump support - it works just like when you
tweak statistics target for a column, for example. When the value stored
in the catalog is not -1, pg_dump emits a separate ALTER STATISTICS
command setting it (for the already created statistics object).
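
For example (object and table names made up, syntax as in the patch below):

    CREATE STATISTICS stts_ab (mcv) ON a, b FROM t1;
    ALTER STATISTICS stts_ab SET STATISTICS 1000;

pg_dump would then emit the CREATE STATISTICS as before, plus a separate
ALTER STATISTICS ... SET STATISTICS 1000 for that object.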

I've considered making it part of CREATE STATISTICS itself, but it seems
a bit cumbersome and we don't do it for columns either. I'm not against
adding it in the future, but at this point I don't see a need.

At this point I'm not aware of any missing or broken pieces, so I'd
welcome feedback.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..987f9dbc6f 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistics object for subsequent
+ANALYZE operations.
+The target can be set in the range 0 to 10000; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d7004e5313..caa4fca99d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -490,6 +490,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistic objects too, because those may define 
their
+* own statistic target. So we need to sample enough rows and then build
+* the statistics with enough detail.
+*/
+   {
+   int nrows = ComputeExtStatisticsTarget(onerel,
+   
   attr_cnt, vacattrstats);
+
+   if (targrows < nrows)
+   targrows = nrows;
+   }
+
/*
 * Acquire the sample rows
 */
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index cf406f6f96..356b6096ac 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "access/relation.h"
 #include "access/relscan.h"
 #include "access/table.h"
@@ -21,6 +22,7 @@
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
@@ -29,6 +31,7 @@
 #include "miscadmin.h"
 #include "statistics/statistics.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -336,6 +339,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 	values[Anum_pg_statistic_ext_stxrelid - 1] = ObjectIdGetDatum(relid);
 	values[Anum_pg_statistic_ext_stxname - 1] = NameGetDatum(&stxname);
 	values[Anum_pg_statistic_ext_stxnamespace - 1] = ObjectIdGetDatum(namespaceId);
+	values[Anum_pg_statistic_ext_stxstattarget - 1] = Int32GetDatum(-1);
 	values[Anum_pg_statistic_ext_stxowner - 1] = ObjectIdGetDatum(stxowner);
 	values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
 	values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
@@ -414,6 +418,85 @@ CreateStatistics(CreateStatsStmt *stmt)
return myself;
 }
 
+
+/*
+ * ALTER STATISTICS
+ */
+ObjectAddress
+AlterStatistics(AlterStatsStmt *stmt)
+{
+	Relation	rel;
+	Oid			stxoid;
+	HeapTuple	oldtup;
+	HeapTuple	newtup;
+	Datum		repl_val[Natts_pg_statistic_ext];
+	bool		repl_null[Natts_pg_statistic_ext];
+	bool		repl_repl[Natts_pg_statistic_ext];
+	ObjectAddress address;
+	int			newtarget = stmt->stxstattarget;
+
+   /* Limit 

Re: progress report for ANALYZE

2019-07-08 Thread Alvaro Herrera
On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
>  wrote:
> > 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> > 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> > 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
> 
> Why do we zero out the block numbers when we switch phases?  The
> CREATE INDEX progress reporting patch does that kind of thing too, and
> it seems like poor behavior to me.

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what it is that we want.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: progress report for ANALYZE

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
 wrote:
> 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??

Why do we zero out the block numbers when we switch phases?  The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] kNN for btree

2019-07-08 Thread Alexander Korotkov
On Mon, Jul 1, 2019 at 8:47 PM Nikita Glukhov  wrote:
> On 01.07.2019 13:41, Thomas Munro wrote:
> > On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov  
> > wrote:
>  Fixed two bugs in patches 3 and 10 (see below).
>  Patch 3 was extracted from the main patch 5 (patch 4 in v9).
> >>> This patch no longer applies so marking Waiting on Author.
> >> Attached 11th version of the patches rebased onto current master.
> > Hi Nikita,
> >
> > Since a new Commitfest is here and this doesn't apply, could we please
> > have a fresh rebase?
>
> Attached 12th version of the patches rebased onto the current master.

I have more thoughts about the planner/executor infrastructure.  It
appears that supporting "ORDER BY col1, col2 <-> val" is too complex
for the initial version of the patch.  Handling of the "ORDER BY col" and
"ORDER BY col <-> val" cases uses different code paths in the optimizer.

So, I'd like to limit the first version of this patch to supporting just the
simplest "ORDER BY col <-> val" case.  We would probably be able to
enhance it even in the v13 release cycle, but let's start with just the
simple case.  I also think we can avoid replacing the amcanorderbyop flag
with a method, and instead introduce just a new boolean flag indicating that
kNN supports only the first column.

--
Alexander Korotkov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Thomas Munro
On Mon, Jul 8, 2019 at 5:53 PM Michael Paquier  wrote:
> I have begun playing with regressplans.sh which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters are involved) :
> select count(*) from simple r join extremely_skewed s using (id);
>
> This does not happen with REL_10_STABLE where the test executes
> immediately, so we has visibly an issue caused by v11 here.

If you don't allow hash joins it makes this plan:

 Aggregate
   ->  Nested Loop
         Join Filter: (r.id = s.id)
         ->  Seq Scan on simple r
         ->  Materialize
               ->  Seq Scan on extremely_skewed s

"simple" has 20k rows and "extremely_skewed" has 20k rows but the
planner thinks it only has 2.  So this is going to take O(n^2) time and n
is 20k.  Not sure what to do about that.  Maybe "join_hash" should be
skipped for the -h (= no hash joins please) case?

-- 
Thomas Munro
https://enterprisedb.com




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Michael Paquier
On Mon, Jul 08, 2019 at 06:49:44PM +1200, Thomas Munro wrote:
> "simple" has 20k rows and "extremely_skewed" has 20k rows but the
> planner thinks it only has 2.  So this going to take O(n^2) time and n
> is 20k.  Not sure what to do about that.  Maybe "join_hash" should be
> skipped for the -h (= no hash joins please) case?

Ah, thanks.  Yes that's going to take a while :)

Well, another thing I'd like to think about is whether there is any point
in keeping regressplans.sh and the relevant options in postgres at this
stage.  At the top of the file one can read:
# This script runs the Postgres regression tests with all useful combinations
# of the backend options that disable various query plan types.  If the
# results are not all the same, it may indicate a bug in a particular
# plan type, or perhaps just a regression test whose results aren't fully
# determinate (eg, due to lack of an ORDER BY keyword).

However, if you run make check with any of these options, some tests fail
in every run.  We could improve the situation for some of them by adding
ORDER BY clauses after looking at the query outputs, but some EXPLAIN
queries are sensitive to that, and the history of regressplans.sh does not
speak in its favor either (does anyone really use it?).  If you look at
the latest commits touching it, it has not really been touched in
19 years.

So I would rather be in favor of nuking it.
--
Michael




Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-08 Thread Tom Lane
Michael Paquier  writes:
> I have begun playing with regressplans.sh which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters are involved) :
> select count(*) from simple r join extremely_skewed s using (id);

> This does not happen with REL_10_STABLE where the test executes
> immediately, so we has visibly an issue caused by v11 here.

Yeah, these test cases were added by fa330f9ad in v11.

What it looks like to me is that some of these test cases force
"enable_mergejoin = off", so if you also have enable_hashjoin off then
you are going to get a nestloop plan, and it's hardly surprising that
that takes an unreasonable amount of time on the rather large test
tables used in these tests.

Given the purposes of this test, I think it'd be reasonable to force
both enable_hashjoin = on and enable_mergejoin = off at the very top
of the join_hash script, or the corresponding place in join.sql in
v11.  Thomas, was there a specific reason for forcing enable_mergejoin
= off for only some of these tests?
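
Concretely, the fix would amount to something like this at the top of
join_hash.sql (just a sketch of the idea, not the actual test file):

    -- keep hash joins enabled for this file even if PGOPTIONS disabled them,
    -- and take merge joins out of the picture as the tests already intend
    SET enable_hashjoin = on;
    SET enable_mergejoin = off;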

regards, tom lane




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 11:08 AM Tom Lane  wrote:
> FWIW, I was imagining the action as being (1) detach all the child
> partitions, (2) make parent into a non-partitioned table, (3)
> drop the target column in each of these now-independent tables.
> No data movement.  Other than the need to acquire locks on all
> the tables, it shouldn't be particularly slow.

I see.  I think that would be reasonable, but like you say, it's not
clear that it's really what users would prefer.  You can think of a
partitioned table as a first-class object and the partitions as
subordinate implementation details; or you can think of the partitions
as the first-class objects and the partitioned table as the
second-rate glue that holds them together. It seems like users prefer
the former view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Comment typo in tableam.h

2019-07-08 Thread Ashwin Agrawal
On Sat, Jul 6, 2019 at 12:05 AM Amit Kapila  wrote:

> On Tue, Jul 2, 2019 at 1:00 AM Ashwin Agrawal  wrote:
> > Please find attached v2 of patch 1 without objectionable comment change.
> v1 of patch 2 attaching here just for convenience, no modifications made to
> it.
> >
>
> 0001*
>   * See table_index_fetch_tuple's comment about what the difference between
> - * these functions is. This function is the correct to use outside of
> - * index entry->table tuple lookups.
> + * these functions is. This function is correct to use outside of index
> + * entry->table tuple lookups.
>
> How about if we write the last line of comment as "It is correct to
> use this function outside of index entry->table tuple lookups."?  I am
> not an expert on this matter, but I find the way I am suggesting
> easier to read.
>

I am fine with the way you have suggested.


Re: range_agg

2019-07-08 Thread Paul A Jungwirth
On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis  wrote:
>
> On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > user-defined range types. So how about I start on it and see how it
> > goes? I expect I can follow the existing code for range types pretty
> > closely, so maybe it won't be too hard.
>
> That would be great to at least take a look. If it starts to look like
> a bad idea, then we can re-evaluate and see if it's better to just
> return arrays.

I made some progress over the weekend. I don't have a patch yet but I
thought I'd ask for opinions on the approach I'm taking:

- A multirange type is an extra thing you get when you define a range
(just like how you get a tstzrange[]). Therefore
- I don't need separate commands to add/drop multirange types. You get
one when you define a range type, and if you drop a range type it gets
dropped automatically.
- I'm adding a new typtype for multiranges. ('m' in pg_type).
- I'm just adding a mltrngtypeid column to pg_range. I don't think I
need a new pg_multirange table.
- You can have a multirange[].
- Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
- I'll add an anymultirange pseudotype. When it's the return type of a
function that has an "anyrange" parameter, it will use the same base
element type. (So basically anymultirange : anyrange :: anyarray ::
anyelement.)
- You can cast from a multirange to an array. The individual ranges
are always sorted in the result array.
- You can cast from an array to a multirange but it will error if
there are overlaps (or not?). The array's ranges don't have to be
sorted but they will be after a "round trip".
- Interesting functions:
  - multirange_length
  - range_agg (range_union_agg if you like)
  - range_intersection_agg
- You can subscript a multirange like you do an array (? This could be
a function instead.)
- operators:
  - union (++) and intersection (*):
- We already have + for range union but it raises an error if
there is a gap, so ++ is the same but with no errors.
- r ++ r = mr (commutative, associative)
- mr ++ r = mr
- r ++ mr = mr
- r * r = r (unchanged)
- mr * r = r
- r * mr = r
- mr - r = mr
- r - mr = mr
- mr - mr = mr
  - comparison
- mr = mr
- mr @> x
- mr @> r
- mr @> mr
- x <@ mr
- r <@ mr
- mr <@ mr
- mr << mr (strictly left of)
- mr >> mr (strictly right of)
- mr &< mr (does not extend to the right of)
- mr &> mr (does not extend to the left of)
  - inverse operator?:
- the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
- not sure we want this or what the symbol should be. I don't like
-mr as an inverse because then mr - mr != mr ++ -mr.
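
To make the proposal a bit more concrete, here is a purely hypothetical
usage sketch (none of this exists yet; the type, function, and operator
names are only illustrative):

    -- defining a range type would implicitly create its multirange
    CREATE TYPE intrange AS RANGE (subtype = int4);

    -- multirange input/output looks like an array of ranges
    SELECT '{"[1,3)", "[5,6)"}'::intmultirange;

    -- aggregate a set of ranges into a multirange
    SELECT range_agg(r)
    FROM (VALUES ('[1,3)'::intrange), ('[2,5)'::intrange), ('[9,10)'::intrange)) AS t(r);

    -- union of a range and a multirange yields a multirange
    SELECT '[1,3)'::intrange ++ '{"[5,6)"}'::intmultirange;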

Anything in there you think should be different?

Thanks,
Paul




Re: Switching PL/Python to Python 3 by default in PostgreSQL 12

2019-07-08 Thread Tom Lane
I wrote:
> But I could support having a way for individual installations to change
> what the synonym means locally.  Perhaps we could think about how to do
> that in conjunction with the project of getting rid of pg_pltemplate
> that's been kicked around before [1][2][3].

... actually, if we had that (i.e., languages fully defined by extensions
with no help from pg_pltemplate), wouldn't this be nearly trivial?
I'm imagining two extensions, one that defines plpythonu to call the
python2 code and one that defines it to call the python3 code, and you
install whichever you want.  They're separate from the extensions that
define plpython2u and plpython3u, so mix and match as you wish.
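
Roughly, the SQL script of such an extension could be as small as this
(a sketch only, assuming the existing plpython3 handler functions; the real
thing would also need a control file and proper dependency handling):

    -- hypothetical "plpythonu means python3" extension script
    CREATE LANGUAGE plpythonu
        HANDLER plpython3_call_handler
        INLINE plpython3_inline_handler
        VALIDATOR plpython3_validator;
    COMMENT ON LANGUAGE plpythonu IS 'PL/PythonU, mapped to Python 3';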

regards, tom lane




Re: errbacktrace

2019-07-08 Thread Alvaro Herrera
On 2019-Jul-08, Dmitry Dolgov wrote:

> > On Sat, Jun 29, 2019 at 7:41 AM Jaime Casanova 
> >  wrote:
> >
> > This is certainly a very useful thing. Sadly, it doesn't seem to compile 
> > when
> > trying to use libunwind.
> 
> Yeah, the same for me. To make it works I've restricted libunwind to local
> unwinding only:
> 
> #ifdef USE_LIBUNWIND
> #define UNW_LOCAL_ONLY
> #include <libunwind.h>
> #endif

Ah, yes.  unwind's manpage says:

  Normally, libunwind supports both local and remote unwinding (the latter will
  be explained in the next section). However, if you tell libunwind that your
  program only needs local unwinding, then a special implementation can be
  selected which may run much faster than the generic implementation which
  supports both kinds of unwinding. To select this optimized version, simply
  define the macro UNW_LOCAL_ONLY before including the headerfile <libunwind.h>.

so I agree with unconditionally defining that symbol.

Nitpicking dept: I think in these tests:

+   if (!edata->backtrace &&
+   edata->funcname &&
+   backtrace_function[0] &&
+   strcmp(backtrace_function, edata->funcname) == 0)
+   set_backtrace(edata, 2);

we should test for backtrace_function[0] before edata->funcname, since
it seems more likely to be unset.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Switching PL/Python to Python 3 by default in PostgreSQL 12

2019-07-08 Thread Tom Lane
Peter Eisentraut  writes:
> I'm not trying to dismiss the importance of managing the Python
> transition.  But this issue has been known for many years, and the
> current setup is more or less in line with the wider world.  For
> example, the Debian release that came out over the weekend still ships
> with /usr/bin/python being Python 2.  So it is neither timely nor urgent
> to try to make some significant change about this in PostgreSQL 12 right
> now.  I would welcome patches for this for PostgreSQL 13.

I don't think it's been mentioned in this thread yet, but we *did*
recently install a configure-time preference for python3 over python2:

Author: Peter Eisentraut 
Branch: master Release: REL_12_BR [7291733ac] 2019-01-13 10:23:48 +0100
Branch: REL_11_STABLE Release: REL_11_2 [3d498c65a] 2019-01-13 10:24:21 +0100
Branch: REL_10_STABLE Release: REL_10_7 [cd1873160] 2019-01-13 10:25:23 +0100

configure: Update python search order

Some systems don't ship with "python" by default anymore, only
"python3" or "python2" or some combination, so include those in the
configure search.

Discussion: 
https://www.postgresql.org/message-id/flat/1457.1543184081%40sss.pgh.pa.us#c9cc1199338fd6a257589c6dcea6cf8d

configure's search order is now $PYTHON, python, python3, python2.
I think it will be a very long time, if ever, before there would be
a reason to consider changing that.  Both of the first two options
represent following a clear user preference.

So the only thing that's really at stake is when/whether we can make
"plpythonu" a synonym for "plpython3u" rather than "plpython2u".
As I said already, I think that's got to be a long way off, since the
whole problem here is that python3 isn't a drop-in replacement for
python2.  We're much more likely to break existing functions than do
anything useful by forcibly switching the synonym.

But I could support having a way for individual installations to change
what the synonym means locally.  Perhaps we could think about how to do
that in conjunction with the project of getting rid of pg_pltemplate
that's been kicked around before [1][2][3].

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/763f2fe4-743f-d530-8831-20811edd3d6a%402ndquadrant.com

[2] https://www.postgresql.org/message-id/flat/7495.1524861244%40sss.pgh.pa.us

[3] 
https://www.postgresql.org/message-id/flat/5351890.TdMePpdHBD%40nb.usersys.redhat.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > When people are asking for multiple keys (not just for key rotation),
> > > > they are asking to have multiple keys that can be supplied by users only
> > > > when they need to access the data.  Yes, the keys are always in the
> > > > datbase, but the feature request is that they are only unlocked when the
> > > > user needs to access the data.  Obviously, that will not work for
> > > > autovacuum when the encryption is at the block level.
> > > 
> > > > If the key is always unlocked, there is questionable security value of
> > > > having multiple keys, beyond key rotation.
> > > 
> > > That is not true. Having multiple keys also allows you to reduce the
> > > amount of data encrypted with a single key, which is desirable because:
> > > 
> > > 1. It makes cryptanalysis more difficult
> > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> > 
> > What systems use multiple keys like that?  I know of no website that
> > does that.  Your arguments seem hypothetical.  What is your goal here?
> 
> Not sure what the reference to 'website' is here, but one doesn't get
> certificates for TLS/SSL usage that aren't time-bounded, and when it
> comes to the actual on-the-wire encryption that's used, that's a
> symmetric key that's generated on-the-fly for every connection.
> 
> Wouldn't the fact that they generate a different key for every
> connection be a pretty clear indication that it's a good idea to use
> multiple keys and not use the same key over and over..?
> 
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

(I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
used only for authentication, and Diffie–Hellman is used for key
exchange:

https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

So, the question is whether you can pass so much data in TLS that using
the same key for the entire session is a security issue.  TLS originally
had key renegotiation, but that was removed in TLS 1.3:


https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
To mitigate these types of attacks, TLS 1.3 disallows renegotiation.

Of course, a database is going to process even more data, so if the
amount of data encrypted is a problem, we might have a problem using a
single key too.  This is not related to whether we use one key for
the entire cluster or multiple keys per tablespace --- the problem is
the same.  I guess we could create 1024 keys and use the bottom bits of
the block number to decide what key to use.  However, that still only
pushes the goalposts farther.

Anyway, I will research the reasonable data size that can be secured
with a single key via AES.  I will look at how PGP encrypts large files
too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 11:56 AM, Peter Eisentraut wrote:
> On 2019-07-08 17:47, Stephen Frost wrote:
>> Of course, we can discuss if what websites do with over-the-wire
>> encryption is sensible to compare to what we want to do in PG for
>> data-at-rest, but then we shouldn't be talking about what websites do,
>> it'd make more sense to look at other data-at-rest encryption systems
>> and consider what they're doing.
> 
> So, how do encrypted file systems do it?  Are there any encrypted file
> systems in general use that allow encrypting only some files or
> encrypting different parts of the file system with different keys, or
> any of those other granular approaches being discussed?

Well it is fairly common, for good reason IMHO, to encrypt some mount
points and not others on a system. In my mind, and in practice to a
large extent, a postgres tablespace == a unique mount point.

There is a description here:

  https://wiki.archlinux.org/index.php/Disk_encryption

A pertinent quote:

After it has been derived, the master key is securely stored in memory
(e.g. in a kernel keyring), for as long as the encrypted block device or
folder is mounted.

It is usually not used for de/encrypting the disk data directly, though.
For example, in the case of stacked filesystem encryption, each file can
be automatically assigned its own encryption key. Whenever the file is
to be read/modified, this file key first needs to be decrypted using the
main key, before it can itself be used to de/encrypt the file contents:

                                      ╭┈┈┈┈┈┈┈┈┈┈┈┈╮
   file on disk:                      ┊ master key ┊
  ┌─────────────────────────┐         ╰┈┈┈┈┈┬┈┈┈┈┈┈╯
  │ ╭─────────────────────╮ │               ▼              ╭┈┈┈┈┈┈┈┈┈┈╮
  │ │ encrypted file key  │ │━━━━━━(decryption)━━━━━━━━━▶  ┊ file key ┊
  │ ╰─────────────────────╯ │                              ╰┈┈┈┈┬┈┈┈┈┈╯
  │ ┌─────────────────────┐ │                                   ▼
  │ │ encrypted file      │ │                        ┌┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┐
  │ │ contents            │ │◀━━(de/encryption)━━━▶  ┊ readable file contents ┊
  │ └─────────────────────┘ │                        └┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┘
  └─────────────────────────┘

In a similar manner, a separate key (e.g. one per folder) may be used
for the encryption of file names in the case of stacked filesystem
encryption.


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-08 Thread James Coleman
On Mon, Jul 8, 2019 at 10:58 AM Tomas Vondra
 wrote:
>
> On Mon, Jul 08, 2019 at 10:32:18AM -0400, James Coleman wrote:
> >On Mon, Jul 8, 2019 at 9:59 AM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
> >> >On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
> >> > wrote:
> >> >> We're running query like this:
> >> >>
> >> >>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) 
> >> >> < 3 ORDER BY 1, 2, 3
> >> >>
> >> >> but we're trying to add the incremental sort *before* the aggregation,
> >> >> because the optimizer also considers group aggregate with a sorted
> >> >> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
> >> >> actually can do this, but clearly that's nonsense, because we can't
> >> >> possibly know the aggregates yet. Hence the error.
> >> >>
> >> >> If this is the actual issue, we need to ensure we actually can evaluate
> >> >> all the pathkeys. I don't know how to do that yet. I thought that maybe
> >> >> we should modify pathkeys_common_contained_in() to set presorted_keys to
> >> >> 0 in this case.
> >> >>
> >> >> But then I started wondering why we don't see this issue even for
> >> >> regular (non-incremental-sort) paths built in create_ordered_paths().
> >> >> How come we don't see these failures there? I've modified costing to
> >> >> make all incremental sort paths very cheap, and still nothing.
> >> >
> >> >I assume you mean you modified costing to make regular sort paths very 
> >> >cheap?
> >> >
> >>
> >> No, I mean costing of incremental sort paths, so that they end up being
> >> the cheapest ones. If some other path is cheaper, we won't see the error
> >> because it only happens when building plan from the cheapest path.
> >
> >Ah, I misunderstood as you trying to figure out a way to try to cause
> >the same problem with a regular sort.
> >
> >> >> So presumably there's a check elsewhere (either implicit or explicit),
> >> >> because create_ordered_paths() uses pathkeys_common_contained_in() and
> >> >> does not have the same issue.
> >> >
> >> >Given this comment in create_ordered_paths():
> >> >
> >> >  generate_gather_paths() will have already generated a simple Gather
> >> >  path for the best parallel path, if any, and the loop above will have
> >> >  considered sorting it.  Similarly, generate_gather_paths() will also
> >> >  have generated order-preserving Gather Merge plans which can be used
> >> >  without sorting if they happen to match the sort_pathkeys, and the loop
> >> >  above will have handled those as well.  However, there's one more
> >> >  possibility: it may make sense to sort the cheapest partial path
> >> >  according to the required output order and then use Gather Merge.
> >> >
> >> >my understanding is that generate_gather_paths() only considers paths
> >> >that already happen to be sorted (not explicit sorts), so I'm
> >> >wondering if it would make more sense for the incremental sort path
> >> >creation for this case to live alongside the explicit ordered path
> >> >creation in create_ordered_paths() rather than in
> >> >generate_gather_paths().
> >> >
> >>
> >> How would that solve the issue? Also, we're building a gather path, so
> >> I think generate_gather_paths() is the right place where to do it. And
> >> we're not changing the semantics of generate_gather_paths() - the result
> >> path should be sorted "correctly" with respect to sort_pathkeys.
> >
> >Does that imply what the explicit sort in create_ordered_paths() is in
> >the wrong spot?
> >
>
> I think those are essentially the right places where to do this sort of
> stuff. Maybe there's a better place, but I don't think those places are
> somehow wrong.
>
> >Or, to put it another way, do you think that both kinds of sorts
> >should be added in the same place? It seems confusing to me that
> >they'd be split between the two methods (unless I'm completely
> >misunderstanding how the two work).
> >
>
> The paths built in those two places were very different in one aspect:
>
> 1) generate_gather_paths only ever looked at pathkeys for the subpath, it
> never even looked at ordering expected by paths above it (or the query as
> a whole). Plain Gather ignores pathkeys entirely, Gather Merge only aims
> to maintain ordering of the different subpaths.
>
> 2) create_ordered_paths is meant to enforce ordering needed by upper
> parts of the plan - either by using a properly sorted path, or adding an
> explicit sort.
>
>
> We want to extend (1) to also look at ordering expected by the upper parts
> of the plan, and consider incremental sort if applicable. (2) already does
> that, and it already has the correct pathkeys to enforce.

I guess I'm still not following. If (2) is responsible (currently) for
adding an explicit sort, why wouldn't adding an incremental sort be an
example of that responsibility? The subpath that either a Sort or
IncrementalSort is being added on top of (to then feed into the
GatherMerge) is the 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Peter Eisentraut
On 2019-07-08 17:47, Stephen Frost wrote:
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

So, how do encrypted file systems do it?  Are there any encrypted file
systems in general use that allow encrypting only some files or
encrypting different parts of the file system with different keys, or
any of those other granular approaches being discussed?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > When people are asking for multiple keys (not just for key rotation),
> > > they are asking to have multiple keys that can be supplied by users only
> > > when they need to access the data.  Yes, the keys are always in the
> > > datbase, but the feature request is that they are only unlocked when the
> > > user needs to access the data.  Obviously, that will not work for
> > > autovacuum when the encryption is at the block level.
> > 
> > > If the key is always unlocked, there is questionable security value of
> > > having multiple keys, beyond key rotation.
> > 
> > That is not true. Having multiple keys also allows you to reduce the
> > amount of data encrypted with a single key, which is desirable because:
> > 
> > 1. It makes cryptanalysis more difficult
> > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> 
> What systems use multiple keys like that?  I know of no website that
> does that.  Your arguments seem hypothetical.  What is your goal here?

Not sure what the reference to 'website' is here, but one doesn't get
certificates for TLS/SSL usage that aren't time-bounded, and when it
comes to the actual on-the-wire encryption that's used, that's a
symmetric key that's generated on-the-fly for every connection.

Wouldn't the fact that they generate a different key for every
connection be a pretty clear indication that it's a good idea to use
multiple keys and not use the same key over and over..?

Of course, we can discuss if what websites do with over-the-wire
encryption is sensible to compare to what we want to do in PG for
data-at-rest, but then we shouldn't be talking about what websites do,
it'd make more sense to look at other data-at-rest encryption systems
and consider what they're doing.

Thanks,

Stephen




Re: tableam vs. TOAST

2019-07-08 Thread Robert Haas
On Tue, Jun 25, 2019 at 2:19 AM Prabhat Sahu
 wrote:
> I have tested the TOAST patches(v3) with different storage options like(MAIN, 
> EXTERNAL, EXTENDED, etc.), and
> combinations of compression and out-of-line storage options.
> I have used a few dummy tables with various tuple count say 10k, 20k, 40k, 
> etc. with different column lengths.
> Used manual CHECKPOINT option with (checkpoint_timeout = 1d, max_wal_size = 
> 10GB) before the test to avoid performance fluctuations,
> and calculated the results as a median value of a few consecutive test 
> executions.

Thanks for testing.

> All the observation looks good to me,
> except for the "Test1" for SCC UPDATE with tuple count(10K/20K), for SCC 
> INSERT with tuple count(40K)  there was a slightly increse in time taken
> incase of "with patch" result. For a better observation, I also have ran the 
> same "Test 1" for higher tuple count(i.e. 80K), and it also looks fine.

Did you run each test just once?  How stable are the results?

> While testing few concurrent transactions I have below query:
> -- Concurrent transactions acquire a lock for TOAST option(ALTER TABLE .. SET 
> STORAGE .. MAIN/EXTERNAL/EXTENDED/ etc)
>
> -- Session 1:
> CREATE TABLE a (a_id text PRIMARY KEY);
> CREATE TABLE b (b_id text);
> INSERT INTO a VALUES ('a'), ('b');
> INSERT INTO b VALUES ('a'), ('b'), ('b');
>
> BEGIN;
> ALTER TABLE b ADD CONSTRAINT bfk FOREIGN KEY (b_id) REFERENCES a (a_id); 
> -- Not Acquiring any lock

For me, this acquires AccessShareLock and ShareRowExclusiveLock on the
target table.

rhaas=# select locktype, database, relation, pid, mode, granted from
pg_locks where relation = 'b'::regclass;
 locktype | database | relation |  pid  |         mode          | granted 
----------+----------+----------+-------+-----------------------+---------
 relation |    16384 |    16872 | 93197 | AccessShareLock       | t
 relation |    16384 |    16872 | 93197 | ShareRowExclusiveLock | t
(2 rows)

I don't see what that has to do with the topic at hand, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: errbacktrace

2019-07-08 Thread Dmitry Dolgov
> On Sat, Jun 29, 2019 at 7:41 AM Jaime Casanova 
>  wrote:
>
> This is certainly a very useful thing. Sadly, it doesn't seem to compile when
> trying to use libunwind.

Yeah, the same for me. To make it work I've restricted libunwind to local
unwinding only:

#ifdef USE_LIBUNWIND
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#endif

And result looks pretty nice:

2019-07-08 17:24:08.406 CEST [31828] ERROR:  invalid input syntax for
type integer: "foobar" at character 12
2019-07-08 17:24:08.406 CEST [31828] BACKTRACE:  #0
pg_strtoint32+0x1d1 [0x55fa389bcbbe]
#1  int4in+0xd [0x55fa38976d7b]
#2  InputFunctionCall+0x6f [0x55fa38a488e9]
#3  OidInputFunctionCall+0x44 [0x55fa38a48b0d]
#4  stringTypeDatum+0x33 [0x55fa386e222e]
#5  coerce_type+0x26d [0x55fa386ca14d]
#6  coerce_to_target_type+0x79 [0x55fa386c9494]
#7  transformTypeCast+0xaa [0x55fa386d0042]
#8  transformExprRecurse+0x22f [0x55fa386cf650]
#9  transformExpr+0x1a [0x55fa386cf30a]
#10 transformTargetEntry+0x79 [0x55fa386e1131]
#11 transformTargetList+0x86 [0x55fa386e11ce]
#12 transformSelectStmt+0xa1 [0x55fa386a29c9]
#13 transformStmt+0x9d [0x55fa386a345a]
#14 transformOptionalSelectInto+0x94 [0x55fa386a3f49]
#15 transformTopLevelStmt+0x15 [0x55fa386a3f88]
#16 parse_analyze+0x4e [0x55fa386a3fef]
#17 pg_analyze_and_rewrite+0x3e [0x55fa3890cfa5]
#18 exec_simple_query+0x35b [0x55fa3890d5b5]
#19 PostgresMain+0x91f [0x55fa3890f7a8]
#20 BackendRun+0x1ac [0x55fa3887ed17]
#21 BackendStartup+0x15c [0x55fa38881ea1]
#22 ServerLoop+0x1e6 [0x55fa388821bb]
#23 PostmasterMain+0x1101 [0x55fa388835a1]
#24 main+0x21a [0x55fa387db1a9]
#25 __libc_start_main+0xe7 [0x7f3d1a607fa7]
#26 _start+0x2a [0x55fa3858e4ea]
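
A hypothetical way to produce output like the above, assuming the GUC is
named backtrace_function as in the patch and can be set interactively:

    SET backtrace_function = 'pg_strtoint32';
    SELECT 'foobar'::int4;  -- the resulting error would then be logged with a backtrace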




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > When people are asking for multiple keys (not just for key rotation),
> > they are asking to have multiple keys that can be supplied by users only
> > when they need to access the data.  Yes, the keys are always in the
> > datbase, but the feature request is that they are only unlocked when the
> > user needs to access the data.  Obviously, that will not work for
> > autovacuum when the encryption is at the block level.
> 
> > If the key is always unlocked, there is questionable security value of
> > having multiple keys, beyond key rotation.
> 
> That is not true. Having multiple keys also allows you to reduce the
> amount of data encrypted with a single key, which is desirable because:
> 
> 1. It makes cryptanalysis more difficult
> 2. Puts less data at risk if someone gets "lucky" in doing brute force

What systems use multiple keys like that?  I know of no website that
does that.  Your arguments seem hypothetical.  What is your goal here?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Add missing operator <->(box, point)

2019-07-08 Thread Alexander Korotkov
On Mon, Mar 11, 2019 at 2:49 AM Nikita Glukhov  wrote:
> 2. Add <-> to GiST box_ops.
>Extracted gist_box_distance_helper() common for gist_box_distance() and
>gist_bbox_distance().

To me it doesn't look worth having two distinct functions,
gist_box_distance_helper() and gist_bbox_distance().  What about
having just one and leaving responsibility for the recheck flag to the
caller?

> 3. Add <-> to SP-GiST.
>Changed only catalog and tests.  Box case is already checked in
>spg_box_quad_leaf_consistent():
>  out->recheckDistances = distfnoid == F_DIST_POLYP;

So, it seems to be a fix for an oversight in 2a6368343ff4.  But assuming
that fixing this requires catalog changes, we shouldn't backpatch it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
> COLUMN command that turns a partitioned table (with existing partitions
> containing data) into one non-partitioned table with all data minus the
> partitioning column(s).

Yeah, it'd be a lot of work for a dubious goal.

> This seems vaguely related to the issue of dropping foreign keys; see
> https://postgr.es/m/20190329152239.GA29258@alvherre.pgsql wherein I
> settled with a non-ideal solution to the problem of being unable to
> depend on something that did not cause the entire table to be dropped
> in certain cases.

That's an interesting analogy.  Re-reading that thread, what I said
in <29497.1554217...@sss.pgh.pa.us> seems pretty apropos to the
current problem:

>> FWIW, I think that the dependency mechanism is designed around the idea
>> that whether it's okay to drop a *database object* depends only on what
>> other *database objects* rely on it, and that you can always make a DROP
>> valid by dropping all the dependent objects.  That's not an unreasonable
>> design assumption, considering that the SQL standard embeds the same
>> assumption in its RESTRICT/CASCADE syntax.

I think that is probably a fatal objection to my idea of putting an error
check into RemoveAttributeById().  As an example, consider the possibility
that somebody makes a temporary type and then makes a permanent table with
a partitioning column of that type.  What shall we do at session exit?
Failing to remove the temp type is not an acceptable choice, because that
leaves us with a permanently broken temp schema (compare bug #15631 [1]).

Also, I don't believe we can make that work without order-of-operations
problems in cases comparable to the original bug in this thread [2].
One or the other order of the object OIDs is going to lead to the column
being visited for deletion before the whole table is, and again rejecting
the column deletion is not going to be an acceptable behavior.

So I think we're probably stuck with the approach of adding new internal
dependencies.  If we go that route, then our options for the released
branches are (1) do nothing, or (2) back-patch the code that adds such
dependencies, but without a catversion bump.  That would mean that only
tables created after the next minor releases would have protection against
this problem.  That's not ideal but maybe it's okay, considering that we
haven't seen actual field reports of trouble of this kind.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/15631-188663b383e1e697%40postgresql.org

[2] 
https://www.postgresql.org/message-id/flat/CA%2Bu7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg%40mail.gmail.com




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 8, 2019 at 11:02 AM Robert Haas  wrote:
>> On Mon, Jul 8, 2019 at 10:32 AM Alvaro Herrera  
>> wrote:
>>> That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
>>> COLUMN command that turns a partitioned table (with existing partitions
>>> containing data) into one non-partitioned table with all data minus the
>>> partitioning column(s).

>> I think it would be useful to have "ALTER TABLE blah NOT PARTITIONED" but I

> ...hit send too soon, and also, I don't think anyone will be very
> happy if they get that behavior as a side effect of a DROP statement,
> mostly because it could take an extremely long time to execute.

FWIW, I was imagining the action as being (1) detach all the child
partitions, (2) make parent into a non-partitioned table, (3)
drop the target column in each of these now-independent tables.
No data movement.  Other than the need to acquire locks on all
the tables, it shouldn't be particularly slow.
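
Spelled out by hand in today's SQL, the idea is roughly the following
(hypothetical object names; step (2) is exactly the part that has no SQL
equivalent and would need a new ALTER TABLE action):

    BEGIN;
    ALTER TABLE parent DETACH PARTITION part1;   -- (1) detach each child
    ALTER TABLE parent DETACH PARTITION part2;
    -- (2) turn "parent" into a plain, non-partitioned table: not possible today
    ALTER TABLE part1 DROP COLUMN partcol;       -- (3) drop the column in each
    ALTER TABLE part2 DROP COLUMN partcol;       --     now-independent table
    COMMIT;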

But I'm still not volunteering to write it, because I'm not sure
anyone would want such a behavior.

regards, tom lane




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-08 Thread Joe Conway
On 7/8/19 10:19 AM, Bruce Momjian wrote:
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> datbase, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.

That is not true. Having multiple keys also allows you to reduce the
amount of data encrypted with a single key, which is desirable because:

1. It makes cryptanalysis more difficult
2. Puts less data at risk if someone gets "lucky" in doing brute force


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




