Re: Key management with tests

2020-12-31 Thread Bruce Momjian
On Thu, Dec 31, 2020 at 11:50:47PM -0500, Bruce Momjian wrote:
> I have completed the key management patch with tests created by Stephen
> Frost.  Original patch by Masahiko Sawada.  It requires the hex
> reorganization patch first.  The key patch is now 2.1MB because of the
> tests, so attaching it here seems unwise:
> 
>   https://github.com/postgres/postgres/compare/master...bmomjian:hex.diff
>   https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> 
> I will add it to the commitfest.  I think we need to figure out how much
> of the tests we want to add.

I am getting regression test errors using OpenSSL 1.1.1d (10 Sep 2019)
with zero-length input data (no -p), while Stephen is able to get those
tests to pass.  This needs more research, plus I think we need
higher-level tests.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Safety/validity of resetting permissions by updating system tables

2020-12-31 Thread Isaac Morland
I have long maintained permissions in my databases by having a script which
assigns all the permissions. I have tended to start with invocations
something like this:

REVOKE ALL ON ALL TABLES IN SCHEMA ... FROM ...;

... where the appropriate schemas and roles are listed. This is intended to
ensure that the permissions actually in effect exactly match those in the
permissions script file. However, the commands I'm using don't guarantee a
true reset to default permissions. What I really want is to guarantee that
after running the permissions script I will have exactly the same
permissions whether I am starting from a freshly initialized database (from
script files containing table definitions etc.) or from an existing
database (potentially with obsolete grants or other history).

Is it safe and valid to reset to default permissions by running

UPDATE pg_namespace/pg_class/pg_type/pg_proc
SET nspacl/relacl/typacl/proacl = NULL WHERE ...;

Do I need to take locks or inform some component that I have updated
permissions?
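
To be concrete, here is a minimal sketch of the catalog-update approach I
have in mind, restricted to relations (the schema name is a placeholder;
a NULL ACL column means built-in default privileges):

UPDATE pg_class
SET relacl = NULL
WHERE relnamespace = 'app'::regnamespace  -- placeholder schema name
  AND relkind IN ('r', 'v', 'm', 'S');    -- tables, views, matviews, sequences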

And what do people think, conceptually, of the notion of adding a command
to do this without resorting to updating system tables directly?

Note: I don't use ALTER DEFAULT PRIVILEGES; my pg_default_acl is empty. So
for my immediate question default privileges can be ignored; but in the
context of adding a command for privilege resetting we would have to think
about how to handle default privileges.


Key management with tests

2020-12-31 Thread Bruce Momjian
I have completed the key management patch with tests created by Stephen
Frost.  Original patch by Masahiko Sawada.  It requires the hex
reorganization patch first.  The key patch is now 2.1MB because of the
tests, so attaching it here seems unwise:

https://github.com/postgres/postgres/compare/master...bmomjian:hex.diff
https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

I will add it to the commitfest.  I think we need to figure out how much
of the tests we want to add.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





RE: Disable WAL logging to speed up data loading

2020-12-31 Thread tsunakawa.ta...@fujitsu.com
From: Simon Riggs 
> Agreed, it is a footgun. -1 to commit the patch as-is.
> 
> The patch to avoid WAL is simple but it is dangerous for both the user
> and the PostgreSQL project.
> 
> In my experience, people will use this option and when it crashes and
> they lose their data, they will claim PostgreSQL broke and that they
> were not stupid enough to use this option. Data loss has always been
> the most serious error for PostgreSQL and our reputation for
> protecting data has been hard won; it can easily be lost in a moment
> of madness. Please consider how the headlines will read, bearing in
> mind past incidents and negative press. Yes, we did think of this
> feature already and rejected it.

Could you share the negative press that blames Postgres for a user's 
misuse of some feature, such as fsync?


> If we ever did allow such an option, it must contain these things (IMHO):
> * the option should be called "unsafe" or "allows_data_loss", not just
> "none" (anything less than "minimal" must be insufficient or
> unsafe...)

One idea I proposed is "wal_level = unrecoverable", because it clearly states 
the bad consequence, and Oracle also uses UNRECOVERABLE as a data loading 
option (I found it intuitive).  But others here commented that "none" would be 
OK.  I don't have a strong opinion on naming; what I think is important is 
warning the user in the documentation.


> * the option must be set in the control file and be part of the same
> patch, so users cannot easily edit things to hide their unsafe usage

The wal_level setting is stored in the control file, as before.  If the server 
crashes while wal_level is set to none, the server refuses to start and 
reports that this is because wal_level was set to none, which is similar to 
MySQL's behavior.  So the user cannot hide their misuse.
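
For illustration, the intended usage pattern would look something like this
(a sketch only: "none" is the level proposed by this patch and is not in
core, and changing wal_level always requires a server restart):

ALTER SYSTEM SET wal_level = none;     -- proposed unsafe level; restart
-- ... run the bulk data load ...
ALTER SYSTEM SET wal_level = replica;  -- restart again, then take a fresh
                                       -- base backup before relying on WAL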


> * we must not support the option of going down to "unsafe" and then
> back up again. It must be a one-way transition from "unsafe" to a
> higher level, so if people want to use this for temp reporting servers
> or initial loading, great, but they can't use it as a quick speed-up
> for databases containing needs-to-be-safe data. Possibly the state
> change might be "unsafe" -> "needs_backup" -> "minimal"... or some
> other way to signal to backup.

I'm afraid I don't get a clear picture.  Could you elaborate on that?  In any 
case, I think that could be an overreaction, and a prominent caution would 
suffice (like the one in this patch).


Regards
Takayuki Tsunakawa




RE: Disable WAL logging to speed up data loading

2020-12-31 Thread tsunakawa.ta...@fujitsu.com
From: Michael Paquier 
> Something that has not been mentioned on this thread is that you could also
> put pg_wal/ on a RAM disk.  That's similarly unsafe, of course, but it does
> not require any extra upstream patching, and it should be really fast for
> the case of this thread.  If you care about consistency, there are solutions
> like PMEM that we could look more into.

As noted in this thread, the customer is worried about, and wants to avoid 
handling, the potentially high volume of WAL storage during data loading.  
That's understandable.  In that regard, DRAM and PMEM are worse, because their 
capacity is more limited than an SSD's.


Regards
Takayuki Tsunakawa






RE: Global snapshots

2020-12-31 Thread tsunakawa.ta...@fujitsu.com
Hello,


Fujii-san and I discussed how to move the scale-out development forward.  We 
are both worried that Clock-SI is (highly?) likely to infringe the Microsoft 
patent in question [2].  So we agreed that we are going to investigate 
Clock-SI and the patent, and if we have to conclude that we cannot embrace 
Clock-SI, we will explore other possibilities.

IMO, it seems that Clock-SI overlaps with the patent and we can't use it.  
First, as background on how to interpret a patent document, the patent 
"claims" are what we should pay the greatest attention to.  According to the 
following citation from the IP guide by the Software Freedom Law Center 
(SFLC) [1], software infringes a patent if it implements everything in any 
single claim, not all claims.


--
4.2 Patent Infringement
To prove that you infringe a patent, the patent holder must show that you 
make, use, offer to sell, or sell the invention as it is defined in at least 
one claim of the patent.

For software to infringe a patent, the software essentially must implement 
everything recited in one of the patent's claims. It is crucial to recognize 
that infringement is based directly on the claims of the patent, and not on 
what is stated or described in other parts of the patent document. 
--


And Clock-SI implements at least claims 11 and 20, cited below.  It doesn't 
matter whether Clock-SI uses a physical clock or a logical one.


--
11. A method comprising:
receiving information relating to a distributed database transaction operating 
on data in data stores associated with respective participating nodes 
associated with the distributed database transaction;
requesting commit time votes from the respective participating nodes, the 
commit time votes reflecting local clock values of the respective participating 
nodes;
receiving the commit time votes from the respective participating nodes in 
response to the requesting;
computing a global commit timestamp for the distributed database transaction 
based at least in part on the commit time votes, the global commit timestamp 
reflecting a maximum value of the commit time votes received from the 
respective participating nodes; and
synchronizing commitment of the distributed database transaction at the 
respective participating nodes to the global commit timestamp,
wherein at least the computing is performed by a computing device.

20. A method for managing a distributed database transaction, the method 
comprising:
receiving information relating to the distributed database transaction from a 
transaction coordinator associated with the distributed database transaction;
determining a commit time vote for the distributed database transaction based 
at least in part on a local clock;
communicating the commit time vote for the distributed database transaction to 
the transaction coordinator;
receiving a global commit timestamp from the transaction coordinator;
synchronizing commitment of the distributed database transaction to the global 
commit timestamp;
receiving a remote request from a requesting database node corresponding to the 
distributed database transaction;
creating a local transaction corresponding to the distributed database 
transaction;
compiling a list of database nodes involved in generating a result of the local 
transaction and access types utilized by respective database nodes in the list 
of database nodes; and
returning the list of database nodes and the access types to the requesting 
database node in response to the remote request,
wherein at least the compiling is performed by a computing device.
--


My concern is that the above claims appear to cover a somewhat broad range.  I 
wonder whether other patents or unpatented technologies overlap with this kind 
of description.

Thoughts?


[1]
A Legal Issues Primer for Open Source and Free Software Projects
https://www.softwarefreedom.org/resources/2008/foss-primer.pdf

[2]
US8356007B2 - Distributed transaction management for database systems with 
multiversioning - Google Patents
https://patents.google.com/patent/US8356007


Regards
Takayuki Tsunakawa



Re: Tid scan improvements

2020-12-31 Thread David Fetter
On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote:
> On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote:
> > So, I think we need to either get some help from someone familiar with
> > heapam.c, or maybe shelve the patch.  It has been work in progress for
> > over a year now.
> 
> Okay, still nothing has happened after two months.  Once this is
> solved a new patch submission could be done.  For now I have marked
> the entry as returned with feedback.  

I dusted off the last version of the patch, without implementing the
suggestions made in this thread in the meantime.

I think this is a capability worth having, as I was surprised when it
turned out that it didn't exist when I was looking to make an
improvement to pg_dump. My idea, which I'll get back to when this is
in, was to use special knowledge of heap AM tables to make it possible
to parallelize dumps of large tables by working separately on each
underlying file, of which there could be quite a few for a large one.
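
For illustration, the query shape this patch is meant to make efficient is a
range qual on ctid (the table name and block boundaries below are
placeholders):

SELECT *
FROM big_table
WHERE ctid >= '(0,1)'::tid
  AND ctid <  '(4096,1)'::tid;

Without TID range scan support, this degenerates into a full sequential scan
with a filter; with it, only the requested block range needs to be read.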

I will try to understand the suggestions upthread better and implement
them.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 96046239014de8a7dec62e2f60b5210deb1bd32a Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Thu, 31 Dec 2020 16:42:07 -0800
Subject: [PATCH v10] first cut
To: hackers

 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql



diff --git src/include/access/tableam.h src/include/access/tableam.h
index 387eb34a61..5776f8ba6e 100644
--- src/include/access/tableam.h
+++ src/include/access/tableam.h
@@ -218,6 +218,15 @@ typedef struct TableAmRoutine
 bool set_params, bool allow_strat,
 bool allow_sync, bool allow_pagemode);
 
+	/*
+	 * Set the range of a scan.
+	 *
+	 * Optional callback: A table AM can implement this to enable TID range
+	 * scans.
+	 */
+	void		(*scan_setlimits) (TableScanDesc scan,
+   BlockNumber startBlk, BlockNumber numBlks);
+
 	/*
 	 * Return next tuple from `scan`, store in slot.
 	 */
@@ -875,6 +884,16 @@ table_rescan(TableScanDesc scan,
 	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
 }
 
+/*
+ * Set the range of a scan.
+ */
+static inline void
+table_scan_setlimits(TableScanDesc scan,
+	 BlockNumber startBlk, BlockNumber numBlks)
+{
+	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
+}
+
 /*
  * Restart a relation scan after changing params.
  *
diff --git src/include/catalog/pg_operator.dat src/include/catalog/pg_operator.dat
index 9c6bf6c9d1..bb7193b9e7 100644
--- src/include/catalog/pg_operator.dat
+++ src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git src/include/executor/nodeTidrangescan.h src/include/executor/nodeTidrangescan.h
new file mode 100644
index 00..f0bbcc6a04
--- /dev/null
+++ src/include/executor/nodeTidrangescan.h
@@ -0,0 +1,24 @@
+/*-
+ *
+ * nodeTidrangescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development 

[PATCH] Simple progress reporting for COPY command

2020-12-31 Thread Josef Šimánek
Hello,

finally I had some time to revisit the patch and all the comments from
https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
and I have prepared a simple version of COPY command progress reporting.

To keep the patch as small as possible, I have introduced only a minimal
set of columns. It could be extended later if needed.

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since
the OID should be enough, but it matches CREATE INDEX)
relid - oid - OID of the table related to the COPY command (0 when not
known, for example when copying to a file)
bytes_processed - bigint - number of bytes processed
bytes_total - bigint - file size in bytes for COPY FROM a file (0
otherwise)
lines_processed - bigint - number of tuples processed

Example output of progress for a common use case (import from a CSV file):

first console:
yr=# COPY test FROM '/home/retro/test.csv' (FORMAT CSV);

second console:
yr=# SELECT * FROM pg_stat_progress_copy;
  pid   | datid | datname | relid | bytes_processed | bytes_total | lines_processed
--------+-------+---------+-------+-----------------+-------------+-----------------
 803148 | 16384 | yr      | 16394 |       998965248 |      177796 |        56730126
(1 row)

It is simple to get the progress as a percentage, for example:

yr=# SELECT (bytes_processed/bytes_total::decimal)*100 FROM
pg_stat_progress_copy WHERE pid = 803148;
        ?column?
------------------------
 50.04287948706048525800
(1 row)

^ ~50% of file processed already
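
Since bytes_total is 0 unless the data comes from a file, a guarded variant
of the percentage query avoids division by zero (a sketch against the
columns described above):

SELECT pid,
       relid::regclass AS relation,
       CASE WHEN bytes_total > 0
            THEN round(bytes_processed * 100.0 / bytes_total, 2)
       END AS pct_done,       -- NULL when no total is known
       lines_processed
FROM pg_stat_progress_copy;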

I did some dead-simple benchmarking as well. The difference is not
huge. Each command works with 100 million tuples. Times are in
seconds.

  test                      with progress   master (32d6287)   difference
  ------------------------  --------------  -----------------  ----------
  COPY table TO             46.102          47.499             -1.397
  COPY query TO             52.168          49.822              2.346
  COPY table TO PROGRAM     52.345          51.882              0.463
  COPY query TO PROGRAM     54.141          52.763              1.378
  COPY table FROM           88.970          85.161              3.809
  COPY table FROM PROGRAM   94.393          90.346              4.047

A properly formatted table (since I'm not sure everyone here would be
able to see the table above formatted well) and the benchmark source are
present at https://github.com/simi/postgres/pull/6. I have also
included an example output there.

I'll add this to the current commitfest as well.
From 847b227dac1ce2e9554a32ff95b8d618f8725843 Mon Sep 17 00:00:00 2001
From: Josef Šimánek 
Date: Fri, 1 Jan 2021 01:14:47 +0100
Subject: [PATCH] Add pg_stat_progress_copy view with COPY progress report.

---
 doc/src/sgml/monitoring.sgml | 101 +++
 src/backend/catalog/system_views.sql |  11 +++
 src/backend/commands/copyfrom.c  |  16 +++-
 src/backend/commands/copyfromparse.c |   4 +
 src/backend/commands/copyto.c|  21 -
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/commands/progress.h  |   5 ++
 src/include/pgstat.h |   3 +-
 src/test/regress/expected/rules.out  |   9 ++
 10 files changed, 168 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c90130677..51d261defd94 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -399,6 +399,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_copypg_stat_progress_copy
+  One row for each backend running COPY, showing current progress.
+   See .
+  
+ 
 

   
@@ -5247,6 +5253,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
which support progress reporting are ANALYZE,
CLUSTER,
CREATE INDEX, VACUUM,
+   COPY,
and  (i.e., replication
command that  issues to take
a base backup).
@@ -6396,6 +6403,100 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  COPY Progress Reporting
+
+  
+   Whenever COPY is running, the
+   pg_stat_progress_copy view will contain one row
+   for each backend that is currently running a COPY command.
+   The table below describes the information that will be reported and
+   provides information on how to interpret it.
+  
+
+  
+   pg_stat_progress_copy View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   pid integer
+  
+  
+   Process ID of backend.
+  
+ 
+
+ 
+  
+   datid text
+  
+  
+   OID of the database to which this backend is 

Re: Table AM modifications to accept column projection lists

2020-12-31 Thread Zhihong Yu
Hi, Soumyadeep:
Happy New Year.

+typedef struct neededColumnContext
+{
+   Bitmapset **mask;
+   int n;

+ * n specifies the number of allowed entries in mask: we use
+ * it for bounds-checking in the walker above.

I think the code would be easier to read if the above comment were moved,
or copied, to the declaration of field n in neededColumnContext.

Cheers

On Thu, Dec 31, 2020 at 1:03 PM Soumyadeep Chakraborty <
soumyadeep2...@gmail.com> wrote:

> Hey Masahiko,
>
> I added it to the Jan CF (https://commitfest.postgresql.org/31/2922/).
>
> PFA a rebased version against latest head.
>
> Regards,
> Soumyadeep
>


Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Peter Geoghegan
On Thu, Dec 31, 2020 at 11:01 AM Zhihong Yu  wrote:
> Happy New Year.

Happy New Year.

> For v12-0001-Pass-down-logically-unchanged-index-hint.patch
>
> +   if (hasexpression)
> +   return false;
> +
> +   return true;
>
> The above can be written as return !hasexpression;
> The negation is due to the return value of
> index_unchanged_by_update_var_walker() being the inverse of "index
> unchanged".  If you align the meaning of the return value of
> index_unchanged_by_update_var_walker() with that of
> index_unchanged_by_update(), the negation is not needed.

I don't think that that represents an improvement. The negation seems
clearer to me because we're negating the *absence* of something that
we search for more or less linearly (a modified column from the
index). This happens when determining whether to do an extra thing
(provide the "logically unchanged" hint to this particular
index/aminsert() call). To me, the negation reflects that high level
structure.

> For struct xl_btree_delete:
>
> +   /* DELETED TARGET OFFSET NUMBERS FOLLOW */
> +   /* UPDATED TARGET OFFSET NUMBERS FOLLOW */
> +   /* UPDATED TUPLES METADATA (xl_btree_update) ARRAY FOLLOWS */
>
> I guess the comment is for illustration purposes. Maybe you can write the 
> comment in lower case.

The comment is written like this (in upper case) to be consistent
with an existing convention. If this were a green-field situation I
suppose I might not write it that way, but that's not how I assess
these things. I always try to give the existing convention the benefit
of the doubt. In this case I don't think that it matters very much,
and so I'm inclined to stick with the existing style.

> +#define BOTTOMUP_FAVORABLE_STRIDE  3
>
> Adding a comment on why the number 3 is chosen would be helpful for people to 
> understand the code.

Noted - I will add something about that to the next revision.

-- 
Peter Geoghegan




Re: crash recovery vs partially written WAL

2020-12-31 Thread Bruce Momjian
On Thu, Dec 31, 2020 at 02:27:44PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Wed, Dec 30, 2020 at 12:52:46PM -0800, Andres Freund wrote:
> > > A question from a colleague made me wonder if there are scenarios where
> > > two subsequent crashes could lead to wrong WAL being applied.
> > > 
> > > Imagine the following scenario
> > > [ xlog page 1 ][ xlog page 2 ][ xlog page 3 ][ xlog page 4 ]
> > >         ^flush        ^write                        ^insert
> > > 
> > > if the machine crashes at this moment, we could end up with a situation
> > > where pages 1, 3, and 4 made it out to disk, but page 2 didn't.
> > 
> > I don't see any flaw in your logic.  Seems we have to zero out all
> > future WAL files, not just to the end of the current one, or at least
> > clear xlp_pageaddr on each future page.
> 
> I've wondered before if we should be doing a timeline switch at the end
> of crash recovery...

For a while we had trouble tracking timeline switches, but I think we
might be fine on that now.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Table AM modifications to accept column projection lists

2020-12-31 Thread Soumyadeep Chakraborty
Hey Masahiko,

I added it to the Jan CF (https://commitfest.postgresql.org/31/2922/).

PFA a rebased version against latest head.

Regards,
Soumyadeep
From d033a0e3bceaf6e3f861e08363d4f170bc2a9fea Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 9 Nov 2020 16:36:10 -0800
Subject: [PATCH v2 1/1] tableam: accept column projection list

Co-authored-by: Soumyadeep Chakraborty 
Co-authored-by: Melanie Plageman 
Co-authored-by: Ashwin Agrawal 
Co-authored-by: Jacob Champion 
---
 src/backend/access/heap/heapam_handler.c |   5 +-
 src/backend/access/nbtree/nbtsort.c  |   3 +-
 src/backend/access/table/tableam.c   |   5 +-
 src/backend/commands/analyze.c   |   5 +-
 src/backend/commands/trigger.c   |  19 +++-
 src/backend/executor/execMain.c  |   3 +-
 src/backend/executor/execPartition.c |   2 +
 src/backend/executor/execReplication.c   |   6 +-
 src/backend/executor/execScan.c  | 108 +
 src/backend/executor/nodeIndexscan.c |  10 ++
 src/backend/executor/nodeLockRows.c  |   7 +-
 src/backend/executor/nodeModifyTable.c   |  38 ++--
 src/backend/executor/nodeSeqscan.c   |  43 -
 src/backend/executor/nodeTidscan.c   |  11 ++-
 src/backend/nodes/copyfuncs.c|   2 +
 src/backend/nodes/equalfuncs.c   |   2 +
 src/backend/nodes/outfuncs.c |   2 +
 src/backend/nodes/readfuncs.c|   2 +
 src/backend/optimizer/path/allpaths.c|  85 +++-
 src/backend/optimizer/plan/planner.c |  19 
 src/backend/optimizer/prep/preptlist.c   |  13 ++-
 src/backend/optimizer/util/inherit.c |  37 ++-
 src/backend/parser/analyze.c |  40 ++--
 src/backend/parser/parse_relation.c  |  18 
 src/backend/partitioning/partbounds.c|  14 ++-
 src/backend/rewrite/rewriteHandler.c |   8 ++
 src/include/access/tableam.h | 118 +--
 src/include/executor/executor.h  |   6 ++
 src/include/nodes/execnodes.h|   1 +
 src/include/nodes/parsenodes.h   |  14 +++
 src/include/utils/rel.h  |  14 +++
 31 files changed, 612 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3eea215b85..c18641700b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -180,7 +180,8 @@ static bool
 heapam_fetch_row_version(Relation relation,
 		 ItemPointer tid,
 		 Snapshot snapshot,
-		 TupleTableSlot *slot)
+		 TupleTableSlot *slot,
+		 Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	Buffer		buffer;
@@ -348,7 +349,7 @@ static TM_Result
 heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
   TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
   LockWaitPolicy wait_policy, uint8 flags,
-  TM_FailureData *tmfd)
+  TM_FailureData *tmfd, Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	TM_Result	result;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 8730de25ed..25ee10806b 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1967,7 +1967,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-	ParallelTableScanFromBTShared(btshared));
+	ParallelTableScanFromBTShared(btshared),
+	NULL);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 	   true, progress, _bt_build_callback,
 	   (void *) , scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 6438c45716..187e861b5d 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -172,7 +172,7 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan, Bitmapset *proj)
 {
 	Snapshot	snapshot;
 	uint32		flags = SO_TYPE_SEQSCAN |
@@ -194,6 +194,9 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
 		snapshot = SnapshotAny;
 	}
 
+	if (proj)
+		return relation->rd_tableam->scan_begin_with_column_projection(relation, snapshot, 0, NULL,
+			parallel_scan, flags, proj);
 	return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
 			parallel_scan, flags);
 }
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..791451c765 100644
--- a/src/backend/commands/analyze.c
+++ 

Re: new heapcheck contrib module

2020-12-31 Thread Thomas Munro
On Tue, Oct 27, 2020 at 5:12 AM Mark Dilger
 wrote:
> The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a 
> rebase.  (0001 was already committed last week.)
>
> Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with 
> the previous naming.  There are no substantial changes.

Hi Mark,

The command line stuff fails to build on Windows[1].  I think it's
just missing #include "getopt_long.h" (see
contrib/vacuumlo/vacuumlo.c).

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123328




Re: crash recovery vs partially written WAL

2020-12-31 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Dec 30, 2020 at 12:52:46PM -0800, Andres Freund wrote:
> > A question from a colleague made me wonder if there are scenarios where
> > two subsequent crashes could lead to wrong WAL being applied.
> > 
> > Imagine the following scenario
> > [ xlog page 1 ][ xlog page 2 ][ xlog page 3 ][ xlog page 4 ]
> >         ^flush        ^write                        ^insert
> > 
> > if the machine crashes at this moment, we could end up with a situation
> > where pages 1, 3, and 4 made it out to disk, but page 2 didn't.
> 
> I don't see any flaw in your logic.  Seems we have to zero out all
> future WAL files, not just to the end of the current one, or at least
> clear xlp_pageaddr on each future page.

I've wondered before if we should be doing a timeline switch at the end
of crash recovery...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-31 Thread Pavel Stehule
On Thu, Dec 31, 2020 at 15:27, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Wed, Dec 30, 2020 at 09:01:37PM +0100, Dmitry Dolgov wrote:
> > > make check fails
> >
> > Yeah, apparently I forgot to enable asserts back after the last
> > benchmarking discussion, and missed some of those. Will fix.
> >
> > > 2. The index position was ignored.
> > >
> > > postgres=# update foo set a['a'][10] = '20';
> > > UPDATE 1
> > > postgres=# select * from foo;
> > > ┌─┐
> > > │  a  │
> > > ╞═╡
> > > │ {"a": [20]} │
> > > └─┘
> > > (1 row)
> >
> > I just realized I haven't included "filling the gaps" part, that's why
> > it works as before. Can add this too.
> >
> > > 1. quietly ignored update
> > >
> > > postgres=# update foo set a['a'][10] = '20';
> > > UPDATE 1
> > > postgres=# select * from foo;
> > > ┌┐
> > > │ a  │
> > > ╞╡
> > > │ {} │
> > > └┘
> > > (1 row)
> >
> > This belongs to the original jsonb_set implementation. Although if we
> > started to change it anyway with "filling the gaps", maybe it's fine to
> > add one more flag to tune its behaviour in this case as well. I can
> > check how complicated that could be.
>
> Here is what I had in mind. Assert issue in main patch is fixed (nothing
> serious, it was just the rawscalar check for an empty jsonb created
> during assignment), and the second patch contains all the bits with
> "filling the gaps" including your suggestion about creating the whole
> path if it's not present. The latter (creating the chain of empty
> objects) I haven't tested that much, but if there are any issues or
> concerns I guess it will not prevent the main patch from going forward.


The tests passed, and filling the gaps works well.

But creating empty objects doesn't work:

create table foo(a jsonb);

insert into foo values('{}');

postgres=# update foo set a['k'][1] = '20';
UPDATE 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ {"k": [null, 20]} │
└───┘
(1 row)

This is OK.

postgres=# update foo set a['k3'][10] = '20';
UPDATE 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ {"k": [null, 20]} │
└───┘
(1 row)

The second update was not successful.
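
For reference, the presumably intended behaviour, had creating the whole
path worked, would be something like this (hypothetical expected output,
not what the patch currently produces):

postgres=# update foo set a['k3'][10] = '20';
UPDATE 1
postgres=# select a->'k3' AS k3 from foo;
-- expected: [null, null, null, null, null, null, null, null, null, null, 20]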





Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Victor Yegorov
On Thu, Dec 31, 2020 at 20:01, Zhihong Yu wrote:

> For v12-0001-Pass-down-logically-unchanged-index-hint.patch
>
> +   if (hasexpression)
> +   return false;
> +
> +   return true;
>
> The above can be written as return !hasexpression;
>

To be honest, I prefer the way Peter has it in his patch.
Yes, it's possible to shorten this part. But readability is hurt: the
current code I can just read, while for the suggested change I need to
think about it.

-- 
Victor Yegorov


Re: crash recovery vs partially written WAL

2020-12-31 Thread Bruce Momjian
On Wed, Dec 30, 2020 at 12:52:46PM -0800, Andres Freund wrote:
> Hi,
> 
> A question from a colleague made me wonder if there are scenarios where
> two subsequent crashes could lead to wrong WAL being applied.
> 
> Imagine the following scenario
> [ xlog page 1 ][ xlog page 2 ][ xlog page 3 ][ xlog page 4 ]
>         ^flush        ^write                        ^insert
> 
> if the machine crashes at this moment, we could end up with a situation
> where pages 1, 3, and 4 made it out to disk, but page 2 didn't.

I don't see any flaw in your logic.  Seems we have to zero out all
future WAL files, not just to the end of the current one, or at least
clear xlp_pageaddr on each future page.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: pgbench: option delaying queries till connections establishment?

2020-12-31 Thread Thomas Munro
On Sun, Nov 15, 2020 at 4:53 AM Fabien COELHO  wrote:
> In the attached version, I just comment out the call and add an
> explanation about why it is commented out. If pg overall version
> requirements are changed on windows, then it could be reinstated.

It looks like macOS doesn't have pthread barriers (via cfbot 2021, now
with more operating systems):

pgbench.c:326:8: error: unknown type name 'pthread_barrier_t'
static pthread_barrier_t barrier;
^
pgbench.c:6128:2: error: implicit declaration of function
'pthread_barrier_init' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
pthread_barrier_init(, NULL, nthreads);
^




Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Zhihong Yu
Hi, Peter:
Happy New Year.

For v12-0001-Pass-down-logically-unchanged-index-hint.patch

+   if (hasexpression)
+   return false;
+
+   return true;

The above can be written as return !hasexpression;
The negation is due to the return value of
index_unchanged_by_update_var_walker() being the inverse of "index
unchanged".  If you align the meaning of the return value of
index_unchanged_by_update_var_walker() with that of
index_unchanged_by_update(), the negation is not needed.

For v12-0002-Add-bottom-up-index-deletion.patch :

For struct xl_btree_delete:

+   /* DELETED TARGET OFFSET NUMBERS FOLLOW */
+   /* UPDATED TARGET OFFSET NUMBERS FOLLOW */
+   /* UPDATED TUPLES METADATA (xl_btree_update) ARRAY FOLLOWS */

I guess the comment is for illustration purposes. Maybe you can write the
comment in lower case.

+#define BOTTOMUP_FAVORABLE_STRIDE  3

Adding a comment on why the number 3 is chosen would be helpful for people
to understand the code.

Cheers

On Wed, Dec 30, 2020 at 6:55 PM Peter Geoghegan  wrote:

> On Wed, Dec 9, 2020 at 5:12 PM Peter Geoghegan  wrote:
> > Attached is v11, which cleans everything up around the tableam
> > interface. There are only two patches in v11, since the tableam
> > refactoring made it impossible to split the second patch into a heapam
> > patch and nbtree patch (there is no reduction in functionality
> > compared to v10).
>
> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>
> I should also point out that the design doesn't change the on-disk
> format in any way, and so doesn't commit us to supporting something
> that might become onerous in the event of somebody else finding a
> better way to address at least some of the same problems.
>
> --
> Peter Geoghegan
>


Re: Proposed patch for key management

2020-12-31 Thread Joshua Drake
>
>
> > >I will say that if the community feels external-only should be the only
> > >option, I will stop working on this feature because I feel the result
> > >would be too fragile to be reliable,
> >
> > I do not see why it would be the case. I'm just arguing to have key
> > management in a separate, possibly suid-something-else, process, which,
> > given the security concerns that dictate the feature, looks like a
> > must-have, or at least must be possible. From a line count point of view,
> > it should be a small addition to the current code.
>
> All of this hand-waving really isn't helping.
>
> If it's a small addition to the current code then it'd be fantastic if
> you'd propose a specific patch which adds what you're suggesting.  I
> don't think either Bruce or I would have any issue with others helping
> out on this effort, but let's be clear: we need something that *is* part
> of core PG, even if we have an ability to have other parts exist outside
> of PG.
>

+1

JD


Re: Proposed patch for key management

2020-12-31 Thread Joshua Drake
On Wed, Dec 30, 2020 at 3:47 PM Bruce Momjian  wrote:

>
> I will say that if the community feels external-only should be the only
> option, I will stop working on this feature because I feel the result
> would be too fragile to be reliable, and I would not want to be
> associated with it.
>
>
I can say that the people I work with would prefer an "officially"
supported mechanism from postgresql.org. The use of a generic API would be
useful for the parts of the ecosystem that want to build on core
functionality, but PostgreSQL should have competent built-in support as
well.

JD

>
>


Re: Buildfarm's cross-version-upgrade tests

2020-12-31 Thread Andrew Dunstan


On 12/30/20 4:28 PM, Andrew Dunstan wrote:
>
> I'll try to fix the test for the latest breakage shortly.



See



I'm going to get a new release out before next Thursday come hell or
high water.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Proposed patch for key management

2020-12-31 Thread Bruce Momjian
On Thu, Dec 31, 2020 at 12:05:53PM -0500, Stephen Frost wrote:
> > Let's unpack this logic, since I know others, like Alastair Turner (CC
> > added), had similar concerns.  Frankly, I feel this feature has limited
> > security usefulness, so if we don't feel it has sufficient value, let's
> > mark it as not wanted and move on.
> 
> Considering the amount of requests for this, I don't feel it's at all
> reasonable to suggest that it's 'not wanted'.

Well, people ask for (or used to ask for) query hints, so request count
and usefulness aren't always correlated.  I brought up this point
because people sometimes want to add complexity to try and guard against
some exploit that is impossible to guard against, so I thought we should
be clear what we can't protect, no matter how we configure it.

> > When using AES256 with GCM (for verification) is #1 more secure than #2?
> > I don't think so.  If the Postgres account is compromised, they can grab
> > the data encryption keys as they come in from the external script (#1),
> > or when they are decrypted by the KEK (#2), or while they are in shared
> > memory while the server is running.  If the postgres account is not
> > compromised, I don't think it is any easier to get the data key wrapped
> > by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
> > (Once you find the key for one, they would all decrypt.)
> 
> I can see some usefulness for supporting #1, but that has got next to
> nothing to do with what this patch is about and is all about the *next*
> step, which is to actually look at supporting encryption of the rest of
> the cluster, beyond just the keys.  We need to get past this first step
> though, and it seems to be nearly impossible to do so, which has
> frustrated multiple attempts to actually accomplish anything here.
> 
> I agree that none of these approaches address an online compromise of
> the postgres account.  Thankfully, that isn't actually what this is
> intended to address and to talk about that case is just a distraction
> that isn't solvable and wastes time.

I assume people don't want the data keys stored in the file system, even
in encrypted form, and they believe this makes the system more secure.  I
don't think it does, and I think it makes external key rotation very
difficult, among other negatives.  Also, keep in mind any failure of the
key management system could make the cluster unreadable, so making it as
simple/error-free as possible is a big requirement for me.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key management

2020-12-31 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Dec 31, 2020 at 11:11:11AM +0100, Fabien COELHO wrote:
> > > I am not sure what else I can add to this discussion.  Having something
> > > that is completely external might be a nice option, but I don't think it
> > > is the common use-case, and will make the most common use cases more
> > > complex.
> > 
> > I'm unsure about what is the "common use case". ISTM that having the
> > postgres process holding the master key looks like a "no" for me in any use
> > case with actual security concern: as the database must be remotely
> > accessible, a reasonable security assumption is that a pg account could be
> > compromised, and the "master key" (if any, that is just one particular
> > cryptographic design) should not be accessible in that case. The first
> > barrier would be pg admin account.
> 
> Let's unpack this logic, since I know others, like Alastair Turner (CC
> added), had similar concerns.  Frankly, I feel this feature has limited
> security usefulness, so if we don't feel it has sufficient value, let's
> mark it as not wanted and move on.

Considering the amount of requests for this, I don't feel it's at all
reasonable to suggest that it's 'not wanted'.

> I think there are two basic key configurations:
> 
> 1.  pass data encryption keys in from an external source
> 2.  store the data encryption keys wrapped by a key encryption key (KEK)
> passed in from an external source

I agree those are two basic configurations (though they aren't the only
possible ones).

> The current code implements #2 because it simplifies administration,
> checking, external key (KEK) rotation, and provides good error reporting
> when something goes wrong.  For example, with #1, there is no way to
> rotate the externally-stored key except by reencrypting the entire cluster.

Right, and there also wouldn't be a very simple way to actually get PG
going with a single key or a passphrase.

> When using AES256 with GCM (for verification) is #1 more secure than #2?
> I don't think so.  If the Postgres account is compromised, they can grab
> the data encryption keys as they come in from the external script (#1),
> or when they are decrypted by the KEK (#2), or while they are in shared
> memory while the server is running.  If the postgres account is not
> compromised, I don't think it is any easier to get the data key wrapped
> by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
> (Once you find the key for one, they would all decrypt.)

I can see some usefulness for supporting #1, but that has got next to
nothing to do with what this patch is about and is all about the *next*
step, which is to actually look at supporting encryption of the rest of
the cluster, beyond just the keys.  We need to get past this first step
though, and it seems to be nearly impossible to do so, which has
frustrated multiple attempts to actually accomplish anything here.

I agree that none of these approaches address an online compromise of
the postgres account.  Thankfully, that isn't actually what this is
intended to address and to talk about that case is just a distraction
that isn't solvable and wastes time.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposed patch for key management

2020-12-31 Thread Bruce Momjian
On Thu, Dec 31, 2020 at 11:11:11AM +0100, Fabien COELHO wrote:
> > I am not sure what else I can add to this discussion.  Having something
> > that is completely external might be a nice option, but I don't think it
> > is the common use-case, and will make the most common use cases more
> > complex.
> 
> I'm unsure about what is the "common use case". ISTM that having the
> postgres process holding the master key looks like a "no" for me in any use
> case with actual security concern: as the database must be remotely
> accessible, a reasonable security assumption is that a pg account could be
> compromised, and the "master key" (if any, that is just one particular
> cryptographic design) should not be accessible in that case. The first
> barrier would be pg admin account.

Let's unpack this logic, since I know others, like Alastair Turner (CC
added), had similar concerns.  Frankly, I feel this feature has limited
security usefulness, so if we don't feel it has sufficient value, let's
mark it as not wanted and move on.

I think there are two basic key configurations:

1.  pass data encryption keys in from an external source
2.  store the data encryption keys wrapped by a key encryption key (KEK)
passed in from an external source

The current code implements #2 because it simplifies administration,
checking, external key (KEK) rotation, and provides good error reporting
when something goes wrong.  For example, with #1, there is no way to
rotate the externally-stored key except by reencrypting the entire cluster.

When using AES256 with GCM (for verification) is #1 more secure than #2?
I don't think so.  If the Postgres account is compromised, they can grab
the data encryption keys as they come in from the external script (#1),
or when they are decrypted by the KEK (#2), or while they are in shared
memory while the server is running.  If the postgres account is not
compromised, I don't think it is any easier to get the data key wrapped
by a KEK than it is to try decrypting the actual heap/index/WAL blocks.
(Once you find the key for one, they would all decrypt.)

> With external, the options are that the
> key is held by another id, so the host must be compromised as well, and
> could even be managed remotely on another host (I agree that this later
> option would be fragile because of the implied network connection, but it
> would be a tradeoff depending on the security concerns, but it pretty much
> correspond to the kerberos model).

We don't store the data encryption keys raw on the server, but rather
wrap them with a KEK.  If the external keystore is compromised using #1,
you can't rotate those keys without reencrypting the entire cluster.

> > Adding a pre-defined system will not prevent future work on a
> > completely external option.
> 
> Yes and no.
> 
> Having two side-by-side cluster-encryption schemes in core, the first that
> could be implemented on top of the second without much effort, would look
> pretty weird. Moreover, required changes in core to allow an API are the
> very same as the one in this patch. The other concern I have with doing the
> cleaning work afterwards is that the API would be somehow in the middle of
> the existing functions if it has not been thought of before.

The question is whether there is a common use-case, and if so, do we
implement that first, with all the safeguards, and then allow others to
customize?  I don't want to optimize for the rare case if that makes the
common case more error-prone.

> > I know archive_command is completely external, but that is much simpler
> > than what would need to be done for key management, key rotation, and
> > key verification.
> 
> I'm not sure of the design of the key rotation stuff as I recall it from the
> patches I looked at, but the API design looks like the more pressing issue.
> First, the mere existence of a "master key" is a cryptographic choice which
> is debatable, because it creates issues if one must be able to change it,
> hence the whole "key rotation" stuff. ISTM that what core needs to know is

Well, you would want to change the externally-stored key independently of
the data encryption keys, which are harder to change.

> > I will say that if the community feels external-only should be the only
> > option, I will stop working on this feature because I feel the result
> > would be too fragile to be reliable,
> 
I do not see why it would be the case. I'm just arguing to have key
management in a separate, possibly suid-something-else, process, which,
given the security concerns that dictate the feature, looks like a
must-have, or at least must be possible. From a line count point of view,
it should be a small addition to the current code.

See above.  Please explain the security value of this complexity.

> > and I would not want to be associated with it.
> 
> You do as you want. I'm no one, you can ignore me and proceed to commit
> whatever you want. I'm only advising to the best of my ability, what I 

Re: [Doc Patch] Clarify that CREATEROLE roles can GRANT default roles

2020-12-31 Thread Michael Banck
Hi,

On Monday, Dec 28, 2020, at 20:41 +0900, Masahiko Sawada wrote:
> On Sat, Nov 28, 2020 at 7:50 AM Michael Banck  
> wrote:
> > https://www.postgresql.org/docs/current/default-roles.html mentions the
> > "Administrator" several times, but does not specify it further. I
> > understand that it is mentioned elsewhere [1], but I think it would be
> > beneficial to remind the reader on that page at least once that
> > "Administrators" includes "roles with the CREATEROLE privilege" in the
> > context of GRANTing and REVOKEing default privileges, e.g. with the
> > attached patch.
> > 
> 
> You sent in your patch, default-roles-createrole.patch to
> pgsql-hackers on Nov 28, but you did not post it to the next
> CommitFest[1].  If this was intentional, then you need to take no
> action.  However, if you want your patch to be reviewed as part of the
> upcoming CommitFest, then you need to add it yourself before
> 2021-01-01 AoE[2]. Thanks for your contributions.

Thanks for reminding me, I've done so now[1].


Michael

[1] https://commitfest.postgresql.org/31/2921/

-- 
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz





Re: Proposed patch for key management

2020-12-31 Thread Stephen Frost
Greetings,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> >>The implementations should not have to be in any particular language: Shell,
> >>Perl, Python, C should be possible.
> >
> >I disagree that it makes any sense to pass actual encryption out to a
> >shell script.
> 
> Yes, sure. I'm talking about key management. For actual crypto functions,
> ISTM that they should be "replaceable", i.e. if some institution does not
> believe in AES then they could switch to something else.

The proposed implementation makes it pretty straight-forward to add in
other implementations and other crypto algorithms if people wish to do
so.

> >>After giving it more thought during the day, I think that only one
> >>command and a basic protocol is needed. Maybe something as simple as
> >>
> >>  /path/to/command --options arguments…
> >
> >This is what's proposed- a command is run to acquire the KEK (as a hex
> >encoded set of bytes, making it possible to use a shell script, which is
> >what the patch does too).
> 
> Yep, but that is not what I'm trying to advocate. The "KEK" (if any) would
> stay in the process, not be given back to the database or command using it.
> Then the database/tool would interact with the command to get the actual
> per-file keys when needed.

"When needed" is every single write that we do of any file in the entire
backend.  Reaching out to something external every time we go to use the
key really isn't sensible- except, perhaps, in the case where we are
doing full crypto off-loading and we don't actually have to deal with
the KEK or the DEK at all, but that's all on the cluster encryption
side, really, and isn't the focus of this patch and what it's bringing-
all of which is needed anyway.

> >[...] The API to fetch the KEK doesn't care at all about where it's stored
> >or how it's derived or anything like that.
> 
> >There's a relatively small change which could be made to have PG request
> >all of the keys that it'll need on startup, if we want to go there as has
> >been suggested elsewhere, but even if we do that, PG needs to be able to
> >do that itself too, otherwise it's not a complete capability and there
> >seems little point in doing something that's just a pass-thru to something
> >else and isn't able to really be used.
> 
> I do not understand. Postgres should provide a working implementation,
> whether the functions are directly inside pg or through an external process,
> they need to be there anyway. I'm trying to fix a clean, possibly external
> API so that it is possible to have something different from the choices made
> in the patch.

Right, we need a working implementation that's part of core, that's what
this effort is trying to bring.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposed patch for key management

2020-12-31 Thread Stephen Frost
Greetings,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> >>The API should NOT make assumptions about the cryptographic design, what
> >>depends on what, where things are stored… ISTM that Pg should only care
> >>about naming keys, holding them when created/retrieved (but not create
> >>them), basically interacting with the key manager, passing the stuff to
> >>functions for encryption/decryption seen as black boxes.
> >>
> >>I may have suggested something along these lines at the beginning of the key
> >>management thread, probably. Not going this way implicitly implies making
> >>some assumptions which may or may not suit other use cases, so makes them
> >>specific not generic. I do not think pg should do that.
> >
> >I am not sure what else I can add to this discussion.  Having something
> >that is completely external might be a nice option, but I don't think it
> >is the common use-case, and will make the most common use cases more
> >complex.
> 
> I'm unsure about what is the "common use case". ISTM that having the
> postgres process holding the master key looks like a "no" for me in any use
> case with actual security concern: as the database must be remotely
> accessible, a reasonable security assumption is that a pg account could be
> compromised, and the "master key" (if any, that is just one particular
> cryptographic design) should not be accessible in that case. The first
> barrier would be pg admin account. With external, the options are that the
key is held by another id, so the host must be compromised as well, and
> could even be managed remotely on another host (I agree that this later
> option would be fragile because of the implied network connection, but it
> would be a tradeoff depending on the security concerns, but it pretty much
> correspond to the kerberos model).

No, this isn't anything like the Kerberos model, and there's no case
where the PG account won't have access to the DEK (data encryption key)
during normal operation (except with the possibility of off-loading to a
hardware crypto device, which, while interesting, is definitely something
that's of very limited utility and which could be added on as a
capability later anyway).  This is also well understood by those who are
interested in this capability, and it's perfectly acceptable that PG will
have the DEK in memory.

The keys which are stored on disk with the proposed patch are encrypted
using a KEK which is external to PG- that's all part of the existing
patch and design.

> >Adding a pre-defined system will not prevent future work on a
> >completely external option.
> 
> Yes and no.
> 
> Having two side-by-side cluster-encryption schemes in core, the first of
> which could be implemented on top of the second without much effort, would
> look pretty weird. Moreover, the changes required in core to allow an API
> are the very same as the ones in this patch. The other concern I have with
> doing the cleanup work afterwards is that the API would sit somehow in the
> middle of the existing functions if it has not been thought of beforehand.

This has been considered and the functions which are proposed are
intentionally structured to allow for multiple implementations already,
so it's unclear to me what the argument here is.  Further, we haven't
even gotten to actual cluster encryption yet; this is all just the key
management side of things, which is absolutely a necessary and important
part, but arguing that it doesn't address the cluster encryption
approach in core simply doesn't make any sense, as that's not a part of
this patch.

Let's have that discussion when we actually get to the point of talking
about cluster encryption.

> >I know archive_command is completely external, but that is much simpler
> >than what would need to be done for key management, key rotation, and key
> >verification.
> 
> I'm not sure of the design of the key rotation stuff as I recall it from the
> patches I looked at, but the API design looks like the more pressing issue.
> First, the mere existence of a "master key" is a cryptographic choice which
> is debatable, because it creates issues if one must be able to change it,
> hence the whole "key rotation" stuff. ISTM that what core needs to know is
> that one particular key is replaced by a new one and the underlying file is
> rewritten. It should not need to know anything about master keys and key
> derivation at all. It does need to know that the user wants to change file
> keys, though.

No, it's not debatable that a KEK is needed; I disagree entirely.
Perhaps we can have support for an external key store in the future for
the DEKs, but we really need to have our own key management and the
ability to reasonably rotate keys (which is what the KEK provides).
Ideally, we'll get to a point where we can even have multiple DEKs and
deal with rotating them too, but that, if anything, points even more
strongly to the need to have a key management system and KEK, as we'll
be dealing with that many more keys.
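
To illustrate the envelope pattern in a few lines, here is a rough
sketch using pgcrypto, purely as an illustration (the patch does this in
C with OpenSSL; nothing below is its actual code):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Generate a DEK and wrap it under the KEK; only the wrapped copy would
-- ever be written to disk.
WITH keys AS (
    SELECT gen_random_bytes(32) AS dek,  -- data encryption key, stays fixed
           gen_random_bytes(32) AS kek   -- key encryption key, rotatable
)
SELECT encrypt(dek, kek, 'aes-cbc/pad:pkcs') AS wrapped_dek FROM keys;

-- Rotating the KEK is then decrypt(wrapped_dek, old_kek, 'aes-cbc/pad:pkcs')
-- re-encrypted under the new KEK; data encrypted with the DEK itself is
-- never rewritten.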


Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-31 Thread Dmitry Dolgov
> On Wed, Dec 30, 2020 at 09:01:37PM +0100, Dmitry Dolgov wrote:
> > make check fails
>
> Yeah, apparently I forgot to re-enable asserts after the last
> benchmarking discussion, and missed some of those. Will fix.
>
> > 2. The index position was ignored.
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌─┐
> > │  a  │
> > ╞═╡
> > │ {"a": [20]} │
> > └─┘
> > (1 row)
>
> I just realized I haven't included "filling the gaps" part, that's why
> it works as before. Can add this too.
>
> > 1. quietly ignored update
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌┐
> > │ a  │
> > ╞╡
> > │ {} │
> > └┘
> > (1 row)
>
> This belongs to the original jsonb_set implementation. Although if we
> started to change it anyway with "filling the gaps", maybe it's fine to
> add one more flag to tune its behaviour in this case as well. I can
> check how complicated that could be.

Here is what I had in mind. The assert issue in the main patch is fixed
(nothing serious, it was just the rawscalar check for an empty jsonb
created during assignment), and the second patch contains all the bits
for "filling the gaps", including your suggestion about creating the
whole path if it's not present. The latter (creating the chain of empty
objects) I haven't tested that much, but if there are any issues or
concerns I guess it will not prevent the main patch from going forward.
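
To spell out the behaviour the second patch is aiming for (the exact
padding semantics below are an assumption, not confirmed output):

CREATE TABLE foo (a jsonb);
INSERT INTO foo VALUES ('{}');

-- With "filling the gaps", missing parts of the path are created and
-- array positions below the target index are padded with nulls:
UPDATE foo SET a['a'][3] = '20';

-- expected result: {"a": [null, null, null, 20]}
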
From c9143a620497dac5615c4de1d9349684e9af95b5 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v40 1/2] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does
not have a limit on the number of subscripts, and for assignment expects
the replacement value to be of jsonb type. There is also one functional
difference in assignment via subscripting from jsonb_set: when the
original jsonb container is NULL, subscripting replaces it with an empty
jsonb and proceeds with the assignment.

For the sake of code reuse, some parts of jsonb functionality were
rearranged to allow using the same functions for jsonb_set and the
assignment subscripting operation.

The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule
---
 doc/src/sgml/json.sgml  |  48 
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413 
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat |   4 +
 src/include/catalog/pg_type.dat |   3 +-
 src/include/utils/jsonb.h   |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql  |  84 +-
 10 files changed, 982 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 5b9a5557a4..100d1a60f4 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -602,6 +602,54 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu
   
  
 
+ 
+  jsonb Subscripting
+  
+   jsonb data type supports array-style subscripting expressions
+   to extract or update particular elements. It's possible to use multiple
+   subscripting expressions to extract nested values. In this case, a chain of
+   subscripting expressions follows the same rules as the
+   path argument in the jsonb_set function,
+   e.g. in the case of arrays it is a 0-based operation, and negative integers
+   that appear in the path count from the end of JSON arrays.
+   The result of subscripting expressions is always jsonb data type. An
+   example of subscripting syntax:
+
+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
+
+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)[1];
+
+-- Update value by key, note the single quotes - the assigned value
+-- needs to be of jsonb type as well
+UPDATE table_name SET jsonb_field['key'] = '1';
+
+-- Select records using where clause with subscripting. Since the result of
+-- subscripting is jsonb and we basically want to compare two jsonb objects, we
+-- need to put the value in double quotes to be able to convert it to jsonb.
+SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
+
+
+  Subscripting for jsonb does not support slice expressions,
+  even if the jsonb value contains an array.
+
+  If the source jsonb is NULL, assignment
+  via subscripting will proceed as if it were an empty JSON value:
+
+-- If jsonb_field here is NULL, the result is {"a": 1}
+UPDATE table_name SET jsonb_field['a'] = '1';
+
+-- If jsonb_field here is NULL, the result is [1]
+UPDATE table_name SET jsonb_field[0] = '1';
+
+
+  
+ 
+
  
   Transforms
 
diff 

Re: Zedstore - compressed in-core columnar storage

2020-12-31 Thread Simon Riggs
On Wed, 18 Nov 2020 at 00:31, Jacob Champion  wrote:
>
> So that's in need of resolution. I'd expect gin and gist to be pretty
> flaky until we fix that.

Jacob and Soumyadeep,

Thanks for submitting this. I think a fix is still outstanding, correct?
The patch also fails to apply on HEAD in two places.
Please can you submit the next version?

Do you mind if we add this for review to the Jan CF?

It is a lot of code and I think it would be significantly difficult for
the community to accept it as-is, even though it looks to be a very
high-quality submission. So I would like to suggest a strategy for
commit: we accept Zedstore as "Beta" or "Experimental" in PG14,
perhaps with a WARNING/Caution similar to the one that used to be
given by Postgres in earlier versions when you created a Hash index.
We keep Zedstore in "Beta" mode until a later release, PG15 or later,
when we can declare Zedstore fully safe. That approach allows us to
get this into the repo asap, and then have it fixed and improved
incrementally from there.

e.g.

"NOTICE:  Caution: Zedstore is an experimental feature in PostgreSQL14
intended for robustness and performance testing only. Your data and/or
query accuracy may be at risk if you rely on this."

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Proposed patch for key management

2020-12-31 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Dec 30, 2020 at 06:49:34PM -0500, Stephen Frost wrote:
> > The API to fetch the KEK doesn't care at all about where it's stored or
> > how it's derived or anything like that.  There's a relatively small
> > change which could be made to have PG request all of the keys that it'll
> > need on startup, if we want to go there as has been suggested elsewhere,
> > but even if we do that, PG needs to be able to do that itself too,
> > otherwise it's not a complete capability and there seems little point in
> > doing something that's just a pass-thru to something else and isn't able
> > to really be used.
> 
> Right now, the script returns a cluster key (KEK), and during initdb the
> server generates data encryption keys and wraps them with the KEK. 
> During server start, the server validates the KEK and decrypts the data
> keys.  pg_alterckey allows changing the KEK.

Right, all of those are really necessary things for PG to have a useful
implementation.

> I think Fabien is saying this all should _only_ be done using external
> tools --- that's what I don't agree with.

I also don't agree that everything should be external.  We effectively
have that today, and it certainly doesn't get us any closer to having a
solution in PG, which is frequently pointed to as a reason why certain,
entirely reasonable, use cases can't be satisfied with PG.

Thanks,

Stephen




Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Victor Yegorov
Thu, 31 Dec 2020 at 03:55, Peter Geoghegan :

> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>

I am planning to look into this patch in the next few days.

-- 
Victor Yegorov


[PATCH] New default role allowing to change per-role/database settings

2020-12-31 Thread Michael Banck
Hi,

In today's world, some DBAs have no superuser rights, but we can
delegate them additional powers like CREATEDB or the pg_monitor default
role etc. Usually, the DBA can also view the database logs, either via
shell access or some web interface.

One thing that I personally find lacking is that it is not possible to
change role-specific log settings (like log_statement = 'mod' for a
security-sensitive role) without being SUPERUSER, as their GUC context
is "superuser". This makes setting up auditing much harder if there is
no SUPERUSER access; pgaudit then also only allows configuring object-
based auditing. Amazon RDS, e.g., has patched Postgres to allow the
cluster owner/pseudo-superuser `rds_superuser' to change those log
settings that define what/when we log something, while keeping the
"where to log" entries locked down.

The attached patch introduces a new guc context "administrator" (better
names/bikeshedding for this welcome) that is meant as a middle ground
between "superuser" and "user". It also adds a new default role
"pg_change_role_settings" (better names also welcome) that can be
granted to DBAs so that they can change the "administrator"-context GUCs
on a per-role (or per-database) basis. Whether the latter should be
included is maybe debatable, but I added both on the basis that they are
the same "source".

The initial set of "administrator" GUCs are all current GUCs with
"superuser" context from these categories:

* Reporting and Logging / What to Log
* Reporting and Logging / When to Log
* Statistics / Query and Index Statistics Collector
* Statistics / Monitoring

Of course, further categories (or particular GUCs) could be added now or
in the future, e.g. RDS also patches the following GUCs in their v12
offering:

* temp_file_limit
* session_replication_role

RDS does *not* patch log_transaction_sample_rate from "Reporting and
Logging / When to Log", but that might be more of an oversight than a
security consideration, or does anybody see a big problem with that
(compared to the others in that set)?

I initially pondered not introducing a new context but just filtering on
category, but as categories seem to be only descriptive in guc.c and not
used for any policy decisions so far, I abandoned this idea pretty
quickly.


Thoughts?

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From 71c1134ca8ea9267be6b605fa140d7d97f54ac6d Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Tue, 15 Dec 2020 20:53:59 +0100
Subject: [PATCH] Add new PGC_ADMINSET guc context and pg_change_role_settings
 default role.

This commit introduces `administrator' as a middle ground between `superuser'
and `user' for guc contexts.

Those settings initially include all previously `superuser'-context settings
from the 'Reporting and Logging' ('What/When to Log' but not 'Where to Log')
and both 'Statistics' categories. Further settings could be added in follow-up
commits.

Also, a new default role pg_change_role_settings is introduced.  This allows
administrators (that are not superusers) who have been granted this role to
change the above PGC_ADMINSET settings on a per-role/database basis like "ALTER
ROLE [...] SET log_statement".

The rationale is to allow certain settings that affect logging to be changed
in setups where the DBA has access to the database log, but is not a full
superuser.

Catversion is bumped.
---
 src/backend/commands/dbcommands.c |  7 +++-
 src/backend/commands/user.c   |  7 ++--
 src/backend/utils/misc/guc.c  | 56 +++
 src/include/catalog/catversion.h  |  2 +-
 src/include/catalog/pg_authid.dat |  5 +++
 src/include/utils/guc.h   |  1 +
 6 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index f27c3fe8c1..03787eb5cf 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1657,7 +1657,12 @@ AlterDatabaseSet(AlterDatabaseSetStmt *stmt)
 	 */
 	shdepLockAndCheckObject(DatabaseRelationId, datid);
 
-	if (!pg_database_ownercheck(datid, GetUserId()))
+	/*
+	 * Only allow the database owner or a member of the
+	 * pg_change_role_settings default role to change database settings.
+	 */
+	if (!pg_database_ownercheck(datid, GetUserId()) &&
+		!is_member_of_role(GetUserId(), DEFAULT_ROLE_CHANGE_ROLE_SETTINGS))
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
 	   stmt->dbname);
 
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 0e6800bf3e..884f15d1c5 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -936,7 +936,8 

Re: Dependency isn't created between extension and schema

2020-12-31 Thread Masahiko Sawada
On Tue, Dec 22, 2020 at 1:03 AM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Mon, Dec 21, 2020 at 2:59 AM Michael Paquier  wrote:
> >> On Mon, Dec 21, 2020 at 04:02:29PM +0900, Masahiko Sawada wrote:
> >>> Is it a bug? Since the created schema obviously depends on the
> >>> extension when we created the schema specified in the schema option, I
> >>> think we might want to create the dependency so that DROP EXTENSION
> >>> drops the schema as well.
>
> >> FWIW, I recall that the "soft" behavior that exists now is wanted, as
> >> it is more flexible for DROP EXTENSION: what you are suggesting here
> >> has the disadvantage to make DROP EXTENSION fail if any non-extension
> >> object has been created on this schema, so this could be disruptive
> >> when it comes to some upgrade scenarios.
>
> I think it absolutely is intentional.  For example, if several extensions
> all list "schema1" in their control files, and you install them all, you
> would not want dropping the first-created one to force dropping the rest.
> I do not really see any problem here that's worth creating such hazards
> to fix.

Thank you for the comments!

I understand that it is intentional behavior, and the downside of my
idea. But what is the difference between a schema created by
specifying the schema option in the control file and one created by
CREATE SCHEMA in the install script? Extensions might create the same
schema "schema1" in their install scripts; in this case, dropping the
first one forces dropping the rest. Looking at some extensions in the
wild, some use the schema option whereas others use the install
script. I think it's reasonable that there are two ways to create the
extension's schema with different dependencies, but I think this is
better documented. It looked like non-intuitive behavior when I saw
it for the first time.
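
To make the two cases concrete (the extension name and contents here are
hypothetical):

-- 1) Schema named only in the control file, i.e. myext.control has
--    schema = 'schema1': CREATE EXTENSION creates schema1 if it is
--    missing, but records no dependency on it, so dropping the
--    extension leaves schema1 behind.
CREATE EXTENSION myext;
DROP EXTENSION myext;   -- schema1 survives

-- 2) Schema created in the install script, i.e. myext--1.0.sql contains
--    CREATE SCHEMA schema1;  Here the schema becomes a member of the
--    extension, so DROP EXTENSION drops it, and fails if any
--    non-extension objects have been created in it in the meantime.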

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Proposed patch for key management

2020-12-31 Thread Fabien COELHO



Hello,


> > The API to fetch the KEK doesn't care at all about where it's stored or
> > how it's derived or anything like that.  There's a relatively small
> > change which could be made to have PG request all of the keys that it'll
> > need on startup, if we want to go there as has been suggested elsewhere,
> > but even if we do that, PG needs to be able to do that itself too,
> > otherwise it's not a complete capability and there seems little point in
> > doing something that's just a pass-thru to something else and isn't able
> > to really be used.


> Right now, the script returns a cluster key (KEK), and during initdb the
> server generates data encryption keys and wraps them with the KEK.
> During server start, the server validates the KEK and decrypts the data
> keys.  pg_alterckey allows changing the KEK.
>
> I think Fabien is saying this all should _only_ be done using external
> tools --- that's what I don't agree with.


Yep.

I could compromise on "could be done using an external tool", but that
requires designing the API and thinking about where and how things are
done before everything is hardwired. Designing afterwards is too late.
ISTM that the current patch does not separate API design from
cryptographic design, so both are deeply entwined and would be difficult
to disentangle.


--
Fabien.




Re: Proposed patch for key management

2020-12-31 Thread Fabien COELHO


Hello Stephen,


> > The implementations should not have to be in any particular language:
> > Shell, Perl, Python, C should be possible.


> I disagree that it makes any sense to pass actual encryption out to a
> shell script.


Yes, sure. I'm talking about key management. For actual crypto functions, 
ISTM that they should be "replaceable", i.e. if some institution does not 
believe in AES then they could switch to something else.



> > After giving it more thought during the day, I think that only one
> > command and a basic protocol is needed. Maybe something as simple as
> >
> >   /path/to/command --options arguments…


> This is what's proposed: a command is run to acquire the KEK (as a
> hex-encoded set of bytes, making it possible to use a shell script,
> which is what the patch does too).


Yep, but that is not what I'm trying to advocate. The "KEK" (if any)
would stay in the process and not be given back to the database or the
command using it. Then the database/tool would interact with the command
to get the actual per-file keys when needed.


> [...] The API to fetch the KEK doesn't care at all about where it's
> stored or how it's derived or anything like that.


> There's a relatively small change which could be made to have PG request
> all of the keys that it'll need on startup, if we want to go there as
> has been suggested elsewhere, but even if we do that, PG needs to be
> able to do that itself too, otherwise it's not a complete capability and
> there seems little point in doing something that's just a pass-thru to
> something else and isn't able to really be used.


I do not understand. Postgres should provide a working implementation;
whether the functions are directly inside pg or reached through an
external process, they need to be there anyway. I'm trying to pin down a
clean, possibly external, API so that it is possible to have something
different from the choices made in the patch.


--
Fabien.

Re: Asynchronous Append on postgres_fdw nodes.

2020-12-31 Thread Etsuro Fujita
On Sat, Dec 19, 2020 at 5:55 PM Etsuro Fujita  wrote:
> On Mon, Dec 14, 2020 at 4:01 PM Kyotaro Horiguchi
>  wrote:
> > At Sat, 12 Dec 2020 18:25:57 +0900, Etsuro Fujita  
> > wrote in
> > > On Fri, Nov 20, 2020 at 3:51 PM Kyotaro Horiguchi
> > >  wrote:
> > > > The reason for
> > > > the early fetching is letting fdw send the next request as early as
> > > > possible. (However, I didn't measure the effect of the
> > > > nodeAppend-level prefetching.)
> > >
> > > I agree that that would lead to an improved efficiency in some cases,
> > > but I still think that that would be useless in some other cases like
> > > SELECT * FROM sharded_table LIMIT 1.  Also, I think the situation
> > > would get worse if we support Append on top of joins or aggregates
> > > over ForeignScans, which would be more expensive to perform than these
> > > ForeignScans.
> >
> > I'm not sure which gain we weigh on, but if doing "LIMIT 1" on Append
> > many times is more common than fetching all or "LIMIT <multiples of
> > fetch_size>", that discussion would be convincing... Is
> > it really the case?
>
> I don't have a clear answer for that...  Performance in the case you
> mentioned would be improved by async execution without prefetching by
> Append, so it seemed reasonable to me to remove that prefetching to
> avoid unnecessary overheads in the case I mentioned.  BUT: I started
> to think my proposal, which needs an additional FDW callback routine
> (ie, ForeignAsyncBegin()), might be a bad idea, because it would
> increase the burden on FDW authors.

I dropped my proposal; I modified the patch so that ExecAppend()
requests tuples from all subplans needing a request *at once*, as
originally proposed by Robert and then you.  Please find attached a
new version of the patch.

Other changes:

* I renamed ExecAppendAsyncResponse() to what was originally proposed
by Robert, and modified the patch so that that function is called from
the requestee side, not the requestor side as in the previous version.

* I renamed the variable async_aware as explained by you.

* I tweaked comments a bit to address your comments.

* I made code simpler, and added a bit more assertions.

Best regards,
Etsuro Fujita


async-wip-2020-12-31.patch
Description: Binary data


Re: Proposed patch for key management

2020-12-31 Thread Fabien COELHO


Hello Bruce,


> > The API should NOT make assumptions about the cryptographic design, what
> > depends about what, where things are stored… ISTM that Pg should only care
> > about naming keys, holding them when created/retrieved (but not create
> > them), basically interacting with the key manager, passing the stuff to
> > functions for encryption/decryption seen as black boxes.
> >
> > I may have suggested something along these lines at the beginning of the
> > key management thread, probably. Not going this way implicitly implies
> > making some assumptions which may or may not suit other use cases, so
> > makes them specific not generic. I do not think pg should do that.


> I am not sure what else I can add to this discussion.  Having something
> that is completely external might be a nice option, but I don't think it
> is the common use-case, and will make the most common use cases more
> complex.


I'm unsure about what is the "common use case". ISTM that having the
postgres process holding the master key looks like a "no" for me in any
use case with actual security concerns: as the database must be remotely
accessible, a reasonable security assumption is that a pg account could be
compromised, and the "master key" (if any, that is just one particular
cryptographic design) should not be accessible in that case. The first
barrier would be the pg admin account. With external, the options are that
the key is held by another id, so the host must be compromised as well, and
could even be managed remotely on another host (I agree that this latter
option would be fragile because of the implied network connection, but it
would be a tradeoff depending on the security concerns, and it pretty much
corresponds to the Kerberos model).



> Adding a pre-defined system will not prevent future work on a
> completely external option.


Yes and no.

Having two side-by-side cluster-encryption schemes in core, the first of
which could be implemented on top of the second without much effort, would
look pretty weird. Moreover, the changes required in core to allow an API
are the very same as the ones in this patch. The other concern I have with
doing the cleanup work afterwards is that the API would sit somehow in
the middle of the existing functions if it has not been thought of beforehand.


> I know archive_command is completely external, but that is much simpler
> than what would need to be done for key management, key rotation, and
> key verification.


I'm not sure of the design of the key rotation stuff as I recall it from
the patches I looked at, but the API design looks like the more pressing
issue. First, the mere existence of a "master key" is a cryptographic
choice which is debatable, because it creates issues if one must be able
to change it, hence the whole "key rotation" stuff. ISTM that what core
needs to know is that one particular key is replaced by a new one and the
underlying file is rewritten. It should not need to know anything about
master keys and key derivation at all. It does need to know that the
user wants to change file keys, though.



> I will say that if the community feels external-only should be the only
> option, I will stop working on this feature because I feel the result
> would be too fragile to be reliable,


I do not see why that would be the case. I'm just arguing to have key
management in a separate, possibly suid something-else, process, which,
given the security concerns that dictate the feature, looks like a
must-have, or at least must be possible. From a line-count point of view,
it should be a small addition to the current code.



> and I would not want to be associated with it.


You do as you want. I'm no one; you can ignore me and proceed to commit
whatever you want. I'm only advising, to the best of my ability, on what
I think should be the level of API pursued for such a feature in pg, at
the design level.


I feel that the current proposal is built around one particular use case
with many implicit and unnecessary assumptions, without giving much
thought to the generic API which should really be pg's concern, and I do
not think that it can really be corrected once the ship has sailed, hence
my attempt at having some thought given to the matter before that.


IMHO, the *minimum* should be to have the API clearly designed, and the
implementation compatible with it at some level, including not baking in
assumptions about underlying cryptographic functions and key derivations
(I mean at the API level; the code itself can do so, obviously).


If you want a special "key_mgt_command = internal" because you feel that 
processes are fragile and unreliable and do not bring security, so be it, 
but I think that the API should be designed beforehand nevertheless.


Note that if people think that I'm wrong, then I'm wrong by definition.

--
Fabien.