Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 11:59:52AM -0400, Tom Lane wrote:
> I agree that we're not testing that area well enough.  Proposed
> patch seems basically OK, but I think the test needs to be stricter
> about what the expected output looks like --- for instance, it
> wouldn't complain if tab_foobar were described as something other
> than a table.

Indeed.  There was also a problem in the regex itself, where '|' was
not escaped, so the regex was not strict enough.  While at it, I have
added a policy to the set of objects copied to the new database.  Testing
the case where the set of slots is full would require ~2300 entries,
which would take some time.
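For illustration only (not part of the patch), the escaping pitfall generalizes to any regex dialect: an unescaped '|' is alternation, so the pattern accepts output it should reject. A minimal sketch in Python, with hypothetical sample strings:

```python
import re

# The psql output being checked separates columns with a literal '|'.
good = "table tab_foobar|role role_foobar"
bogus = "table tab_foobar is not owned by role role_foobar"

loose = re.compile(r"table tab_foobar|role role_foobar")    # '|' alternates
strict = re.compile(r"table tab_foobar\|role role_foobar")  # literal '|'

# The loose pattern matches either half, so it accepts the bogus line.
print(bool(loose.search(bogus)))   # True
print(bool(strict.search(bogus)))  # False
print(bool(strict.search(good)))   # True
```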
--
Michael
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index cb8e26c773..6bcc59de08 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -6,7 +6,7 @@ use warnings;
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 22;
+use Test::More tests => 25;
 
 program_help_ok('createdb');
 program_version_ok('createdb');
@@ -28,6 +28,30 @@ $node->issues_sql_like(
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
+# Check use of templates with shared dependencies copied from the template.
+my ($ret, $stdout, $stderr) = $node->psql(
+	'foobar2',
+	'CREATE ROLE role_foobar;
+CREATE TABLE tab_foobar (id int);
+ALTER TABLE tab_foobar owner to role_foobar;
+CREATE POLICY pol_foobar ON tab_foobar FOR ALL TO role_foobar;');
+$node->issues_sql_like(
+	[ 'createdb', '-l', 'C', '-T', 'foobar2', 'foobar3' ],
+	qr/statement: CREATE DATABASE foobar3 TEMPLATE foobar2/,
+	'create database with template');
+($ret, $stdout, $stderr) = $node->psql(
+	'foobar3',
+	"SELECT pg_describe_object(classid, objid, objsubid) AS obj,
+   pg_describe_object(refclassid, refobjid, 0) AS refobj
+   FROM pg_shdepend s JOIN pg_database d ON (d.oid = s.dbid)
+   WHERE d.datname = 'foobar3' ORDER BY obj;", on_error_die => 1);
+chomp($stdout);
+like(
+	$stdout,
+	qr/^policy pol_foobar on table tab_foobar\|role role_foobar
+table tab_foobar\|role role_foobar$/,
+	'shared dependencies copied over to target database');
+
 # Check quote handling with incorrect option values.
 $node->command_checks_all(
 	[ 'createdb', '--encoding', "foo'; SELECT '1", 'foobar2' ],


signature.asc
Description: PGP signature


Re: CREATEROLE and role ownership hierarchies

2021-10-25 Thread Shinya Kato

On 2021-10-21 03:40, Mark Dilger wrote:

These patches have been split off the now deprecated monolithic
"Delegating superuser tasks to new security roles" thread at [1].

The purpose of these patches is to fix the CREATEROLE escalation
attack vector misfeature.  (Not everyone will see CREATEROLE that way,
but the perceived value of the patch set likely depends on how much
you see CREATEROLE in that light.)


Hi! Thank you for the patch.
I too think that the CREATEROLE escalation attack is a problem.

I have three comments.
1. Is there a function to check the owner of a role?  It would be nice
to be able to check it with \du or the pg_roles view.
2. Is it correct that REPLICATION/BYPASSRLS can be granted even if you
are not a superuser, but have CREATEROLE and REPLICATION/BYPASSRLS?
3. I think it would be better to have "DROP ROLE [ IF EXISTS ] name
[, ...] [ CASCADE | RESTRICT ]", like "DROP TABLE [ IF EXISTS ] name
[, ...] [ CASCADE | RESTRICT ]".  What do you think?


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: row filtering for logical replication

2021-10-25 Thread Peter Smith
On Mon, Sep 27, 2021 at 2:02 PM Amit Kapila  wrote:
>
> On Sat, Sep 25, 2021 at 3:36 PM Tomas Vondra
>  wrote:
> >
> > On 9/25/21 6:23 AM, Amit Kapila wrote:
> > > On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
> > >  wrote:
> > >>
> > >> On 9/24/21 7:20 AM, Amit Kapila wrote:
> > >>>
> > >>> I think the right way to support functions is by the explicit marking
> > >>> of functions and in one of the emails above Jeff Davis also agreed
> > >>> with the same. I think we should probably introduce a new marking for
> > >>> this. I feel this is important because without this it won't be safe
> > >>> to access even some of the built-in functions that can access/update
> > >>> database (non-immutable functions) due to logical decoding environment
> > >>> restrictions.
> > >>>
> > >>
> > >> I agree that seems reasonable. Is there any reason why not to just use
> > >> IMMUTABLE for this purpose? Seems like a good match to me.
> > >>
> > >
> > > It will just solve one part of the puzzle (related to database access)
> > > but it is better to avoid the risk of broken replication by explicit
> > > marking especially for UDFs or other user-defined objects. You seem to
> > > be okay documenting such risk but I am not sure we have an agreement
> > > on that especially because that was one of the key points of
> > > discussions in this thread and various people told that we need to do
> > > something about it. I personally feel we should do something if we
> > > want to allow user-defined functions or operators because as reported
> > > in the thread this problem has been reported multiple times. I think
> > > we can go ahead with IMMUTABLE built-ins for the first version and
> > > then allow UDFs later or let's try to find a way for explicit marking.
> > >
> >
> > Well, I know multiple people mentioned that issue. And I certainly agree
> > just documenting the risk would not be an ideal solution. Requiring the
> > functions to be labeled helps, but we've seen people marking volatile
> > functions as immutable in order to allow indexing, so we'll have to
> > document the risks anyway.
> >
> > All I'm saying is that allowing built-in functions/operators but not
> > user-defined variants seems like an annoying break of extensibility.
> > People are used that user-defined stuff can be used just like built-in
> > functions and operators.
> >
>
> I agree with you that allowing UDFs in some way would be good for this
> feature. I think once we get the base feature committed then we can
> discuss whether and how to allow UDFs. Do we want to have an
> additional label for it or can we come up with something which allows
> the user to continue replication even if she has dropped the object
> used in the function? It seems like we can limit the scope of base
> patch functionality to allow the use of immutable built-in functions
> in row filter expressions.
>

OK, immutable system functions are now allowed in v34 [1]
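As an aside, the mislabeling hazard discussed above can be sketched with a caching analogy. This is illustrative Python, not PostgreSQL code: a function treated as pure ("IMMUTABLE") that secretly depends on mutable state keeps returning stale answers, which mirrors how a row filter calling a mislabeled function can misbehave once an object it depends on changes.

```python
import functools

# Illustrative analogy only: the lru_cache plays the role of the IMMUTABLE
# label, trusting the function to depend on nothing but its arguments.
state = {"threshold": 10}

@functools.lru_cache(maxsize=None)
def passes_filter(value):
    # ...but the function secretly reads mutable state.
    return value > state["threshold"]

print(passes_filter(15))   # True (15 > 10), and the result is cached
state["threshold"] = 20    # the object the function depends on changes
print(passes_filter(15))   # still True: the stale cached answer is reused
```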

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvWk4w%2BNEAqB32YkQa75tSkXi50cq6suV9f3fASn5C9NA%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia




Re: row filtering for logical replication

2021-10-25 Thread Peter Smith
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
 wrote:
>
> 7) exprstate_list
>
> I'd just call the field / variable "exprstates", without indicating the
> data type. I don't think we do that anywhere.

Fixed in v34. [1]

>
>
> 8) RfCol
>
> Do we actually need this struct? Why not to track just name or attnum,
> and lookup the other value in syscache when needed?
>

Fixed in v34. [1]

>
> 9)  rowfilter_expr_checker
>
> * Walk the parse-tree to decide if the row-filter is valid or not.
>
> I don't see any clear explanation what does "valid" mean.
>

Updated comment in v34. [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvWk4w%2BNEAqB32YkQa75tSkXi50cq6suV9f3fASn5C9NA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: pgsql: Document XLOG_INCLUDE_XID a little better

2021-10-25 Thread Amit Kapila
On Mon, Oct 25, 2021 at 4:21 PM Amit Kapila  wrote:
>
> On Thu, Oct 21, 2021 at 11:20 AM Dilip Kumar  wrote:
> >
> > On Thu, Oct 21, 2021 at 9:11 AM Amit Kapila  wrote:
> > >
> >
> > v5-0001 incorporates all the comment fixes suggested by Alvaro, and
> > 0001 is an additional patch which moves
> > MarkCurrentTransactionIdLoggedIfAny() out of the critical section.
> >
>
> Thanks, both your patches look good to me except that we need to
> remove the sentence related to the revert of ade24dab97 from the
> commit message. I think we should backpatch the first patch to 14
> where it was introduced and commit the second patch (related to moving
> code out of critical section) only to HEAD but we can even backpatch
> the second one till 9.6 for the sake of consistency. What do you guys
> think?
>

The other option could be to just commit both these patches in HEAD as
there is no correctness issue here.

-- 
With Regards,
Amit Kapila.




Re: XTS cipher mode for cluster file encryption

2021-10-25 Thread Sasasu

On 2021/10/26 04:32, Yura Sokolov wrote:

> And among others Adiantum looks best: it is fast even without hardware
> acceleration,


No, AES is fast on modern high-end hardware.

on X86 AMD 3700X
type  1024 bytes  8192 bytes   16384 bytes
aes-128-ctr   8963982.50k 11124613.88k 11509149.42k
aes-128-gcm   3978860.44k 4669417.10k  4732070.64k
aes-128-xts   7776628.39k 9073664.63k  9264617.74k
chacha20-poly1305 2043729.73k 2131296.36k  2141002.10k

on ARM RK3399, A53 middle-end with AES-NI
type  1024 bytes   8192 bytes   16384 bytes
aes-128-ctr   1663857.66k  1860930.22k  1872991.57k
aes-128-xts   685086.38k   712906.07k   716073.64k
aes-128-gcm   985578.84k   1054818.30k  1056768.00k
chacha20-poly1305 309012.82k   318889.98k   319711.91k

I think the baseline is the speed of the read(2) syscall on /dev/zero
(which is 3.6GiB/s on x86, and 980MiB/s on the ARM board).
ChaCha is fast on low-end ARM, but I haven't seen any HTTPS sites
using ChaCha, including Cloudflare and Google.
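For what it's worth, that kind of baseline is easy to reproduce; a rough sketch in Python (the numbers depend entirely on the machine, and /dev/zero assumes a Unix-like system):

```python
import os
import time

# Rough sequential-read throughput of read(2) on /dev/zero; this shows
# only the measurement method, not a claim about any particular hardware.
fd = os.open("/dev/zero", os.O_RDONLY)
chunk = 1 << 20                  # 1 MiB per read(2) call
total_mib = 256
start = time.perf_counter()
for _ in range(total_mib):
    os.read(fd, chunk)
elapsed = time.perf_counter() - start
os.close(fd)
print(f"{total_mib / 1024 / elapsed:.2f} GiB/s")
```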


On 2021/10/26 04:32, Yura Sokolov wrote:
>> That sounds like a great thing to think about adding ... after we get
>> something in that's based on XTS.
> Why? I see no points to do it after. Why not XTS after Adiantum?
>
> Ok, I see one: XTS is standardized.
:>
PostgreSQL has not even discussed single-table key rotation or a remote
KMS.  I think it's too hard to use an encryption algorithm which
OpenSSL doesn't implement.


OpenPGP_0x4E72AF09097DAE2E.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: pgsql: Remove unused wait events.

2021-10-25 Thread Amit Kapila
On Tue, Oct 26, 2021 at 6:50 AM Michael Paquier  wrote:
>
> On Mon, Oct 25, 2021 at 01:18:26PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> >> ... But while I agree it's good to remove unused stuff in the
> >> master, it doesn't seem like we really need to back-patch it.
> >
> > Yeah, exactly.  I don't see any benefit that's commensurate with
> > even a small risk of breaking extensions --- and apparently, in
> > this case that's not a risk but a certainty.
>
> +1.
>

I agree with the points raised here and will revert this for v14.

-- 
With Regards,
Amit Kapila.




Re: prevent immature WAL streaming

2021-10-25 Thread Kyotaro Horiguchi
At Mon, 25 Oct 2021 10:34:27 +0530, Amul Sul  wrote in 
> On Mon, Oct 25, 2021 at 7:02 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul  wrote in
> > > Any thoughts about the patch posted previously?
> >
> > Honestly, xlogreader looks fine with the current shape. The reason is
> > that it seems cleaner as an interface boundary since the caller of
> > xlogreader doesn't need to know about the details of xlogreader. The
> > current code nicely hides the end+1 confusion.
> >
> > Even if we want to get rid of global variables in xlog.c, I don't
> > understand why we remove only abortedRecPtr. That change makes things
> > more complex as a whole by letting xlog.c be more conscious of
> > xlogreader's internals.  I'm not sure I like that aspect of the patch.
> >
> 
> Because we have other ways to get abortedRecPtr without having a
> global variable, but we don't have such a way for missingContrecPtr,
> AFAICU.

That depends on the reason why you want to get rid of the global
variables. Since we restart WAL reading before reading the two
variables, we cannot rely on the xlogreader's corresponding members,
so we need another set of variables to preserve the values beyond the
restart.

> I agree using global variables makes things a bit easier, but those
> are inefficient when you want to share those with other processes --
> that would add extra burden to shared memory.

We could simply add a new member in XLogCtlData. Or we can create
another struct for ReadRecord's (not XLogReader's) state and then allocate
shared memory for it.  I don't think it is the right solution to infer
it from another variable using knowledge of xlogreader's internals.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Jeff Davis
On Tue, 2021-10-26 at 00:07 +, Bossart, Nathan wrote:
> It feels a bit excessive to introduce a new predefined role just for
> this.  Perhaps this could be accomplished with a new function that
> could be granted.

It would be nice if the syntax could be used, since it's pretty
widespread. I guess it does feel excessive to have its own predefined
role, but at the same time it's hard to group a command like CHECKPOINT
into a category. Maybe if we named it something like pg_performance or
something we could make a larger group?

Regards,
Jeff Davis






Re: Spelling change in LLVM 14 API

2021-10-25 Thread Thomas Munro
On Tue, Oct 26, 2021 at 1:52 PM Andres Freund  wrote:
> On 2021-10-26 13:39:53 +1300, Thomas Munro wrote:
> > According to my crystal ball, seawasp will shortly break again[1][2]
> > and the attached patch will fix it.
>
> That feels a lot more convincing though. Based on past experience it's not hard
> to believe that the compile-time impact of the include that the use of
> std::string forces into a lot of places is pretty significant...
>
> Could we make the old case a wrapper around the new case that just passes on the
> error as a const char *? That should be more maintainable long-term, because
> there's only one copy of the body?

Here's one like that.  The previous commit message "Track LLVM 14 API
changes" didn't seem too scalable, so I added a date and commit ID.
From 575ea753e29424cb1b8ee4fc399284e5b6930a4a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 26 Oct 2021 14:03:59 +1300
Subject: [PATCH] Track LLVM 14 API changes, 2021-10-25.

Tested with LLVM 13 and LLVM 14 at commit dbab339ea44e.

Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKG%2B3Ac3He9_SpJcxeiiVknbcES1tbZEkH9sRBdJFGj8K5Q%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit_error.cpp | 34 ++
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_error.cpp b/src/backend/jit/llvm/llvmjit_error.cpp
index daefb3e1fd..bfa19468a6 100644
--- a/src/backend/jit/llvm/llvmjit_error.cpp
+++ b/src/backend/jit/llvm/llvmjit_error.cpp
@@ -29,9 +29,15 @@ static std::new_handler old_new_handler = NULL;
 
 static void fatal_system_new_handler(void);
 #if LLVM_VERSION_MAJOR > 4
+static void fatal_llvm_new_handler(void *user_data, const char *reason, bool gen_crash_diag);
+#if LLVM_VERSION_MAJOR < 14
 static void fatal_llvm_new_handler(void *user_data, const std::string& reason, bool gen_crash_diag);
 #endif
+#endif
+static void fatal_llvm_error_handler(void *user_data, const char *reason, bool gen_crash_diag);
+#if LLVM_VERSION_MAJOR < 14
 static void fatal_llvm_error_handler(void *user_data, const std::string& reason, bool gen_crash_diag);
+#endif
 
 
 /*
@@ -129,23 +135,41 @@ fatal_system_new_handler(void)
 #if LLVM_VERSION_MAJOR > 4
 static void
 fatal_llvm_new_handler(void *user_data,
-	   const std::string& reason,
+	   const char *reason,
 	   bool gen_crash_diag)
 {
 	ereport(FATAL,
 			(errcode(ERRCODE_OUT_OF_MEMORY),
 			 errmsg("out of memory"),
-			 errdetail("While in LLVM: %s", reason.c_str())));
+			 errdetail("While in LLVM: %s", reason)));
+}
+#if LLVM_VERSION_MAJOR < 14
+static void
+fatal_llvm_new_handler(void *user_data,
+	   const std::string& reason,
+	   bool gen_crash_diag)
+{
+	fatal_llvm_new_handler(user_data, reason.c_str(), gen_crash_diag);
 }
 #endif
+#endif
 
 static void
 fatal_llvm_error_handler(void *user_data,
-		 const std::string& reason,
+		 const char *reason,
 		 bool gen_crash_diag)
 {
 	ereport(FATAL,
 			(errcode(ERRCODE_OUT_OF_MEMORY),
-			 errmsg("fatal llvm error: %s",
-	reason.c_str())));
+			 errmsg("fatal llvm error: %s", reason)));
 }
+
+#if LLVM_VERSION_MAJOR < 14
+static void
+fatal_llvm_error_handler(void *user_data,
+		 const std::string& reason,
+		 bool gen_crash_diag)
+{
+	fatal_llvm_error_handler(user_data, reason.c_str(), gen_crash_diag);
+}
+#endif
-- 
2.30.2



Re: pgsql: Remove unused wait events.

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 01:18:26PM -0400, Tom Lane wrote:
> Robert Haas  writes:
>> ... But while I agree it's good to remove unused stuff in the
>> master, it doesn't seem like we really need to back-patch it.
> 
> Yeah, exactly.  I don't see any benefit that's commensurate with
> even a small risk of breaking extensions --- and apparently, in
> this case that's not a risk but a certainty.

+1.
--
Michael




Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 05:46:57PM +0530, Bharath Rupireddy wrote:
> StreamLog() isn't reached for create and drop slot cases, see [1]. I
> suggest to remove replication_slot != NULL and have Assert(slot_name)
> in GetSlotInformation():
> /*
>  * Try to get the starting point from the slot.  This is supported in
>  * PostgreSQL 15 and up.
>  */
> if (PQserverVersion(conn) >= 15)
> {
> if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
> &stream.timeline))
> {
> /* Error is logged by GetSlotInformation() */
> return;
> }
> }

Please note that it is possible to use pg_receivewal without a slot,
which is the default case, so we cannot do what you are suggesting
here.  An assertion on slot_name in GetSlotInformation() is not that
helpful either in my opinion, as we would just crash a couple of lines
down the road.

I have changed the patch per Ronan's suggestion to have the version
check out of GetSlotInformation(), addressed what you have reported,
and the result looked good.  So I have applied this part.

What remains on this thread is the addition of new tests to make sure
that pg_receivewal is able to follow a timeline switch.  Now that we
can restart from a slot, that should be a bit easier to implement as
a test by creating a slot on a standby.  Ronan, are you planning to
send a new patch for this part?
--
Michael




Re: Spelling change in LLVM 14 API

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-26 13:39:53 +1300, Thomas Munro wrote:
> On Mon, Sep 27, 2021 at 10:54 AM Thomas Munro  wrote:
> > And pushed.  Probably won't be the last change and seawasp only tests
> > master, so no back-patch for now.
> 
> According to my crystal ball, seawasp will shortly break again[1][2]
> and the attached patch will fix it.

That feels a lot more convincing though. Based on past experience it's not hard
to believe that the compile-time impact of the include that the use of
std::string forces into a lot of places is pretty significant...

Could we make the old case a wrapper around the new case that just passes on the
error as a const char *? That should be more maintainable long-term, because
there's only one copy of the body?

Greetings,

Andres Freund




Re: Spelling change in LLVM 14 API

2021-10-25 Thread Thomas Munro
On Mon, Sep 27, 2021 at 10:54 AM Thomas Munro  wrote:
> And pushed.  Probably won't be the last change and seawasp only tests
> master, so no back-patch for now.

According to my crystal ball, seawasp will shortly break again[1][2]
and the attached patch will fix it.

[1] 
https://github.com/llvm/llvm-project/commit/f6fa95b77f33c3690e4201e505cb8dce1433abd9
[2] 
https://github.com/llvm/llvm-project/commit/e463b69736da8b0a950ecd937cf990401bdfcdeb
diff --git a/src/backend/jit/llvm/llvmjit_error.cpp b/src/backend/jit/llvm/llvmjit_error.cpp
index daefb3e1fd..ee9ff5e571 100644
--- a/src/backend/jit/llvm/llvmjit_error.cpp
+++ b/src/backend/jit/llvm/llvmjit_error.cpp
@@ -28,10 +28,15 @@ static int fatal_new_handler_depth = 0;
 static std::new_handler old_new_handler = NULL;
 
 static void fatal_system_new_handler(void);
+#if LLVM_VERSION_MAJOR >= 14
+static void fatal_llvm_new_handler(void *user_data, const char *reason, bool gen_crash_diag);
+static void fatal_llvm_error_handler(void *user_data, const char *reason, bool gen_crash_diag);
+#else
 #if LLVM_VERSION_MAJOR > 4
 static void fatal_llvm_new_handler(void *user_data, const std::string& reason, bool gen_crash_diag);
 #endif
 static void fatal_llvm_error_handler(void *user_data, const std::string& reason, bool gen_crash_diag);
+#endif
 
 
 /*
@@ -126,7 +131,18 @@ fatal_system_new_handler(void)
 			 errdetail("while in LLVM")));
 }
 
-#if LLVM_VERSION_MAJOR > 4
+#if LLVM_VERSION_MAJOR >= 14
+static void
+fatal_llvm_new_handler(void *user_data,
+	   const char *reason,
+	   bool gen_crash_diag)
+{
+	ereport(FATAL,
+			(errcode(ERRCODE_OUT_OF_MEMORY),
+			 errmsg("out of memory"),
+			 errdetail("While in LLVM: %s", reason)));
+}
+#elif LLVM_VERSION_MAJOR > 4
 static void
 fatal_llvm_new_handler(void *user_data,
 	   const std::string& reason,
@@ -139,6 +155,18 @@ fatal_llvm_new_handler(void *user_data,
 }
 #endif
 
+#if LLVM_VERSION_MAJOR >= 14
+static void
+fatal_llvm_error_handler(void *user_data,
+		 const char *reason,
+		 bool gen_crash_diag)
+{
+	ereport(FATAL,
+			(errcode(ERRCODE_OUT_OF_MEMORY),
+			 errmsg("fatal llvm error: %s",
+	reason)));
+}
+#else
 static void
 fatal_llvm_error_handler(void *user_data,
 		 const std::string& reason,
@@ -149,3 +177,4 @@ fatal_llvm_error_handler(void *user_data,
 			 errmsg("fatal llvm error: %s",
 	reason.c_str())));
 }
+#endif


Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 4:40 PM, "Jeff Davis"  wrote:
> On Mon, 2021-10-25 at 17:55 -0300, Alvaro Herrera wrote:
>> Maybe you just need pg_checkpointer.
>
> Fair enough. Attached simpler patch that only covers checkpoint, and
> calls the role pg_checkpointer.

It feels a bit excessive to introduce a new predefined role just for
this.  Perhaps this could be accomplished with a new function that
could be granted.

Nathan



Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 4:29 PM, "Jeff Davis"  wrote:
> On Mon, 2021-10-25 at 14:30 -0700, Andres Freund wrote:
>> I don't get the reasoning behind the "except ..." logic. What does
>> this
>> actually protect against? A reasonable use case for this feature is to
>> monitor memory usage of all backends, and this restriction practically
>> requires one to still use a security definer function.
>
> Nathan brought it up -- more as a question than a request, so perhaps
> it's not necessary. I don't have a strong opinion about it, but I
> included it to be conservative (easier to relax a privilege than to
> tighten one).

I asked about it since we were going to grant execution to
pg_signal_backend, which (per the docs) shouldn't be able to signal a
superuser-owned backend.  I don't mind removing this now that the
pg_signal_backend part is removed.

Nathan



Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Jeff Davis
On Mon, 2021-10-25 at 17:55 -0300, Alvaro Herrera wrote:
> Maybe you just need pg_checkpointer.

Fair enough. Attached simpler patch that only covers checkpoint, and
calls the role pg_checkpointer.

Regards,
Jeff Davis

diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 2afee6d7b59..aebe8f8cd77 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -52,7 +52,8 @@ CHECKPOINT
   
 
   
-   Only superusers can call CHECKPOINT.
+   Only superusers or members of the pg_checkpointer role
+   can call CHECKPOINT.
   
  
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bf085aa93b2..0ff832a62c2 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -24,6 +24,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_authid.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/alter.h"
@@ -939,10 +940,10 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			break;
 
 		case T_CheckPointStmt:
-			if (!superuser())
+			if (!has_privs_of_role(GetUserId(), ROLE_PG_CHECKPOINTER))
 ereport(ERROR,
 		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		 errmsg("must be superuser to do CHECKPOINT")));
+		 errmsg("must be member of pg_checkpointer to do CHECKPOINT")));
 
 			RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
 			  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 3253b8751b1..1bc331a9a39 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202109101
+#define CATALOG_VERSION_NO	202110231
 
 #endif
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 3da68016b61..9c65174f3c6 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -79,5 +79,10 @@
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
   rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '4544', oid_symbol => 'ROLE_PG_CHECKPOINTER',
+  rolname => 'pg_checkpointer', rolsuper => 'f', rolinherit => 't',
+  rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+  rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+  rolpassword => '_null_', rolvaliduntil => '_null_' },
 
 ]


Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Jeff Davis
On Mon, 2021-10-25 at 14:30 -0700, Andres Freund wrote:
> I don't get the reasoning behind the "except ..." logic. What does
> this
> actually protect against? A reasonable use case for this feature is to
> monitor memory usage of all backends, and this restriction practically
> requires one to still use a security definer function.

Nathan brought it up -- more as a question than a request, so perhaps
it's not necessary. I don't have a strong opinion about it, but I
included it to be conservative (easier to relax a privilege than to
tighten one).

I can cut out the in-function check entirely if there's no objection.

Regards,
Jeff Davis

[1] https://postgr.es/m/33f34f0c-bb16-48de-b125-95d340a54...@amazon.com





Re: pg_dump versus ancient server versions

2021-10-25 Thread Justin Pryzby
On Mon, Oct 25, 2021 at 11:38:51AM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 10/25/21 10:23, Tom Lane wrote:
> >> (Hmmm ... but disk space could
> >> become a problem, particularly on older machines with not so much
> >> disk.  Do we really need to maintain a separate checkout for each
> >> branch?  It seems like a fresh checkout from the repo would be
> >> little more expensive than the current copy-a-checkout process.)
> 
> > If you set it up with these settings then the disk space used is minimal:
> >      git_use_workdirs => 1,
> >      rm_worktrees => 1,
> 
> Maybe we should make those the defaults?  AFAICS the current
> default setup uses circa 200MB per back branch, even between runs.
> I'm not sure what that is buying us.

Maybe git's shared/"alternates" would be helpful to minimize the size of
.git/objects?

I'm not sure - it looks like the BF client does its own stuff with symlinks.
Is that for compatibility with old git?
https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/SCM.pm

If you "clone" a local location, it uses hard links by default.
If you use --shared or --reference, then it uses references to the configured
"alternates", if any.

In both cases, .git/objects requires no additional space (but the "checked out"
copy still takes up however much space).

$ mkdir tmp
$ git clone --quiet ./postgresql tmp/pg
$ du -sh tmp/pg
492M	tmp/pg

$ rm -fr tmp/pg
$ git clone --quiet --shared ./postgresql tmp/pg
$ du -sh tmp/pg
124M	tmp/pg

-- 
Justin




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Anyway, to get back to the original point ...

No one has spoken against moving up the cutoff for pg_dump support,
so I did a very quick pass to see how much code could be removed.
The answer is right about 1000 lines, counting both pg_dump and
pg_upgrade, so it seems like it's worth doing independently of the
unnest() issue.

The attached is just draft-quality, because I don't really want
to pursue the point until after committing the pg_dump changes
being discussed in the other thread.  If I push this first it'll
break a lot of those patches.  (Admittedly, pushing those first
will break this one, but this one is a lot easier to re-do.)

BTW, while looking at pg_upgrade I chanced to notice
check_for_isn_and_int8_passing_mismatch(), which seems like it's
not well thought out at all.  It's right that contrib/isn will
not upgrade nicely if the target cluster has a different
float8_pass_by_value setting from the source.  What's wrong is
the assumption that no other extension has the same issue.
We invented and publicized the "LIKE type" option for CREATE TYPE
precisely so that people could build types that act just like isn,
so it seems pretty foolish to imagine that no one has done so.

I think we should nuke check_for_isn_and_int8_passing_mismatch()
and just refuse to upgrade if float8_pass_by_value differs, full stop.
I can see little practical need to allow that case.

regards, tom lane

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..6e95148d11 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1381,7 +1381,7 @@ CREATE DATABASE foo WITH TEMPLATE template0;
PostgreSQL server versions newer than
pg_dump's version.  pg_dump can also
dump from PostgreSQL servers older than its own version.
-   (Currently, servers back to version 8.0 are supported.)
+   (Currently, servers back to version 9.0 are supported.)
However, pg_dump cannot dump from
PostgreSQL servers newer than its own major version;
it will refuse to even try, rather than risk making an invalid dump.
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 20efdd771f..b15fbb44f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -68,7 +68,7 @@ PostgreSQL documentation
  
 
   
-   pg_upgrade supports upgrades from 8.4.X and later to the current
+   pg_upgrade supports upgrades from 9.0.X and later to the current
major release of PostgreSQL, including snapshot and beta releases.
   
  
diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index ea67e52a3f..e4b04fd491 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -183,10 +183,6 @@ buildACLCommands(const char *name, const char *subname, const char *nspname,
 appendPQExpBuffer(firstsql, "%s FROM ", name);
 if (grantee->len == 0)
 	appendPQExpBufferStr(firstsql, "PUBLIC;\n");
-else if (strncmp(grantee->data, "group ",
- strlen("group ")) == 0)
-	appendPQExpBuffer(firstsql, "GROUP %s;\n",
-	  fmtId(grantee->data + strlen("group ")));
 else
 	appendPQExpBuffer(firstsql, "%s;\n",
 	  fmtId(grantee->data));
@@ -194,20 +190,6 @@ buildACLCommands(const char *name, const char *subname, const char *nspname,
 		}
 	}
 
-	/*
-	 * We still need some hacking though to cover the case where new default
-	 * public privileges are added in new versions: the REVOKE ALL will revoke
-	 * them, leading to behavior different from what the old version had,
-	 * which is generally not what's wanted.  So add back default privs if the
-	 * source database is too old to have had that particular priv.
-	 */
-	if (remoteVersion < 80200 && strcmp(type, "DATABASE") == 0)
-	{
-		/* database CONNECT priv didn't exist before 8.2 */
-		appendPQExpBuffer(firstsql, "%sGRANT CONNECT ON %s %s TO PUBLIC;\n",
-		  prefix, type, name);
-	}
-
 	/* Scan individual ACL items */
 	for (i = 0; i < naclitems; i++)
 	{
@@ -300,10 +282,6 @@ buildACLCommands(const char *name, const char *subname, const char *nspname,
 	appendPQExpBuffer(secondsql, "%s TO ", name);
 	if (grantee->len == 0)
 		appendPQExpBufferStr(secondsql, "PUBLIC;\n");
-	else if (strncmp(grantee->data, "group ",
-	 strlen("group ")) == 0)
-		appendPQExpBuffer(secondsql, "GROUP %s;\n",
-		  fmtId(grantee->data + strlen("group ")));
 	else
 		appendPQExpBuffer(secondsql, "%s;\n", fmtId(grantee->data));
 }
@@ -316,10 +294,6 @@ buildACLCommands(const char *name, const char *subname, const char *nspname,
 	appendPQExpBuffer(secondsql, "%s TO ", name);
 	if (grantee->len == 0)
 		appendPQExpBufferStr(secondsql, "PUBLIC");
-	else if (strncmp(grantee->data, "group ",
-	 strlen("group ")) == 0)
-		appendPQExpBuffer(secondsql, "GROUP %s",
-		  fmtId(grantee->data + strlen("group ")));
 	else
 		

Re: Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()

2021-10-25 Thread Alvaro Herrera
On 2021-Oct-22, Aleksander Alekseev wrote:

> Hi hackers,
> 
> During the discussion [1] it was discovered that we have two
> procedures in execTuples.c that do the same thing:
> 
> * MakeSingleTupleTableSlot()
> * MakeTupleTableSlot()
> 
> In fact, MakeSingleTupleTableSlot() is simply a wrapper for
> MakeTupleTableSlot().

Did you see the arguments at [1]?

[1] https://www.postgresql.org/message-id/1632520.1613195514%40sss.pgh.pa.us

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Assorted improvements in pg_dump

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 16:02:34 -0400, Tom Lane wrote:
> So I'd like a better idea, but I'm not sure that that one is better.

I guess we could move the prepared-statement handling into a query execution
helper. That could then use a hashtable or something similar to check whether a
given prepared statement already exists. That would then be centrally extensible
to deal with multiple connections etc.

Greetings,

Andres Freund




Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 13:42:07 -0700, Jeff Davis wrote:
> Good idea. Attached a patch to remove the superuser check on
> pg_log_backend_memory_contexts(), except in the case when trying to log
> memory contexts of a superuser backend.

I don't get the reasoning behind the "except ..." logic. What does this
actually protect against? A reasonable use case for this feature is to
monitor memory usage of all backends, and this restriction practically
still requires the use of a security definer function.

Greetings,

Andres Freund




Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 1:43 PM, "Jeff Davis"  wrote:
> On Mon, 2021-10-25 at 16:10 +0900, Michael Paquier wrote:
>> Hmm.  Why don't you split the patch into two parts that can be
>> discussed separately then?  There would be one to remove all the
>> superuser() checks you can think of, and a potential second to grant
>> those function's execution to some system role.
>
> Good idea. Attached a patch to remove the superuser check on
> pg_log_backend_memory_contexts(), except in the case when trying to log
> memory contexts of a superuser backend.

LGTM.

Nathan



Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Mark Dilger



> On Oct 24, 2021, at 7:49 AM, Bharath Rupireddy 
>  wrote:
> 
> At this point, the idea of having a new role for maintenance work
> looks good. With this patch and Mark Dilger's patch introducing a
> bunch of new predefined roles, one concern is that we might reach a
> state where patches are proposed for new predefined roles for every
> database activity, and the superuser eventually has nothing left to
> do in the database.
> 
> I'm not sure if Mark Dilger's patch on new predefined roles has a
> suitable/same role that we can use here.

If you refer to the ALTER SYSTEM SET patches, which I agree introduce a number 
of new predefined roles, it may interest you that Andrew has asked me to 
rework that patch set.  In particular, he would like me to implement a new 
system of grants whereby the authority to ALTER SYSTEM SET can be granted per 
GUC, rather than having predefined roles with hardcoded privileges.

I have not withdrawn the ALTER SYSTEM SET patches yet, as I don't know whether 
Andrew's proposal can be made to work, but I wouldn't recommend tying this 
pg_maintenance idea to that patch set.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Alvaro Herrera
On 2021-Oct-25, Jeff Davis wrote:

> But CHECKPOINT right now has an explicit superuser check, and it would
> be nice to be able to avoid that.
> 
> It's pretty normal to issue a CHECKPOINT right after a data load and
> before running a performance test, right? Shouldn't there be some way
> to do that without superuser?

Maybe you just need pg_checkpointer.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Jeff Davis
On Mon, 2021-10-25 at 13:54 -0400, Stephen Frost wrote:
> Let's not forget that there are already existing non-superusers who
> can
> run things like REFRESH MATERIALIZED VIEW- the owner.

Right, that's one reason why I don't see a particular use case there.

But CHECKPOINT right now has an explicit superuser check, and it would
be nice to be able to avoid that.

It's pretty normal to issue a CHECKPOINT right after a data load and
before running a performance test, right? Shouldn't there be some way
to do that without superuser?

Regards,
Jeff Davis






Re: Assorted improvements in pg_dump

2021-10-25 Thread Alvaro Herrera
On 2021-Oct-25, Tom Lane wrote:

> Yeah, I wasn't too happy with the static bools either.  However, each
> function would need its own field in the struct, which seems like a
> maintenance annoyance, plus a big hazard for future copy-and-paste
> changes (ie, copy and paste the wrong flag name -> trouble).  Also
> the Archive struct is shared between dump and restore cases, so
> adding a dozen fields that are irrelevant for restore didn't feel
> right.  So I'd like a better idea, but I'm not sure that that one
> is better.

What about a separate struct passed from pg_dump's main() to the
functions that execute queries, containing a bunch of bools?  This'd
still have the problem that mindless copy and paste would cause a bug,
but I wonder if that isn't overstated: if you use the wrong flag,
pg_dump would fail as soon as you try to invoke your query when it
hasn't been prepared yet.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg0.php




Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Jeff Davis
On Mon, 2021-10-25 at 16:10 +0900, Michael Paquier wrote:
> Hmm.  Why don't you split the patch into two parts that can be
> discussed separately then?  There would be one to remove all the
> superuser() checks you can think of, and a potential second to grant 
> those function's execution to some system role.

Good idea. Attached a patch to remove the superuser check on
pg_log_backend_memory_contexts(), except in the case when trying to log
memory contexts of a superuser backend.

Using pg_signal_backend does not seem to be universally acceptable, so
I'll just drop the idea of granting to that predefined role.

Regards,
Jeff Davis

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5677032cb28..b7003fd37a3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25332,7 +25332,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 (See  for more information),
 but will not be sent to the client regardless of
 .
-Only superusers can request to log the memory contexts.
+Only superusers can request to log the memory contexts of superuser
+backends.

   
 
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a416e94d371..54c93b16c4c 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -699,6 +699,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+
 --
 -- We also set up some things as accessible to standard roles.
 --
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 0d52613bc32..38ca0ddee73 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -162,10 +162,11 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
  * pg_log_backend_memory_contexts
  *		Signal a backend process to log its memory contexts.
  *
- * Only superusers are allowed to signal to log the memory contexts
- * because allowing any users to issue this request at an unbounded
- * rate would cause lots of log messages and which can lead to
- * denial of service.
+ * By default, only superusers are allowed to signal a backend to log its
+ * memory contexts, because allowing any user to issue this request at an
+ * unbounded rate could produce lots of log messages, leading to denial of
+ * service. Additional roles can be permitted with GRANT, but they will still
+ * be prevented from logging the memory contexts of a superuser backend.
  *
  * On receipt of this signal, a backend sets the flag in the signal
  * handler, which causes the next CHECK_FOR_INTERRUPTS() to log the
@@ -177,12 +178,6 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 	int			pid = PG_GETARG_INT32(0);
 	PGPROC	   *proc;
 
-	/* Only allow superusers to log memory contexts. */
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("must be a superuser to log memory contexts")));
-
 	proc = BackendPidGetProc(pid);
 
 	/*
@@ -205,6 +200,12 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 		PG_RETURN_BOOL(false);
 	}
 
+	/* Only allow superusers to signal superuser-owned backends. */
+	if (superuser_arg(proc->roleId) && !superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be a superuser to log memory contexts of superuser backend")));
+
 	if (SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, proc->backendId) < 0)
 	{
 		/* Again, just a warning to allow loops */
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 3253b8751b1..039f6338604 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202109101
+#define CATALOG_VERSION_NO	202110230
 
 #endif
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index e845042d38d..07c3d3311b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -138,14 +138,40 @@ HINT:  No function matches the given name and argument types. You might need to
 --
 -- Memory contexts are logged and they are not returned to the function.
 -- Furthermore, their contents can vary depending on the timing. However,
--- we can at least verify that the code doesn't fail.
+-- we can at least verify that the code doesn't fail, and that the
+-- permissions are set properly.
 --
-SELECT * FROM pg_log_backend_memory_contexts(pg_backend_pid());
+SELECT pg_log_backend_memory_contexts(pg_backend_pid());
  pg_log_backend_memory_contexts 
 
  t
 (1 row)
 
+CREATE ROLE regress_log_memory;
+SELECT has_function_privilege('regress_log_memory',
+  

Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 1:29 PM, "Robert Haas"  wrote:
> On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan  wrote:
>> Alright, here is an attempt at that.  With this revision, archive
>> libraries are preloaded (and _PG_init() is called), and the archiver
>> is responsible for calling _PG_archive_module_init() to get the
>> callbacks.  I've also removed the GUC check hooks as previously
>> discussed.
>
> I would need to spend more time on this to have a detailed opinion on
> all of it, but I agree that part looks better this way.

Great.  Unless I see additional feedback on the basic design shortly,
I'll give the documentation updates a try.

Nathan



Re: [Patch] ALTER SYSTEM READ ONLY

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 1:33 PM, "Robert Haas"  wrote:
> On Mon, Oct 25, 2021 at 3:14 PM Bossart, Nathan  wrote:
>> My compiler is complaining about oldXLogAllowed possibly being used
>> uninitialized in CreateCheckPoint().  AFAICT it can just be initially
>> set to zero to silence this warning because it will, in fact, be
>> initialized properly when it is used.
>
> Hmm, I guess I could have foreseen that, had I been a little bit
> smarter than I am. I have committed a change to initialize it to 0 as
> you propose.

Thanks!

Nathan



Re: XTS cipher mode for cluster file encryption

2021-10-25 Thread Yura Sokolov
В Пн, 25/10/2021 в 12:12 -0400, Stephen Frost пишет:
> Greetings,
> 
> * Yura Sokolov (y.soko...@postgrespro.ru) wrote:
> > В Чт, 21/10/2021 в 13:28 -0400, Stephen Frost пишет:
> > > I really don't think this is necessary.  Similar to PageSetChecksumCopy
> > > and PageSetChecksumInplace, I'm sure we would have functions which are
> > > called in the appropriate spots to do encryption (such as 'encrypt_page'
> > > and 'encrypt_block' in the Cybertec patch) and folks could review those
> > > in relative isolation to the rest.  Dealing with blocks in PG is already
> > > pretty well handled, the infrastructure that needs to be added is around
> > > handling temporary files and is being actively worked on ... if we could
> > > move past this debate around if we should be adding support for XTS or
> > > if only GCM-SIV would be accepted.
> > > 
> > > .
> > > 
> > > No, the CTR approach isn't great because, as has been discussed quite a
> > > bit already, using the LSN as the IV means that different data might be
> > > re-encrypted with the same LSN and that's not an acceptable thing to
> > > have happen with CTR.
> > > 
> > > .
> > > 
> > > We've discussed at length how using CTR for heap isn't a good idea even
> > > if we're using the LSN for the IV, while if we use XTS then we don't
> > > have the issues that CTR has with IV re-use and using the LSN (plus
> > > block number and perhaps other things).  Nothing in what has been
> > > discussed here has really changed anything there that I can see and so
> > > it's unclear to me why we continue to go round and round with it.
> > > 
> > 
> > Instead of debatting XTS vs GCM-SIV I'd suggest Google's Adiantum [1][2]
> > [3][4].
> 
> That sounds like a great thing to think about adding ... after we get
> something in that's based on XTS.

Why? I see no reason to do it after. Why not XTS after Adiantum?

OK, I see one: XTS is standardized.

> > It is explicitely created to solve large block encryption issue - disk
> > encryption. It is used to encrypt 4kb at whole, but in fact has no
> > (practical) limit on block size: it is near-zero modified to encrypt 1kb
> > or 8kb or 32kb.
> > 
> > It has benefits of both XTS and GCM-SIV:
> > - like GCM-SIV every bit of cipher text depends on every bit of plain text
> > - therefore like GCM-SIV it is resistant to IV reuse: it is safe to reuse
> >   LSN+reloid+blocknumber tuple as IV even for hint-bit changes since every
> >   block's bit will change.
> 
> The advantage of GCM-SIV is that it provides integrity as well as
> confidentiality.

Integrity could be based on a simple non-cryptographic checksum, and it could
be checked after decryption. It would be impossible to intentionally change an
encrypted page in a way that passes the checksum after decryption.

Currently we have a 16-bit checksum, and it is very small. But having a larger
checksum is orthogonal to (i.e. does not constrain) having encryption.

In fact, Adiantum is easily made close to an SIV construction:
- just leave the last 8/16 bytes zero. If they are zero after decryption,
  the integrity check passes.
That is because SIV and Adiantum are very similar in structure:
- SIV:
-- hash
-- then stream cipher
- Adiantum:
-- hash (except the last 16 bytes)
-- then encrypt the last 16 bytes with the hash,
-- then stream cipher
-- then hash.
If the last N (N>16) bytes are a nonce plus zero bytes, then "hash, then
encrypt the last 16 bytes with the hash" becomes equivalent to just "hash",
and Adiantum becomes logically equivalent to SIV.

> > - like XTS it doesn't need to change plain text format and doesn't need in
> >   additional Nonce/Auth Code.
> 
> Sure, in which case it's something that could potentially be added later
> as another option in the future.  I don't think we'll always have just
> one encryption method and it's good to generally think about what it
> might look like to have others but I don't think it makes sense to try
> and get everything in all at once.

And among the alternatives, Adiantum looks best: it is fast even without
hardware acceleration, it provides whole-block encryption (i.e. every output
bit depends on every input bit), and it is not tied to the plain-text format.

> Thanks,
> 
> Stephen

regards,

Yura

PS. The construction behind SIV and Adiantum could be combined with XTS as
well, i.e. instead of AES-GCM-SIV it could be AES-XTS-SIV.
And in the same way, AES-XTS could be used instead of ChaCha12 in Adiantum.





Re: [Patch] ALTER SYSTEM READ ONLY

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 3:14 PM Bossart, Nathan  wrote:
> My compiler is complaining about oldXLogAllowed possibly being used
> uninitialized in CreateCheckPoint().  AFAICT it can just be initially
> set to zero to silence this warning because it will, in fact, be
> initialized properly when it is used.

Hmm, I guess I could have foreseen that, had I been a little bit
smarter than I am. I have committed a change to initialize it to 0 as
you propose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan  wrote:
> Alright, here is an attempt at that.  With this revision, archive
> libraries are preloaded (and _PG_init() is called), and the archiver
> is responsible for calling _PG_archive_module_init() to get the
> callbacks.  I've also removed the GUC check hooks as previously
> discussed.

I would need to spend more time on this to have a detailed opinion on
all of it, but I agree that part looks better this way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andrew Dunstan


On 10/25/21 13:06, Andres Freund wrote:
> Hi,
>
> On 2021-10-25 10:23:40 -0400, Tom Lane wrote:
>> Also, I concur with Andrew's point that we'd really have to have
>> buildfarm support.  However, this might not be as bad as it seems.
>> In principle we might just need to add resurrected branches back to
>> the branches_to_build list.  Given my view of what the back-patching
>> policy ought to be, a new build in an old branch might only be
>> required a couple of times a year, which would not be an undue
>> investment of buildfarm resources.
> FWIW, if helpful I could easily specify a few additional branches to some of
> my buildfarm animals. Perhaps serinus/flaviventris (snapshot gcc wo/w
> optimizations) so we'd see problems coming early? I could also add
> recent-clang one.
>
> I think doing this to a few designated animals is a better idea than wasting
> cycles and space on a lot of animals.


Right now the server will only accept results for something in
branches_of_interest.txt. So we would need to modify that.


I tend to agree that we don't need a whole lot of cross platform testing
here.


>
>
>> It seems like a fresh checkout from the repo would be little more expensive
>> than the current copy-a-checkout process.)
> I haven't looked in detail, but from what I've seen in the logs the
> is-there-anything-new check is already not cheap, and does a checkout / update
> of the git directory.
>
>

If you have removed the work tree (with the "rm_worktrees => 1" setting)
then it restores it by doing a checkout. It then does a "git fetch", and
then as you say looks to see if there is anything new. If you know of a
better way to manage it then please let me know. On crake (which is
actually checking out four different repos) the checkout step typically
takes one or two seconds.


Copying the work tree can take a few seconds - to avoid that on
Unix/msys use vpath builds.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: XTS cipher mode for cluster file encryption

2021-10-25 Thread Bruce Momjian
On Mon, Oct 25, 2021 at 11:58:14AM -0400, Stephen Frost wrote:
> As for the specific encryption method to use, using CTR would be simpler
> as it doesn't require access to be block-based, though we would need to
> make sure to not re-use the IV across any of the temporary files being
> created (potentially concurrently).  Probably not that hard to do but
> just something to make sure we do.  Of course, if we arrange for
> block-based access then we could use XTS or perhaps GCM/GCM-SIV if we
> wanted to.

Agreed on all points.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Assorted improvements in pg_dump

2021-10-25 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
>> +static bool query_prepared = false;

> I wonder if it'd be better to store this in Archive or such. The approach with
> static variables might run into problems with parallel pg_dump at some
> point. These objects aren't dumped in parallel yet, but still...

Yeah, I wasn't too happy with the static bools either.  However, each
function would need its own field in the struct, which seems like a
maintenance annoyance, plus a big hazard for future copy-and-paste
changes (ie, copy and paste the wrong flag name -> trouble).  Also
the Archive struct is shared between dump and restore cases, so
adding a dozen fields that are irrelevant for restore didn't feel
right.  So I'd like a better idea, but I'm not sure that that one
is better.

regards, tom lane




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-25 Thread Mark Dilger



> On Oct 25, 2021, at 11:30 AM, Stephen Frost  wrote:

> Consider instead:
> 
> CREATE ROLE X;
> CREATE ROLE Y;
> CREATE ROLE Z;
> 
> GRANT Y to X;
> GRANT Z to X;
> 
> SET ROLE X;
> CREATE EVENT TRIGGER FOR Y do_stuff();
> 
> Now, X has explicitly said that they want the event trigger to fire for
> role Y and if the event trigger fires or not is no longer based on
> membership in the role creating the trigger but instead depends on being
> the role which the event trigger was explicitly defined to fire on.

I don't think your proposal quite works, because the set of users you'd like to 
audit with an event trigger based auditing system may be both large and dynamic:

CREATE ROLE batman;
CREATE ROLE robin;

SET ROLE batman;
CREATE ROLE gotham_residents NOLOGIN;
CREATE ROLE riddler IN ROLE gotham_residents LOGIN;
-- create millions of other Gotham residents
CREATE EVENT TRIGGER FOR gotham_residents audit_criminal_activity();

Batman is not superuser, but he's pretty powerful, and he wants to audit all 
the criminal activity in Gotham.  How should he expect this example to work?

Having the "FOR gotham_residents" clause mean anybody with membership in role 
gotham_residents is problematic, because it means that being granted into a 
role both increases and decreases your freedoms, rather than merely giving you 
more freedoms.  If Batman covets privileges that Robin has, but wants not to be 
subjected to any event triggers that fire for Robin, he both wants into and out 
of role Robin.

Having "FOR gotham_residents" mean that only actions performed by role 
"gotham_residents" should fire the trigger is useless, since Gotham residents 
don't log in as that, but as themselves.  Batman won't catch anybody this way.

Having to list each new resident to the trigger is tedious and error-prone.  
Batman may not be able to pass a compliance audit.

Having Batman *own* all residents in Gotham city would work, if we can agree on 
a role ownership system.  It has the downside that only a role's (direct or 
indirect) owner can do the auditing, though.  That's more flexible than what we 
have today, where only superuser can do it, but maybe some people would want to 
argue for a different solution with even more flexibility?  A grantable 
privilege perhaps?  But whatever it is, the reasoning about who gets audited 
and who does not must be clear enough that Batman can pass a compliance audit.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Assorted improvements in pg_dump

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
> 0004 is not committable as-is, because it assumes that the source
> server has single-array unnest(), which is not true before 8.4.
> We could fix that by using "oid = ANY(array-constant)" conditions
> instead, but I'm unsure about the performance properties of that
> for large OID arrays on those old server versions.

It doesn't seem bad at all. 8.3 assert:

CREATE TABLE foo(oid oid primary key);
INSERT INTO foo SELECT generate_series(1, 100);
postgres[1164129][1]=# explain ANALYZE SELECT count(*) FROM foo WHERE oid = ANY(ARRAY(SELECT generate_series(110, 1, -1)));
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN                                                                                                                      │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=81.54..81.55 rows=1 width=0) (actual time=2433.656..2433.656 rows=1 loops=1)                                   │
│   InitPlan                                                                                                                      │
│     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..149.425 rows=110 loops=1)                                  │
│   ->  Bitmap Heap Scan on foo  (cost=42.70..81.50 rows=10 width=0) (actual time=2275.137..2369.478 rows=100 loops=1)            │
│         Recheck Cond: (oid = ANY (($0)::oid[]))                                                                                 │
│         ->  Bitmap Index Scan on foo_pkey  (cost=0.00..42.69 rows=10 width=0) (actual time=2274.077..2274.077 rows=100 loops=1) │
│               Index Cond: (oid = ANY (($0)::oid[]))                                                                             │
│ Total runtime: 2436.201 ms                                                                                                      │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 2437.568 ms (00:02.438)


> Lastly, 0006 implements the other idea we'd discussed in the other
> thread: for queries that are issued repetitively but not within a
> single pg_dump function invocation, use PREPARE/EXECUTE to cut down
> the overhead.  This gets only diminishing returns after 0004, but
> it still brings "pg_dump -s regression" down from 0.39s to 0.33s,
> so maybe it's worth doing.

I think it's worth doing. There's things that the batch approach won't help
with and even if it doesn't help a lot with the regression test database, I'd
expect it to help plenty with other cases.

A test database I had around with lots of functions got drastically faster to
dump (7.4s to 2.5s), even though the number of queries didn't change
significantly. According to pg_stat_statements plan_time for the dumpFunc
query went from 2352ms to 0.4ms - interestingly execution time nearly halves
as well.


> I stopped after caching the plans for
> functions/aggregates/operators/types, though.  The remaining sorts
> of objects aren't likely to appear in typical databases enough times
> to be worth worrying over.  (This patch will be a net loss if there
> are more than zero but less than perhaps 10 instances of an object type,
> so there's definitely reasons beyond laziness for not doing more.)

Seems reasonable.


> @@ -7340,25 +7340,37 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
>   i_consrc;
>   int ntups;
>  
> - query = createPQExpBuffer();
> + static bool query_prepared = false;
>  
> -	if (fout->remoteVersion >= 90100)
> -		appendPQExpBuffer(query, "SELECT tableoid, oid, conname, "
> -						  "pg_catalog.pg_get_constraintdef(oid) AS consrc, "
> -						  "convalidated "
> -						  "FROM pg_catalog.pg_constraint "
> -						  "WHERE contypid = '%u'::pg_catalog.oid "
> -						  "ORDER BY conname",
> -						  tyinfo->dobj.catId.oid);
> +	if (!query_prepared)
> +	{

I wonder if it'd be better to store this in Archive or such. The approach with
static variables might run into problems with parallel pg_dump at some
point. These objects aren't dumped in parallel yet, but still...

Greetings,

Andres Freund




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 2:30 PM Stephen Frost  wrote:
> Does membership in role Y cause the event trigger to fire for that role?
> I'd argue that the answer is probably 'yes', but at least it's no longer
> tied back to membership in X (the owner of the trigger).  That Noah
> explicitly mentioned 'login' roles vs. 'non-login' roles makes me think
> this is more in line with what the argument was about- the owner of the
> trigger would almost certainly be a 'login' role.  All that said, this
> is definitely a complex area and there's certainly a lot of different
> ways we could go.

I mean I get all this. I am not convinced that it's a big problem,
because it seems a bit hypothetical, but if it is a problem, then
introducing some explicit mechanism to control which triggers fire for
which users is a solution. I'm a bit concerned that it's just going to
make it complicated to configure your event triggers to no real
benefit. Suppose that, as a master tenant, I have 10 event triggers
and 100 users, and all the users are supposed to fire all the event
triggers. When I add user #101, if I have to say, yes, I want that
user to fire the same 10 event triggers, running a separate SQL
command for each one, that's kind of annoying. If I can just create
the new user and automatically gain membership in that user, so that
it fires all my event triggers, I get the behavior I wanted anyway
without having to do any special steps.

But also, Noah writes: "Also, it's currently a best practice for only
non-LOGIN roles to have members.  The proposed approach invites folks
to abandon that best practice."

The kind of mechanism you're proposing here doesn't seem to make that
any less likely.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Patch] ALTER SYSTEM READ ONLY

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 7:50 AM, "Robert Haas"  wrote:
> I've pushed 0001 and 0002 but I reversed the order of them and made a
> few other edits.

My compiler is complaining about oldXLogAllowed possibly being used
uninitialized in CreateCheckPoint().  AFAICT it can just be initially
set to zero to silence this warning because it will, in fact, be
initialized properly when it is used.

Nathan



Re: Unnecessary delay in streaming replication due to replay lag

2021-10-25 Thread Soumyadeep Chakraborty
Rebased and added a CF entry for Nov CF:
https://commitfest.postgresql.org/35/3376/.


Re: lastOverflowedXid does not handle transaction ID wraparound

2021-10-25 Thread Nikolay Samokhvalov
On Thu, Oct 21, 2021 at 07:21 Stan Hu  wrote:

> On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi
>  wrote:
> >
> > lastOverflowedXid is the smallest subxid that possibly exists but
> > possiblly not known to the standby. So if all top-level transactions
> > older than lastOverflowedXid end, that means that all the
> > subtransactions in doubt are known to have been ended.
>
> Thanks for the patch! I verified that it appears to reset
> lastOverflowedXid properly.
>

Is now the right time to register the patch in the current commitfest?
(How does one do that?)

On a separate note, I think it would be really good to improve
observability for SLRUs -- particularly for the subtransactions SLRU and
these overflow-related aspects.  pg_stat_slru, added in PG13, is really
helpful, but it is not enough to troubleshoot, analyze, and tune issues
like this and the SLRU-related patches.  Basic ideas:
- expose to the user how many pages are currently used (especially useful
if SLRU sizes will be configurable, see
https://commitfest.postgresql.org/34/2627/)
- Andrew Borodin also expressed the idea of extending pageinspect to allow
seeing the contents of SLRUs
- a more specific thing: allow seeing lastOverflowedXid somehow (via SQL or
in logs) - we see how important it is for standby health, but we cannot see
it now.

Any ideas in the direction of observability?

Nik



Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 13:58:06 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >> Seems like we need a less quick-and-dirty approach to dealing with
> >> unnecessary simplehash support functions.
> 
> > I don't think the problem is unnecessary ones?
> 
> I was thinking about the stuff like SH_ITERATE, which you might or
> might not have use for in any particular file.  In the case at hand
> here, a file that doesn't call SH_INSERT would be at risk of getting
> unused-function complaints about SH_GROW.  But as you say, if we do
> find that happening, __attribute__((unused)) would probably be
> enough to silence it.

I was hoping that a reference from a static inline function ought to be
sufficient to prevent a warning about an unused static-not-inline function, even
if the referencing static inline function is unused... It does work that way
with at least the last few versions of gcc (tested 8-11) and clang (tested 6.0
to 13).

Greetings,

Andres Freund




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-25 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Oct 25, 2021 at 12:20 PM Stephen Frost  wrote:
> > > Exactly.  That's the main point.  Also, it's currently a best practice for
> > > only non-LOGIN roles to have members.  The proposed approach invites folks to
> > > abandon that best practice.  I take the two smells as a sign that we'll regret
> > > adopting the proposal, despite not knowing how it will go seriously wrong.
> >
> > This seems like a pretty good point, which leads me to again think that
> > we should explicitly add a way for an individual who can create event
> > triggers to be able to specify for whom the event trigger should fire,
> > and only allow them to specify roles other than their own provided they
> > have been given that authority (either explicitly somehow or implicitly
> > based on some defined access that they have to that other role).
> 
> I agree that Noah has a reasonably good point here. I don't think it's
> a total slam-dunk, but it's certainly not a stupid argument.

Ok.

> Conceding that point for the purposes of discussion, I don't
> understand how this kind of proposal gets us out from under the
> problem. Surely, it can't be the case that user X can cause event
> trigger E to run as user Y unless X can become Y, because to do so
> would allow X to usurp Y's privileges, except in the corner case where
> Y never does anything that can trigger an event trigger. But if X has
> to be able to become Y in order to force E to be run by Y, then I
> think we've made no progress in addressing Noah's complaint.

X having rights over Y is what would allow X to create an event trigger
which fires when Y does $something, but the act of GRANT'ing Y to X
wouldn't make it automatically start happening.  The latter is what I
believed Noah's concern was around.

The downside there though is that GRANT'ing of roles to other roles is
how we build up sets of roles and you'd certainly wish to be able to
leverage that when deciding which roles a given event trigger should
fire for.  If we made that work for event triggers then you'd still have
the case that *some* GRANT A to B would cause event triggers to suddenly
start happening for B without other actions being taken.  Still, in that
case you could create specific such roles to manage that independently
of which roles happened to have admin rights over which other roles.

Examples might help here.

CREATE ROLE X;
CREATE ROLE Y;
CREATE ROLE Z;

GRANT Y to X;
GRANT Z to X;

SET ROLE X;
CREATE EVENT TRIGGER do_stuff();

Under one approach, that event trigger then fires for X, Y and Z.  What
if you don't actually want it to though?  What if some role Q is later
created and GRANT'd to X?  Then the event trigger would also fire for
them.

Consider instead:

CREATE ROLE X;
CREATE ROLE Y;
CREATE ROLE Z;

GRANT Y to X;
GRANT Z to X;

SET ROLE X;
CREATE EVENT TRIGGER FOR Y do_stuff();

Now, X has explicitly said that they want the event trigger to fire for
role Y and if the event trigger fires or not is no longer based on
membership in the role creating the trigger but instead depends on being
the role which the event trigger was explicitly defined to fire on.

Does membership in role Y cause the event trigger to fire for that role?
I'd argue that the answer is probably 'yes', but at least it's no longer
tied back to membership in X (the owner of the trigger).  That Noah
explicitly mentioned 'login' roles vs. 'non-login' roles makes me think
this is more in line with what the argument was about- the owner of the
trigger would almost certainly be a 'login' role.  All that said, this
is definitely a complex area and there's certainly a lot of different
ways we could go.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Independent of other things, getting to the point where everything can
> > be done in the database without the need for superuser is absolutely a
> > good goal to be striving for, not something to be avoiding.
> > I don't think that makes superuser become 'dummy', but perhaps the
> > only explicit superuser check we end up needing is "superuser is a
> > member of all roles".  That would be a very cool end state.
> 
> I'm not entirely following how that's going to work.  It implies that
> there is some allegedly-not-superuser role that has the ability to
> become superuser -- either within SQL or by breaking out to the OS --
> because certainly a superuser can do those things.

I don't think it implies that at all.  Either that, or I'm not following
what you're saying here.  We have predefined roles today which aren't
superusers and yet they're able to break out to the OS.  Of course they
can become superusers if they put the effort in.  None of that gets away
from the more general idea that we could get to a point where all of a
superuser's capabilities come from roles which the superuser is
automatically a member of such that we need have only one explicit
superuser() check.

> I don't think we're serving any good purpose by giving people the
> impression that roles with such permissions are somehow not
> superuser-equivalent.  Certainly, the providers who don't want to
> give users superuser are just going to need a longer list of roles
> they won't give access to (and they probably won't be pleased about
> having to vet every predefined role carefully).

I agree that we need to be clear through the documentation about which
predefined roles are able to "break outside the box" and become
superuser, but that's independent from the question of if we will get to
a point where every capability the superuser has can also be given
through membership in some predefined role or not.

That providers have to figure out what makes sense for them in terms of
what they'll allow their users to do or not do doesn't seem entirely
relevant here- different providers are by definition different and some
might be fine with giving out certain rights that others don't want to
give out.  That's actually kind of the point of breaking out all of
these capabilities into more fine-grained ways of granting capabilities
out.

We already have roles today which are ones that can break outside the
box and get to the OS and are able to do things that a superuser can do,
or become a superuser themselves, and that's been generally a positive
thing.  We also have roles which are able to do things that only
superusers used to be able to do but which are not able to become
superusers themselves and aren't able to break out of the box.  I don't
see any reason why we can't continue with this and eventually eliminate
the explicit superuser() checks except from the one where a superuser is
a member of all roles.  Sure, some of those roles give capabilities
which can be used to become superuser, while others don't, but I hardly
see that as an argument against the general idea that each is able to be
independently GRANT'd.

If nothing else, if we could eventually get to the point where there's
only one such explicit check then maybe we'd stop creating *new* places
where capabilities are locked behind a superuser check.  I did an audit
once upon a time of all superuser checks and rather than that number
going down, as I had hoped at the time, it's been going up instead
across new releases.  I view that as unfortunate.

Thanks,

Stephen




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Daniel Gustafsson
> On 25 Oct 2021, at 20:01, Andres Freund  wrote:
> 
> On 2021-10-25 13:39:44 -0400, Tom Lane wrote:
>> Daniel Gustafsson  writes:
>>> Since this will cause integer values to have different textual enum value
>>> representations in 14 and 15+, do we want to skip two numbers by assigning the
>>> next wait event the integer value of WAIT_EVENT_WAL_WRITE incremented by three?
>>> Or enum integer reuse not something we guarantee against across major versions?
>> 
>> We require a recompile across major versions.  I don't see a reason why
>> this particular enum needs more stability than any other one.
> 
> +1. That'd end up pushing us to be more conservative about defining new wait
> events, which I think would be bad tradeoff.

Fair enough, makes sense.

--
Daniel Gustafsson   https://vmware.com/





Re: pgsql: Remove unused wait events.

2021-10-25 Thread Andres Freund
On 2021-10-25 13:39:44 -0400, Tom Lane wrote:
> Daniel Gustafsson  writes:
> > Since this will cause integer values to have different textual enum value
> > representations in 14 and 15+, do we want to skip two numbers by assigning the
> > next wait event the integer value of WAIT_EVENT_WAL_WRITE incremented by three?
> > Or enum integer reuse not something we guarantee against across major versions?
> 
> We require a recompile across major versions.  I don't see a reason why
> this particular enum needs more stability than any other one.

+1. That'd end up pushing us to be more conservative about defining new wait
events, which I think would be bad tradeoff.




Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-22 16:32:39 -0400, Tom Lane wrote:
>> Hmm, harder than it sounds.  If I remove "inline" from SH_SCOPE then
>> the compiler complains about unreferenced static functions, while
>> if I leave it there than adding pg_noinline causes a complaint about
>> conflicting options.

> The easy way out would be to not declare SH_GROW inside SH_DECLARE - that'd
> currently work, because there aren't any calls to grow from outside of
> simplehash.h.

Seems like a reasonable approach.  If somebody wanted to call that
from outside, I'd personally feel they were getting way too friendly
with the implementation.

>> Seems like we need a less quick-and-dirty approach to dealing with
>> unnecessary simplehash support functions.

> I don't think the problem is unnecessary ones?

I was thinking about the stuff like SH_ITERATE, which you might or
might not have use for in any particular file.  In the case at hand
here, a file that doesn't call SH_INSERT would be at risk of getting
unused-function complaints about SH_GROW.  But as you say, if we do
find that happening, __attribute__((unused)) would probably be
enough to silence it.

regards, tom lane




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Sun, 2021-10-24 at 21:32 +, Bossart, Nathan wrote:
> > My initial reaction was that members of pg_maintenance should be able
> > to do all of these things (VACUUM, ANALYZE, CLUSTER, REINDEX, and
> > CHECKPOINT).
> 
> What about REFRESH MATERIALIZED VIEW? That seems more specific to a
> workload, but it's hard to draw a clear line between that and CLUSTER.

Let's not forget that there are already existing non-superusers who can
run things like REFRESH MATERIALIZED VIEW- the owner.

> >   Maybe one
> > option is to have two separate roles, one for commands that require
> > lower lock levels (i.e., ANALYZE and VACUUM without TRUNCATE and
> > FULL), and another for all of the maintenance commands.
> 
> My main motivation is CHECKPOINT and database-wide VACUUM and ANALYZE.
> I'm fine extending it if others think it would be worthwhile, but it
> goes beyond my use case.

I've been wondering what the actual use-case here is.  DB-wide VACUUM
and ANALYZE are already able to be run by the database owner, but
probably more relevant is that DB-wide VACUUMs and ANALYZEs shouldn't
really be necessary given autovacuum, so why are we adding predefined
roles which will encourage users to do that?

I was also contemplating a different angle on this- allowing users to
request autovacuum to run vacuum/analyze on a particular table.  This
would have the advantage that you get the vacuum/analyze behavior that
autovacuum has (giving up an attempted truncate lock if another process
wants a lock on the table, going at a slower pace rather than going all
out and sucking up lots of I/O, etc).

I'm not completely against this approach but just would like a bit
better understanding of why it makes sense and what things we'll be able
to say about what this role can/cannot do.

Lastly though- I dislike the name, it seems far too general.  I get that
naming things is hard but maybe we could find something better than
'pg_maintenance' for this.

Thanks,

Stephen




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Tom Lane
Stephen Frost  writes:
> Independent of other things, getting to the point where everything can
> be done in the database without the need for superuser is absolutely a
> good goal to be striving for, not something to be avoiding.
> I don't think that makes superuser become 'dummy', but perhaps the
> only explicit superuser check we end up needing is "superuser is a
> member of all roles".  That would be a very cool end state.

I'm not entirely following how that's going to work.  It implies that
there is some allegedly-not-superuser role that has the ability to
become superuser -- either within SQL or by breaking out to the OS --
because certainly a superuser can do those things.

I don't think we're serving any good purpose by giving people the
impression that roles with such permissions are somehow not
superuser-equivalent.  Certainly, the providers who don't want to
give users superuser are just going to need a longer list of roles
they won't give access to (and they probably won't be pleased about
having to vet every predefined role carefully).

regards, tom lane




Re: Experimenting with hash tables inside pg_dump

2021-10-25 Thread Andres Freund
Hi,

Thanks for pushing the error handling cleanup etc!

On 2021-10-22 16:32:39 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> Wonder if we should mark simplehash's grow as noinline? Even with a single 
> >> caller it seems better to not inline it to remove register allocator 
> >> pressure.
>
> > Seems plausible --- you want me to go change that?
>
> Hmm, harder than it sounds.  If I remove "inline" from SH_SCOPE then
> the compiler complains about unreferenced static functions, while
> if I leave it there than adding pg_noinline causes a complaint about
> conflicting options.

The easy way out would be to not declare SH_GROW inside SH_DECLARE - that'd
currently work, because there aren't any calls to grow from outside of
simplehash.h. The comment says:
 * ... But resizing to the exact input size can be advantageous
 * performance-wise, when known at some point.

But perhaps that's sufficiently served by creating the table with the correct
size immediately?

If we were to go for that, we'd just put SH_GROW in the SH_DEFINE section and
not use SH_SCOPE, but plain static. That works here, and I have some hope it'd not
cause warnings on other compilers either, because there'll be references from
the other inline functions. Even if there's a SH_SCOPE=static inline
simplehash use inside a header and there aren't any callers in a TU, there'd
still be static inline references to it.


Another alternative would be to use __attribute__((unused)) or such on
non-static-inline functions that might or might not be used.


> Seems like we need a less quick-and-dirty approach to dealing with
> unnecessary simplehash support functions.

I don't think the problem is unnecessary ones? It's "cold" functions we don't
want to have inlined into larger functions.

Greetings,

Andres Freund




Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 10:18 AM, "Robert Haas"  wrote:
> On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan  wrote:
>> IIUC this would mean that archive modules that need to define GUCs or
>> register background workers would have to separately define a
>> _PG_init() and be loaded via shared_preload_libraries in addition to
>> archive_library.  That doesn't seem too terrible to me, but it was
>> something I was trying to avoid.
>
> Hmm. That doesn't seem like a terrible goal, but I think we should try
> to find some way of achieving it that looks tidier than this does.

We could just treat archive_library as if it is tacked onto the
shared_preload_libraries list.  I think I can make that look
relatively tidy.

Nathan



Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Stephen Frost
Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostg...@gmail.com) wrote:
> On Sun, Oct 24, 2021 at 3:15 AM Jeff Davis  wrote:
> > Add new predefined role pg_maintenance, which can issue VACUUM,
> > ANALYZE, CHECKPOINT.
> >
> > Patch attached.
> 
> At this point, the idea of having a new role for maintenance work
> looks good. With this patch and Mark Dilger's patch introducing a
> bunch of new predefined roles, one concern is that we might reach to a
> state where we will have patches being proposed for new predefined
> roles for every database activity and the superuser eventually will
> have nothing to do in the database, it just becomes dummy?

Independent of other things, getting to the point where everything can
be done in the database without the need for superuser is absolutely a
good goal to be striving for, not something to be avoiding.

I don't think that makes superuser become 'dummy', but perhaps the
only explicit superuser check we end up needing is "superuser is a
member of all roles".  That would be a very cool end state.

Thanks,

Stephen




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Tom Lane
Daniel Gustafsson  writes:
> Since this will cause integer values to have different textual enum value
> representations in 14 and 15+, do we want to skip two numbers by assigning the
> next wait event the integer value of WAIT_EVENT_WAL_WRITE incremented by three?
> Or enum integer reuse not something we guarantee against across major versions?

We require a recompile across major versions.  I don't see a reason why
this particular enum needs more stability than any other one.

regards, tom lane




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Daniel Gustafsson
> On 25 Oct 2021, at 19:18, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> ... But while I agree it's good to remove unused stuff in the
>> master, it doesn't seem like we really need to back-patch it.
> 
> Yeah, exactly.  I don't see any benefit that's commensurate with
> even a small risk of breaking extensions --- and apparently, in
> this case that's not a risk but a certainty.

Since this will cause integer values to have different textual enum value
representations in 14 and 15+, do we want to skip two numbers by assigning the
next wait event the integer value of WAIT_EVENT_WAL_WRITE incremented by three?
Or enum integer reuse not something we guarantee against across major versions?

--
Daniel Gustafsson   https://vmware.com/





Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-23 12:57:02 -0700, Jeff Davis wrote:
> Simple patch to implement $SUBJECT attached.
> 
> pg_signal_backend seems like the appropriate predefined role, because
> pg_log_backend_memory_contexts() is implemented by sending a signal.

I like the idea of making pg_log_backend_memory_contexts() more widely
available. But I think tying it to pg_signal_backend isn't great. It's
unnecessarily baking in an implementation detail.

Greetings,

Andres Freund




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 12:20 PM Stephen Frost  wrote:
> > Exactly.  That's the main point.  Also, it's currently a best practice for
> > only non-LOGIN roles to have members.  The proposed approach invites folks to
> > abandon that best practice.  I take the two smells as a sign that we'll regret
> > adopting the proposal, despite not knowing how it will go seriously wrong.
>
> This seems like a pretty good point, which leads me to again think that
> we should explicitly add a way for an individual who can create event
> triggers to be able to specify for whom the event trigger should fire,
> and only allow them to specify roles other than their own provided they
> have been given that authority (either explicitly somehow or implicitly
> based on some defined access that they have to that other role).

I agree that Noah has a reasonably good point here. I don't think it's
a total slam-dunk, but it's certainly not a stupid argument.
Conceding that point for the purposes of discussion, I don't
understand how this kind of proposal gets us out from under the
problem. Surely, it can't be the case that user X can cause event
trigger E to run as user Y unless X can become Y, because to do so
would allow X to usurp Y's privileges, except in the corner case where
Y never does anything that can trigger an event trigger. But if X has
to be able to become Y in order to force E to be run by Y, then I
think we've made no progress in addressing Noah's complaint.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 13:09:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'd really like us to adopt a "default" policy on this. I think it's a waste
> > to spend time every few years arguing what exact versions to drop. I'd much
> > rather say that, unless there are concrete reasons to deviate from that, we
> > provide pg_dump compatibility for 5+3 releases, pg_upgrade for 5+1, and psql
> > for 5 releases or something like that.
> 
> I agree with considering something like that to be the minimum support
> policy, but the actual changes need a bit more care.  For example, when
> we last did this, the technical need was just to drop pre-7.4 versions,
> but we chose to make the cutoff 8.0 on the grounds that that was more
> understandable to users [1].  In the same way, I'm thinking of moving the
> cutoff to 9.0 now, although 8.4 would be sufficient from a technical
> standpoint.

I think that'd be less of a concern if we had a documented policy
somewhere. It'd not be hard to include a version table in that policy to make
it easier to understand. We could even add it to the table in
https://www.postgresql.org/support/versioning/ or something similar.


> OTOH, in the new world of one-part major versions, it's less clear that
> there will be obvious division points for future cutoff changes.  Maybe
> versions-divisible-by-five would work?

I think that's more confusing than helpful, because the support timeframes
then differ between releases.  It's easier to just subtract a number of major
releases from a specific major version, especially if there's a table
somewhere.


> Or versions divisible by ten, but experience so far suggests that we'll want
> to move the cutoff more often than once every ten years.

Yes, I think that'd be quite a bit too restrictive.

Greetings,

Andres Freund




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Tom Lane
Robert Haas  writes:
> ... But while I agree it's good to remove unused stuff in the
> master, it doesn't seem like we really need to back-patch it.

Yeah, exactly.  I don't see any benefit that's commensurate with
even a small risk of breaking extensions --- and apparently, in
this case that's not a risk but a certainty.

regards, tom lane




Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan  wrote:
> IIUC this would mean that archive modules that need to define GUCs or
> register background workers would have to separately define a
> _PG_init() and be loaded via shared_preload_libraries in addition to
> archive_library.  That doesn't seem too terrible to me, but it was
> something I was trying to avoid.

Hmm. That doesn't seem like a terrible goal, but I think we should try
to find some way of achieving it that looks tidier than this does.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 12:43:15 -0400, Tom Lane wrote:
> Agreed, that might be too much work compared to the value.  But if we're
> to be selective about support for this, I'm unclear on how we decide
> which platforms are supported --- and, more importantly, how we keep
> that list up to date over time.

I honestly think that if we just test on Linux with a single distribution,
we're already covering most of the benefit.  From memory, there have been two
rough classes of doesn't-build-anymore:

1) New optimizations / warnings. At least between gcc and clang, within a year
   or two, most of the issues end up being visible with the other compiler
   too. These aren't particularly distribution / OS specific.

2) Library dependencies cause problems, like the ssl detection mentioned
   elsewhere in this thread. This is also not that OS dependent. It's also not
   that clear that we can do something about the issues with a reasonable
   amount of effort in all cases. It's easy enough if it's just a minor
   configure fix, but we'd not want to backpatch larger SSL changes or such.


Maybe there's also a case for building older releases with msvc, but that
seems like a pain due to the msvc project generation needing to support a
specific version of msvc.

Greetings,

Andres Freund




Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 10:02 AM, "Robert Haas"  wrote:
> I don't see why this patch should need to make any changes to
> internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
> other central point of control, and I don't think it should.
> pgarch_archiveXlog() can just load the library at the time it's
> needed. That way it only gets loaded in the archiver process, and the
> required changes are much more localized. Like instead of asserting
> that the functions are initialized, just
> load_external_function(libname, "_PG_archive_module_init") and call it
> if they aren't.

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library.  That doesn't seem too terrible to me, but it was
something I was trying to avoid.

> I think the attempt in check_archive_command()/check_archive_library()
> to force exactly one of those two to be set is not going to work well
> and should be removed. In general, GUCs whose legal values depend on
> the values of other GUCs don't end up working out well. I think what
> should happen instead is that if archive_library=shell then
> archive_command does whatever it does; otherwise archive_command is
> without effect.

I'm fine with this approach.  I'll go this route in the next revision.

Nathan



Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-25 10:23:40 -0400, Tom Lane wrote:
>> It seems like a fresh checkout from the repo would be little more expensive
>> than the current copy-a-checkout process.)

> I haven't looked in detail, but from what I've seen in the logs the
> is-there-anything-new check is already not cheap, and does a checkout / update
> of the git directory.

Yeah, you probably need a checkout to apply the rule about don't rebuild
after documentation-only changes.  But it seems like the case where the
branch tip hasn't moved at all could be optimized fairly easily.  I'm not
sure it's worth the trouble to add code for that given our current usage
of the buildfarm; but if we were to start tracking branches that only
change a couple of times a year, it would be.

regards, tom lane




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 12:38 PM Tom Lane  wrote:
> Um ... the removed symbols were at the end of the WaitEventIO enum,
> so is there really an ABI break?  I suppose if an extension contains
> actual references to the removed symbols, it would fail to recompile,
> which'd be annoying for a released branch.

I think that you're right. I believe one of the two extensions I know
about hopes that values won't be renumbered or become invalid across
minor releases, and the other contains specific references to these
particular constants.

Now of course it is always arguable whether or not anything that some
extension is doing ought to be deemed an acceptable use of the
facilities provided by core, and how much stability ought to be
guaranteed. But while I agree it's good to remove unused stuff in the
master, it doesn't seem like we really need to back-patch it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-22 19:30:25 -0400, Tom Lane wrote:
>> Yeah.  I checked into when it was that we dropped pre-8.0 support
>> from pg_dump, and the answer is just about five years ago (64f3524e2).
>> So moving the bar forward by five releases isn't at all out of line.
>> 8.4 would be eight years past EOL by the time v15 comes out.

> I'd really like us to adopt a "default" policy on this. I think it's a waste
> to spend time every few years arguing what exact versions to drop. I'd much
> rather say that, unless there are concrete reasons to deviate from that, we
> provide pg_dump compatibility for 5+3 releases, pg_upgrade for 5+1, and psql
> for 5 releases or something like that.

I agree with considering something like that to be the minimum support
policy, but the actual changes need a bit more care.  For example, when
we last did this, the technical need was just to drop pre-7.4 versions,
but we chose to make the cutoff 8.0 on the grounds that that was more
understandable to users [1].  In the same way, I'm thinking of moving the
cutoff to 9.0 now, although 8.4 would be sufficient from a technical
standpoint.

OTOH, in the new world of one-part major versions, it's less clear that
there will be obvious division points for future cutoff changes.  Maybe
versions-divisible-by-five would work?  Or versions divisible by ten,
but experience so far suggests that we'll want to move the cutoff more
often than once every ten years.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/2661.1475849167%40sss.pgh.pa.us




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-25 10:23:40 -0400, Tom Lane wrote:
> Also, I concur with Andrew's point that we'd really have to have
> buildfarm support.  However, this might not be as bad as it seems.
> In principle we might just need to add resurrected branches back to
> the branches_to_build list.  Given my view of what the back-patching
> policy ought to be, a new build in an old branch might only be
> required a couple of times a year, which would not be an undue
> investment of buildfarm resources.

FWIW, if helpful I could easily specify a few additional branches to some of
my buildfarm animals. Perhaps serinus/flaviventris (snapshot gcc wo/w
optimizations) so we'd see problems coming early? I could also add a
recent-clang one.

I think doing this to a few designated animals is a better idea than wasting
cycles and space on a lot of animals.


> It seems like a fresh checkout from the repo would be little more expensive
> than the current copy-a-checkout process.)

I haven't looked in detail, but from what I've seen in the logs the
is-there-anything-new check is already not cheap, and does a checkout / update
of the git directory.

Greetings,

Andres Freund




Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Sun, Oct 24, 2021 at 2:15 AM Bossart, Nathan  wrote:
> Here is an attempt at doing this.  Archive modules are expected to
> declare _PG_archive_module_init(), which can define GUCs, register
> background workers, etc.  This function must at least define the
> archive callbacks.  For now, I've introduced two callbacks.  The first
> is for checking that the archive module is configured, and the second
> contains the actual archiving logic.

I don't see why this patch should need to make any changes to
internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
other central point of control, and I don't think it should.
pgarch_archiveXlog() can just load the library at the time it's
needed. That way it only gets loaded in the archiver process, and the
required changes are much more localized. Like instead of asserting
that the functions are initialized, just
load_external_function(libname, "_PG_archive_module_init") and call it
if they aren't.
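Robert's load-on-first-use suggestion amounts to caching the module's callbacks inside the archiver process. A rough sketch in Python, standing in for the C code in the archiver; the cache and dictionary keys here are invented, and only `load_external_function` and `_PG_archive_module_init` come from the thread:

```python
# Lazy-load sketch: the archive module is loaded the first time it is
# needed, and only inside the archiver process.  All names here are
# illustrative stand-ins for the C implementation.

_callbacks = None  # populated on first use

def _load_archive_module(init_func):
    """Call the module's init function once and cache the callbacks it
    registers (stands in for load_external_function + _PG_archive_module_init)."""
    global _callbacks
    if _callbacks is None:
        _callbacks = init_func()
        # The init function must at least supply the archiving callback.
        if "archive_file" not in _callbacks:
            raise RuntimeError("archive module did not register callbacks")
    return _callbacks

def archive_xlog(init_func, filename):
    """Archive one WAL file, loading the module on the first call."""
    cbs = _load_archive_module(init_func)
    if not cbs["check_configured"]():
        return False
    return cbs["archive_file"](filename)
```

The point of this shape is that no central code path has to know about the module at all; only the archiver ever touches it.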

I think the attempt in check_archive_command()/check_archive_library()
to force exactly one of those two to be set is not going to work well
and should be removed. In general, GUCs whose legal values depend on
the values of other GUCs don't end up working out well. I think what
should happen instead is that if archive_library=shell then
archive_command does whatever it does; otherwise archive_command is
without effect.
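The precedence Robert proposes can be stated as a tiny decision rule; this is a sketch with invented names, not actual PostgreSQL code:

```python
def effective_archiver(archive_library, archive_command):
    """Decide which mechanism archives WAL: with archive_library='shell',
    archive_command applies as before; with any other library, the command
    is simply ignored, so neither GUC has to validate the other."""
    if archive_library == "shell":
        return ("shell", archive_command)
    return ("module", archive_library)
```

Because the rule is a pure function of the two settings, neither GUC's check hook needs to inspect the other's value.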

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andres Freund
Hi,

On 2021-10-22 19:30:25 -0400, Tom Lane wrote:
> Yeah.  I checked into when it was that we dropped pre-8.0 support
> from pg_dump, and the answer is just about five years ago (64f3524e2).
> So moving the bar forward by five releases isn't at all out of line.
> 8.4 would be eight years past EOL by the time v15 comes out.

I'd really like us to adopt a "default" policy on this. I think it's a waste
to spend time every few years arguing what exact versions to drop. I'd much
rather say that, unless there are concrete reasons to deviate from that, we
provide pg_dump compatibility for 5+3 releases, pg_upgrade for 5+1, and psql
for 5 releases or something like that.

It's fine to not actually spend the time to excise support for old versions
every release if not useful, but we should be able to "just do it" whenever
version compat is a meaningful hindrance.

Greetings,

Andres Freund




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Alvaro Herrera  writes:
> I do think you have moved the goalposts: to reiterate what I said above,
> I thought what we wanted was to have *some* server in order to test
> client-side changes with; not to be able to get a server running on
> every possible platform.  I'm not really on board with the idea that old
> branches have to be buildable everywhere all the time.

Agreed, that might be too much work compared to the value.  But if we're
to be selective about support for this, I'm unclear on how we decide
which platforms are supported --- and, more importantly, how we keep
that list up to date over time.

regards, tom lane




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 20, 2021 at 10:52 PM Amit Kapila  wrote:
>> Remove unused wait events.

> This commit forces a recompile of every extension that knows about the
> integer values assigned to the enums in WaitEventIO. I know of 2
> extensions that are affected by this. I think that it should be
> reverted in v14 and kept only in master.

Um ... the removed symbols were at the end of the WaitEventIO enum,
so is there really an ABI break?  I suppose if an extension contains
actual references to the removed symbols, it would fail to recompile,
which'd be annoying for a released branch.
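Tom's observation — removing trailing enum members leaves every surviving member's integer value unchanged, so the only breakage hits code that names the removed symbols — can be sketched as follows (hypothetical enum contents, not the real WaitEventIO values):

```python
from enum import IntEnum

# Hypothetical before/after shapes; only trailing members were removed.
class WaitEventIO_v13(IntEnum):
    WAIT_EVENT_A = 0
    WAIT_EVENT_B = 1
    WAIT_EVENT_UNUSED = 2   # removed in the later version

class WaitEventIO_v14(IntEnum):
    WAIT_EVENT_A = 0
    WAIT_EVENT_B = 1

# Every surviving member keeps its value -> no break for code that only
# stores or compares the integers:
assert all(WaitEventIO_v13[m.name].value == m.value for m in WaitEventIO_v14)

# ...but a symbolic reference to the removed member now fails (a compile
# error in C; a KeyError here):
try:
    WaitEventIO_v14["WAIT_EVENT_UNUSED"]
    raise AssertionError("should not be reachable")
except KeyError:
    pass
```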

On the whole, I agree that this patch had no business being committed
to v14.

regards, tom lane




Re: parallelizing the archiver

2021-10-25 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Thu, Oct 21, 2021 at 11:05 PM Robert Haas  wrote:
> > On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost  wrote:
> > > restore_command used to be in recovery.conf, which disappeared with v12
> > > and it now has to go into postgresql.auto.conf or postgresql.conf.
> > >
> > > That's a huge breaking change.
> >
> > Not in the same sense. Moving the functionality to a different
> > configuration file can and probably did cause a lot of problems for
> > people, but the same basic functionality was still available.
> 
> Yeah.
> 
> And as a bonus it got a bunch of people to upgrade their backup software
> that suddenly stopped working. Or in some case, to install backup software
> instead of using the hand-rolled scripts. So there were some good
> side-effects specifically to breaking it as well.

I feel like there's some confusion here- just to clear things up, I
wasn't suggesting that we wouldn't include the capability, just that we
should be open to changing the interface/configuration based on what
makes sense and not, necessarily, insist on perfect backwards
compatibility.  Seems everyone else has come out in support of that as
well at this point and so I don't think there's much more to say here.

The original complaint I had made was that it felt like folks were
pushing hard on backwards compatibility for the sake of it and I was
just trying to make sure it's clear that we can, and do, break backwards
compatibility sometimes and the bar to clear isn't necessarily all that
high, though of course we should be gaining something if we do decide to
make such a change.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_dump versus ancient server versions

2021-10-25 Thread Alvaro Herrera
On 2021-Oct-25, Tom Lane wrote:

> It's also unclear to me why we'd leave Windows out of this discussion.
> We keep saying we want to encourage Windows-based hackers to contribute,
> so doesn't that require testing it on the same basis as other platforms?

Testing of in-support branches, sure -- I don't propose to break that.
But this is all about providing *some* server against which to test
client-side changes with, right?  Not to test the old servers
themselves.  Looking at Amit K's "Postgres person of the week" interview[1]
and remembering conversations with David Rowley, Windows hackers seem
perfectly familiar with getting Linux builds going, so we wouldn't need
to force MSVC fixes in order for them to have old servers available.

But anyway, I was thinking that the fixes required for MSVC buildability
were quite invasive, but on looking again they don't seem all that
bad[2], so I withdraw that comment.

I do think you have moved the goalposts: to reiterate what I said above,
I thought what we wanted was to have *some* server in order to test
client-side changes with; not to be able to get a server running on
every possible platform.  I'm not really on board with the idea that old
branches have to be buildable everywhere all the time.

[1] https://postgresql.life/post/amit_kapila/
[2] e.g., commit 2b1394fc2b52a2573d08aa626e7b49568f27464e

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-25 Thread Stephen Frost
Greetings,

* Noah Misch (n...@leadboat.com) wrote:
> On Wed, Oct 20, 2021 at 11:27:11AM -0700, Jeff Davis wrote:
> > On Wed, 2021-10-20 at 10:32 -0700, Mark Dilger wrote:
> > > I'd like to have a much clearer understanding of Noah's complaint
> > > first.  There are multiple things to consider: (1) the role which
> > > owns the trigger, (2) the role which is performing an action which
> > > would cause the trigger to fire, (3) the set of roles role #1 belongs
> > > to, (4) the set of roles role #1 has ADMIN privilege on, (5) the set
> > > of roles that role #2 belongs to, and (6) the set of roles that role
> > > #2 has ADMIN privilege on.  Maybe more?
> 
> > > I'd like to know precisely which combinations of these six things are
> > > objectionable, and why.  There may be a way around the objections
> > > without needing to create new user options or new privileged roles.
> > 
> > I can't speak for Noah, but my interpretation is that it would be
> > surprising if GRANT/REVOKE or membership in an ordinary role had
> > effects other than "permission denied" errors. It might make sense for
> > event trigger firing in all the cases we can think of, but it would
> > certainly be strange if we started accumulating a collection of
> > behaviors that implicitly change when you move in or out of a role.
> > 
> > That's pretty general, so to answer your question: it seems like a
> > problem to use #3-6 in the calculation about whether to fire an event
> > trigger.
> 
> Exactly.  That's the main point.  Also, it's currently a best practice for
> only non-LOGIN roles to have members.  The proposed approach invites folks to
> abandon that best practice.  I take the two smells as a sign that we'll regret
> adopting the proposal, despite not knowing how it will go seriously wrong.

This seems like a pretty good point, which leads me to again think that
we should explicitly add a way for an individual who can create event
triggers to be able to specify for whom the event trigger should fire,
and only allow them to specify roles other than their own provided they
have been given that authority (either explicitly somehow or implicitly
based on some defined access that they have to that other role).

Thanks,

Stephen




Re: pgsql: Remove unused wait events.

2021-10-25 Thread Robert Haas
On Wed, Oct 20, 2021 at 10:52 PM Amit Kapila  wrote:
> Remove unused wait events.
>
> Commit 464824323e introduced the wait events which were neither used by
> that commit nor by follow-up commits for that work.

This commit forces a recompile of every extension that knows about the
integer values assigned to the enums in WaitEventIO. I know of 2
extensions that are affected by this. I think that it should be
reverted in v14 and kept only in master.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 2:21 AM, "Bharath Rupireddy"  wrote:
> On Mon, Oct 25, 2021 at 12:40 PM Michael Paquier  wrote:
>> Hmm.  Why don't you split the patch into two parts that can be
>> discussed separately then?  There would be one to remove all the
>> superuser() checks you can think of, and a potential second to grant
>> those function's execution to some system role.
>
> IMO, in this thread we can focus on removing the
> pg_log_backend_memory_contexts()'s superuser() check, and +1 to start a
> separate thread to remove superuser() checks for the other functions
> and REVOKE the permissions in the appropriate places: for system
> functions, in system_functions.sql; for extension functions, in the
> extension installation .sql files. See [1] and [2].

I like the general idea of removing hard-coded superuser checks first
and granting execution to predefined roles second.  I don't have any
strong opinion about what should be done in this thread and what
should be done elsewhere.

Nathan



Re: XTS cipher mode for cluster file encryption

2021-10-25 Thread Stephen Frost
Greetings,

* Yura Sokolov (y.soko...@postgrespro.ru) wrote:
> On Thu, 21/10/2021 at 13:28 -0400, Stephen Frost wrote:
> > I really don't think this is necessary.  Similar to PageSetChecksumCopy
> > and PageSetChecksumInplace, I'm sure we would have functions which are
> > called in the appropriate spots to do encryption (such as 'encrypt_page'
> > and 'encrypt_block' in the Cybertec patch) and folks could review those
> > in relative isolation to the rest.  Dealing with blocks in PG is already
> > pretty well handled, the infrastructure that needs to be added is around
> > handling temporary files and is being actively worked on ... if we could
> > move past this debate around if we should be adding support for XTS or
> > if only GCM-SIV would be accepted.
> > 
> > .
> > 
> > No, the CTR approach isn't great because, as has been discussed quite a
> > bit already, using the LSN as the IV means that different data might be
> > re-encrypted with the same LSN and that's not an acceptable thing to
> > have happen with CTR.
> > 
> > .
> > 
> > We've discussed at length how using CTR for heap isn't a good idea even
> > if we're using the LSN for the IV, while if we use XTS then we don't
> > have the issues that CTR has with IV re-use and using the LSN (plus
> > block number and perhaps other things).  Nothing in what has been
> > discussed here has really changed anything there that I can see and so
> > it's unclear to me why we continue to go round and round with it.
> > 
> 
> Instead of debating XTS vs GCM-SIV I'd suggest Google's Adiantum [1][2]
> [3][4].

That sounds like a great thing to think about adding ... after we get
something in that's based on XTS.

> It is explicitly created to solve the large-block encryption issue - disk
> encryption. It is used to encrypt 4kB blocks as a whole, but in fact has no
> (practical) limit on block size: it needs near-zero modification to encrypt
> 1kB or 8kB or 32kB.
> 
> It has benefits of both XTS and GCM-SIV:
> - like GCM-SIV every bit of cipher text depends on every bit of plain text
> - therefore like GCM-SIV it is resistant to IV reuse: it is safe to reuse
>   LSN+reloid+blocknumber tuple as IV even for hint-bit changes since every
>   block's bit will change.

The advantage of GCM-SIV is that it provides integrity as well as
confidentiality.

> - like XTS it doesn't need to change the plain text format and doesn't need
>   an additional Nonce/Auth Code.

Sure, in which case it's something that could potentially be added later
as another option in the future.  I don't think we'll always have just
one encryption method and it's good to generally think about what it
might look like to have others but I don't think it makes sense to try
and get everything in all at once.

Thanks,

Stephen




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-25 Thread Bossart, Nathan
On 10/24/21, 11:13 PM, "Jeff Davis"  wrote:
> On Sun, 2021-10-24 at 21:32 +, Bossart, Nathan wrote:
>> My initial reaction was that members of pg_maintenance should be able
>> to do all of these things (VACUUM, ANALYZE, CLUSTER, REINDEX, and
>> CHECKPOINT).
>
> What about REFRESH MATERIALIZED VIEW? That seems more specific to a
> workload, but it's hard to draw a clear line between that and CLUSTER.

Hm.  CLUSTER reorders the content of a table but does not change it.
REFRESH MATERIALIZED VIEW, on the other hand, does replace the
content.  I think that's the sort of line I'd draw between REFRESH
MATERIALIZED VIEW and the other commands as well, so I'd leave it out
of pg_maintenance.

Nathan



Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-25 Thread Tom Lane
Michael Paquier  writes:
> I was thinking on this one over the last couple of days, and doing a
> copy of shared deps from a template within createdb still feels
> natural, as of this patch:
> https://www.postgresql.org/message-id/yxdtl+pfsnqmb...@paquier.xyz
> Would somebody object to the addition of this test?  Or perhaps
> somebody has a better idea?

I agree that we're not testing that area well enough.  Proposed
patch seems basically OK, but I think the test needs to be stricter
about what the expected output looks like --- for instance, it
wouldn't complain if tab_foobar were described as something other
than a table.

regards, tom lane




Re: XTS cipher mode for cluster file encryption

2021-10-25 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Oct 19, 2021 at 02:54:56PM -0400, Stephen Frost wrote:
> > * Sasasu (i...@sasa.su) wrote:
> > > A unified block-based I/O API sounds great. Has anyone tried to do this
> > > before? It would be nice if the front-end tools could also use these API.
> > 
> > The TDE patch from Cybertec did go down this route, but the API ended up
> > being rather different, which meant a lot of changes in other parts of
> > the system.  If we can get a block-based temporary file method that
> > maintains more-or-less the same API, that'd be great, but I'm not sure
> > that we can really do so and I am not entirely convinced that we should
> > make the TDE effort depend on an otherwise quite independent effort of
> > making all temp files usage be block based.
> 
> Uh, I thought people felt the Cybertec patch was too large and that a
> unified API for temporary-file I/O encryption was a requirement.  Would
> a CTR-streaming-encryption API for temporary tables be easier to
> implement?

Having a unified API for temporary file I/O (that could then be extended
to provide encryption) would definitely help with reducing the size of a
TDE patch.  The approach used in the Cybertec patch was to make
temporary file access block based, but the way that was implemented was
with an API different from pread/pwrite and that meant changing pretty
much all of the call sites for temporary file access, which naturally
resulted in changes in a lot of otherwise unrelated code.

There was an argument put forth that a block-based API for temporary
file access would generally be good as it would mean fewer syscalls.  If
we can get behind that and make it happen in (relatively) short order
then we'd certainly be better off when it comes to implementing TDE
which also deals with temporary files.  I'm a bit concerned about that
approach due to the changes needed but I'm also not against it.  I do
think that an API which was more-or-less the same as what's used today
would be a smaller change and therefore might be easier to get in and
that it'd also make a TDE patch smaller.  Perhaps both could be
accomplished (an API that's similar to today, but the actual file access
being block-based).
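Stephen's "similar API, block-based underneath" idea — a pread/pwrite-shaped interface whose physical I/O is strictly block-aligned — might look roughly like this sketch (invented names and structure; not the Cybertec patch's actual interface):

```python
BLOCK_SIZE = 8192  # illustrative; PostgreSQL's default BLCKSZ is 8kB

class BlockedTempFile:
    """pread/pwrite-style calls over strictly block-aligned storage.
    Because all physical access happens a whole block at a time, each
    block could later be encrypted independently (e.g. with XTS)
    without changing any call site."""

    def __init__(self):
        self._blocks = {}  # block number -> bytearray(BLOCK_SIZE)

    def _block(self, blkno):
        return self._blocks.setdefault(blkno, bytearray(BLOCK_SIZE))

    def pwrite(self, data, offset):
        """Write bytes at an arbitrary offset, spanning blocks as needed."""
        for i, b in enumerate(bytes(data)):
            blkno, off = divmod(offset + i, BLOCK_SIZE)
            self._block(blkno)[off] = b  # a real version would encrypt per block
        return len(data)

    def pread(self, nbytes, offset):
        """Read bytes at an arbitrary offset; unwritten space reads as zeros."""
        out = bytearray()
        for i in range(nbytes):
            blkno, off = divmod(offset + i, BLOCK_SIZE)
            out.append(self._block(blkno)[off])
        return bytes(out)
```

Callers keep passing arbitrary offsets and lengths, which is what keeps the change small at the call sites while the block layer gains a natural unit for encryption.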

Either way, we should get that unification done in the core code first
as an independent effort.  That's what I hope the Cybertec folks have
had a chance to work on.

As for the specific encryption method to use, using CTR would be simpler
as it doesn't require access to be block-based, though we would need to
make sure to not re-use the IV across any of the temporary files being
created (potentially concurrently).  Probably not that hard to do but
just something to make sure we do.  Of course, if we arrange for
block-based access then we could use XTS or perhaps GCM/GCM-SIV if we
wanted to.
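The IV-reuse hazard Stephen mentions is easy to demonstrate with any stream cipher: reusing an IV reuses the keystream, so XORing the two ciphertexts cancels the keystream and exposes the XOR of the plaintexts. A toy keystream built from SHA-256 (standing in for AES-CTR, which is not in the Python stdlib) shows the effect:

```python
import hashlib

def keystream(key, iv, n):
    """Toy CTR-style keystream: hash(key || iv || counter) blocks.
    Stands in for AES-CTR purely to illustrate IV reuse."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def ctr_encrypt(key, iv, plaintext):
    ks = keystream(key, iv, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

key, iv = b"secret-key", b"same-iv-reused"
p1 = b"hint bits before"
p2 = b"hint bits after!"
c1 = ctr_encrypt(key, iv, p1)  # same IV used twice
c2 = ctr_encrypt(key, iv, p2)

# With a reused IV, an attacker learns p1 XOR p2 without knowing the key:
leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(p1, p2))
```

This is why per-file (and per-write) IV uniqueness matters for CTR, and why block modes like XTS sidestep the problem differently.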

Thanks,

Stephen




Re: when the startup process doesn't (logging startup delays)

2021-10-25 Thread Robert Haas
On Tue, Oct 19, 2021 at 9:06 AM Nitin Jadhav
 wrote:
> Thanks for sharing the patch. Overall approach looks good to me. But
> just one concern about using enable_timeout_every() functionality. As
> per my understanding the caller should calculate the first scheduled
> timeout (now + interval) and pass it as the second argument but this
> is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have
> done the changes as I have mentioned (like now + interval). Kindly
> correct me if I am wrong. I am attaching 2 patches here.
> 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is
> the same as Robert's v2 patch. I have rebased my patch on top of this
> and it is 'v19-0002-startup-progress.patch'.

This version looks fine, so I have committed it (and my
enable_timeout_every patch also, as a necessary prerequisite).
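For reference, the enable_timeout_every() contract Nitin describes — the caller supplies the first absolute fire time (now + interval) plus the repeat interval — boils down to arithmetic like this (a sketch, not the timeout.c code):

```python
def schedule_every(first_fire, interval, until):
    """Yield the absolute fire times of a repeating timeout: the first
    firing at first_fire, then every `interval` thereafter (times are
    abstract ticks here)."""
    t = first_fire
    while t <= until:
        yield t
        t += interval

now = 100
# First firing at now + interval, as the caller computes it, then evenly spaced:
fires = list(schedule_every(first_fire=now + 10, interval=10, until=145))
assert fires == [110, 120, 130, 140]
```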

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 10/25/21 10:23, Tom Lane wrote:
>> (Hmmm ... but disk space could
>> become a problem, particularly on older machines with not so much
>> disk.  Do we really need to maintain a separate checkout for each
>> branch?  It seems like a fresh checkout from the repo would be
>> little more expensive than the current copy-a-checkout process.)

> If you set it up with these settings then the disk space used is minimal:
>      git_use_workdirs => 1,
>      rm_worktrees => 1,

Maybe we should make those the defaults?  AFAICS the current
default setup uses circa 200MB per back branch, even between runs.
I'm not sure what that is buying us.

regards, tom lane




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-25 Thread Mikhail
On Sun, Oct 17, 2021 at 10:29:24AM -0400, Tom Lane wrote:
> Also, you haven't explained why the existing (and much safer) recycling
> logic in IpcSemaphoreCreate doesn't solve your problem.

I think I'll drop the diffs; you're right that the current proven logic
need not be changed for such a rare corner case, which a DBA can fix.

I've added references to ipcs(1) and ipcrm(1) in OpenBSD's semget(2) man
page, so a newcomer won't need to spend hours digging into SysV semaphore
management if they encounter the same situation as I did.

Thanks for reviews.




Re: Feature request for adoptive indexes

2021-10-25 Thread Tomas Vondra

Hi,

On 10/25/21 16:07, Hayk Manukyan wrote:

Hi everyone. I want to make a feature request regarding indexes; as far as
I know this kind of functionality doesn't exist in Postgres. Here is my
problem:
I need to create the following indexes:
         Create index job_nlp_year_scan on ingest_scans_stageing
(`job`,`nlp`,`year`,`scan_id`);
         Create index job_nlp_year_issue_flag on ingest_scans_stageing
(`job`,`nlp`,`year`,`issue_flag`);
         Create index job_nlp_year_sequence on ingest_scans_stageing
(`job`,`nlp`,`year`,`sequence`);
As you can see, the first 3 columns are the same (job, nlp, year), so if I
create 3 different indexes the db has to manage the same job_nlp_year
structure 3 times.
The data structure that I think could be efficient in this kind of
scenario is an 'Adaptive Index', which would be something like
Create index job_nlp_year on ingest_scans_stageing
(`job`,`nlp`,`year`,(`issue_flag`,`scan_id`, `sequence`));
And depending on the query it would use either job_nlp_year_scan, or
job_nlp_year_issue_flag, or job_nlp_year_sequence (job, nlp, year, and one
of (`issue_flag`, `scan_id`, `sequence`)).
For more details please feel free to contact me


It's not very clear what exactly the "adaptive index" would do, except 
that it'd have all three columns. Clearly, the three columns can't be 
considered for ordering etc. but need to be in the index somehow. So why 
wouldn't it be enough to either create an index with all six columns?


CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag, 
sequence);


or possibly with the columns just "included" in the index:

CREATE INDEX ON job_nlp_year_scan (job, nlp, year) INCLUDE (scan_id, 
issue_flag, sequence);


If this does not work, you either need to explain more clearly what 
exactly the adaptive index does, or show queries that can't benefit 
from these existing features.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Oct-25, Tom Lane wrote:
>> Also, I concur with Andrew's point that we'd really have to have
>> buildfarm support.  However, this might not be as bad as it seems.
>> In principle we might just need to add resurrected branches back to
>> the branches_to_build list.

> Well, we would add them to *some* list, but not to the one used by stock
> BF members -- not only because of the diskspace issue but also because
> of the time to build.  I suggest that we should have a separate
> list-of-branches file that would only be used by BF members especially
> configured to do so; and hopefully we won't allow more than a handful
> animals to do that but rather a well-chosen subset, and also maybe allow
> only GCC rather than try to support other compilers.  (There's no need
> to ensure compilability on any Windows platform, for example.)

Meh.  I don't think that's a great approach, because then we're only
ensuring buildability on a rather static set of platforms.  The whole
point here is that when release N+1 of $your_favorite_platform arrives,
we want to know whether the old branches still build on it.  If the
default behavior for new buildfarm animals is to ignore the old branches,
we're much less likely to find that out.

It's also unclear to me why we'd leave Windows out of this discussion.
We keep saying we want to encourage Windows-based hackers to contribute,
so doesn't that require testing it on the same basis as other platforms?

regards, tom lane




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andrew Dunstan


On 10/25/21 11:05, Alvaro Herrera wrote:
>
>> Also, I concur with Andrew's point that we'd really have to have
>> buildfarm support.  However, this might not be as bad as it seems.
>> In principle we might just need to add resurrected branches back to
>> the branches_to_build list.
> Well, we would add them to *some* list, but not to the one used by stock
> BF members -- not only because of the diskspace issue but also because
> of the time to build.  I suggest that we should have a separate
> list-of-branches file that would only be used by BF members especially
> configured to do so; and hopefully we won't allow more than a handful
> animals to do that but rather a well-chosen subset, and also maybe allow
> only GCC rather than try to support other compilers.  (There's no need
> to ensure compilability on any Windows platform, for example.)


Well, we do build with gcc on Windows :-) But yes, maybe we should make
this a more opt-in process.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_dump versus ancient server versions

2021-10-25 Thread Andrew Dunstan


On 10/25/21 10:23, Tom Lane wrote:
>
> Also, I concur with Andrew's point that we'd really have to have
> buildfarm support.  However, this might not be as bad as it seems.
> In principle we might just need to add resurrected branches back to
> the branches_to_build list.  Given my view of what the back-patching
> policy ought to be, a new build in an old branch might only be
> required a couple of times a year, which would not be an undue
> investment of buildfarm resources.  (Hmmm ... but disk space could
> become a problem, particularly on older machines with not so much
> disk.  Do we really need to maintain a separate checkout for each
> branch?  It seems like a fresh checkout from the repo would be
> little more expensive than the current copy-a-checkout process.)


If you set it up with these settings then the disk space used is minimal:

     git_use_workdirs => 1,
     rm_worktrees => 1,

So I have this on crake:

andrew@emma:root $ du -sh REL*/pgsql
5.5M    REL_10_STABLE/pgsql
5.6M    REL_11_STABLE/pgsql
5.6M    REL_12_STABLE/pgsql
5.6M    REL_13_STABLE/pgsql
2.0M    REL_14_STABLE/pgsql
2.6M    REL9_5_STABLE/pgsql
5.5M    REL9_6_STABLE/pgsql


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 25, 2021 at 11:00 AM Tom Lane  wrote:
>> Actually, I think we do.  If I want to test against 7.4, ISTM I want
>> to test against the last released 7.4 version, not something with
>> arbitrary later changes.  Otherwise, what exactly is the point?

> 1. You're free to check out any commit you like.

Yeah, and get something that won't build.  If there's any point
to this work at all, it has to be that we'll maintain the closest
possible buildable approximation to the last released version.

> Oh, OK. I wonder how that plays with the buildfarm status page's
> desire to drop old results that are more than 30 days old. I guess
> you'd just need to force a run at least every 28 days or something.

I don't think it's a problem.  If we haven't committed anything to
branch X in a month, it's likely not interesting.  It might be worth
having a way to get the website to show results further back than
a month, but that doesn't need to be in the default view.

regards, tom lane




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andrew Dunstan


On 10/25/21 11:09, Robert Haas wrote:
> On Mon, Oct 25, 2021 at 11:00 AM Tom Lane  wrote:
>> Actually, I think we do.  If I want to test against 7.4, ISTM I want
>> to test against the last released 7.4 version, not something with
>> arbitrary later changes.  Otherwise, what exactly is the point?
> 1. You're free to check out any commit you like.
>
> 2. Nothing I said can reasonably be confused with "let's allow
> arbitrary later changes."
>
>> Uh, don't we have that already?  I know you can configure a buildfarm
>> animal to force a run at least every-so-often, but it's not required,
>> and I don't think it's even the default.


Yes, in fact it's rather discouraged. The default is just to build when
there's a code change detected.


> Oh, OK. I wonder how that plays with the buildfarm status page's
> desire to drop old results that are more than 30 days old. I guess
> you'd just need to force a run at least every 28 days or something.
>

Well, we could do that, or we could modify the way the server does the
status. The table it's based on has the last 500 records for each branch
for each animal, so the data is there.


cheers


andrew




--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_dump versus ancient server versions

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 11:00 AM Tom Lane  wrote:
> Actually, I think we do.  If I want to test against 7.4, ISTM I want
> to test against the last released 7.4 version, not something with
> arbitrary later changes.  Otherwise, what exactly is the point?

1. You're free to check out any commit you like.

2. Nothing I said can reasonably be confused with "let's allow
arbitrary later changes."

> Uh, don't we have that already?  I know you can configure a buildfarm
> animal to force a run at least every-so-often, but it's not required,
> and I don't think it's even the default.

Oh, OK. I wonder how that plays with the buildfarm status page's
desire to drop old results that are more than 30 days old. I guess
you'd just need to force a run at least every 28 days or something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Alvaro Herrera
On 2021-Oct-25, Tom Lane wrote:

> Roughly speaking, I think the policy should be "no feature bug fixes,
> not even security fixes, for EOL'd branches; only fixes that are
> minimally necessary to make it build on newer platforms".  And
> I want to have a sunset provision even for that.  Fixing every branch
> forevermore doesn't scale.

Agreed.  I think dropping such support at the same time we drop
psql/pg_dump support is a decent answer to that.  It meets the stated
purpose of being able to test such support, and it also moves the
cutoff forward over time as development needs dictate.

> Also, I concur with Andrew's point that we'd really have to have
> buildfarm support.  However, this might not be as bad as it seems.
> In principle we might just need to add resurrected branches back to
> the branches_to_build list.

Well, we would add them to *some* list, but not to the one used by stock
BF members -- not only because of the diskspace issue but also because
of the time to build.  I suggest that we should have a separate
list-of-branches file that would only be used by BF members especially
configured to do so; and hopefully we won't allow more than a handful of
animals to do that but rather a well-chosen subset, and also maybe allow
only GCC rather than try to support other compilers.  (There's no need
to ensure compilability on any Windows platform, for example.)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ed is the standard text editor."
  http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3




Re: ThisTimeLineID can be used uninitialized

2021-10-25 Thread Robert Haas
On Thu, Oct 21, 2021 at 3:29 PM Robert Haas  wrote:
> I think the correct fix for this particular problem is just to delete
> the initialization, as in the attached patch. I spent a long time
> studying this today and eventually convinced myself that there's just
> no way these initializations can ever do anything (details in proposed
> commit message).  While it is important that we do not access the
> global variable when it's uninitialized, here there is no reason to
> access it in the first place.

I have committed this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 25, 2021 at 10:23 AM Tom Lane  wrote:
>> Roughly speaking, I think the policy should be "no feature bug fixes,
>> not even security fixes, for EOL'd branches; only fixes that are
>> minimally necessary to make it build on newer platforms".  And
>> I want to have a sunset provision even for that.  Fixing every branch
>> forevermore doesn't scale.

> Sure, but you can ameliorate that a lot by just saying it's something
> people have the *option* to do, not something anybody is *expected* to
> do. I agree it's best if we continue to discourage back-patching bug
> fixes into supported branches, but I also think we don't need to be
> too stringent about this.

Actually, I think we do.  If I want to test against 7.4, ISTM I want
to test against the last released 7.4 version, not something with
arbitrary later changes.  Otherwise, what exactly is the point?

>> In principle we might just need to add resurrected branches back to
>> the branches_to_build list.  Given my view of what the back-patching
>> policy ought to be, a new build in an old branch might only be
>> required a couple of times a year, which would not be an undue
>> investment of buildfarm resources.

> I suppose it would be useful if we had the ability to do new runs only
> when the source code has changed...

Uh, don't we have that already?  I know you can configure a buildfarm
animal to force a run at least every-so-often, but it's not required,
and I don't think it's even the default.

regards, tom lane




Re: [Patch] ALTER SYSTEM READ ONLY

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 3:05 AM Amul Sul  wrote:
> Ok, did the same in the attached 0001 patch.
>
> There is no harm in calling LocalSetXLogInsertAllowed() multiple
> times, but the problem I can see is that with this patch the user
> is allowed to call LocalSetXLogInsertAllowed() at a time when it is
> not supposed to be called, e.g. when LocalXLogInsertAllowed = 0 and
> WAL writes are explicitly disabled.

I've pushed 0001 and 0002 but I reversed the order of them and made a
few other edits.

I don't really see the issue you mention here as a problem. There's
only one place where we set LocalXLogInsertAllowed = 0, and I don't
know that we'll ever have another one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 10:23 AM Tom Lane  wrote:
> What concerns me here is that we not get into a position where we're
> effectively still maintaining EOL'd versions.  Looking at the git
> history yesterday reminded me that we had such a situation back in
> the early 7.x days.  I can see that I still occasionally made commits
> into 7.1 and 7.2 years after the last releases of those branches,
> which ended up being a complete waste of effort.  There was no policy
> guiding what to back-patch into what branches, partly because we
> didn't have a defined EOL policy then.  So I want to have a policy
> (and a pretty tight one) before I'll go back to doing that.
>
> Roughly speaking, I think the policy should be "no feature bug fixes,
> not even security fixes, for EOL'd branches; only fixes that are
> minimally necessary to make it build on newer platforms".  And
> I want to have a sunset provision even for that.  Fixing every branch
> forevermore doesn't scale.

Sure, but you can ameliorate that a lot by just saying it's something
people have the *option* to do, not something anybody is *expected* to
do. I agree it's best if we continue to discourage back-patching bug
fixes into supported branches, but I also think we don't need to be
too stringent about this. What I think we don't want is, for example,
somebody working at company X deciding to back-patch all the bug fixes
that customers of company X care about into our back-branches, but
not the other ones. But on the other hand if somebody is trying to
benchmark or test compatibility with an old branch and it keeps crashing
because of some bug, telling them that they're not allowed to fix that
bug because it's not a sufficiently-minimal change to a dead branch is
kind of ridiculous. In other words, if you try to police every change
anyone wants to make, e.g. "well I know that would help YOU build on a
newer platform but it doesn't seem like it meets the criteria of the
minimum necessary change to make it build on a newer platform," then
you might as well just give up now. Nobody cares about the older
branches enough to put work into fixing whatever's wrong and then
having to argue about whether that work ought to be thrown away
anyway.

> There's also the question of how we get to a working state in the
> first place -- as we found upthread, there's a fair-sized amount
> of work to do just to restore buildability right now, for anything
> that was EOL'd more than a year or two back.  I'm not volunteering
> for that, but somebody would have to, to get things off the ground.

Right.

> Also, I concur with Andrew's point that we'd really have to have
> buildfarm support.  However, this might not be as bad as it seems.
> In principle we might just need to add resurrected branches back to
> the branches_to_build list.  Given my view of what the back-patching
> policy ought to be, a new build in an old branch might only be
> required a couple of times a year, which would not be an undue
> investment of buildfarm resources.  (Hmmm ... but disk space could
> become a problem, particularly on older machines with not so much
> disk.  Do we really need to maintain a separate checkout for each
> branch?  It seems like a fresh checkout from the repo would be
> little more expensive than the current copy-a-checkout process.)

I suppose it would be useful if we had the ability to do new runs only
when the source code has changed...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Feature request for adoptive indexes

2021-10-25 Thread Hayk Manukyan
Hi everyone. I want to make a feature request regarding indexes; as far as
I know this kind of functionality doesn't exist in Postgres. Here is my
problem:
I need to create the following indexes:
Create index job_nlp_year_scan on ingest_scans_stageing
(`job`,`nlp`,`year`,`scan_id`);
Create index job_nlp_year_issue_flag on ingest_scans_stageing
(`job`,`nlp`,`year`,`issue_flag`);
Create index job_nlp_year_sequence on ingest_scans_stageing
(`job`,`nlp`,`year`,`sequence`);
As you can see, the first 3 columns are the same (job, nlp, year), so if I
create 3 different indexes the database has to manage the same job_nlp_year
structure 3 times.
The data structure that I think could be efficient in this kind of
scenario is an 'Adaptive Index', which would be something like
Create index job_nlp_year on ingest_scans_stageing
(`job`,`nlp`,`year`,(`issue_flag`,`scan_id`, `sequence`));
Depending on the query, it would act as job_nlp_year_scan,
job_nlp_year_issue_flag, or job_nlp_year_sequence (job, nlp, year, and one
of `issue_flag`, `scan_id`, `sequence`).
If more detail would help, please feel free to reach out to me.
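To restate the request in PostgreSQL-style syntax (the backticks above are
MySQL-style quoting; the final statement is hypothetical syntax that no
PostgreSQL release accepts):

```sql
-- Today: three indexes, each repeating the same (job, nlp, year) prefix.
CREATE INDEX job_nlp_year_scan       ON ingest_scans_stageing (job, nlp, year, scan_id);
CREATE INDEX job_nlp_year_issue_flag ON ingest_scans_stageing (job, nlp, year, issue_flag);
CREATE INDEX job_nlp_year_sequence   ON ingest_scans_stageing (job, nlp, year, sequence);

-- Requested (hypothetical, not valid SQL): one shared (job, nlp, year)
-- prefix structure with an alternative trailing key chosen per query.
-- CREATE INDEX job_nlp_year ON ingest_scans_stageing
--     (job, nlp, year, (issue_flag, scan_id, sequence));
```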


Re: pg_dump versus ancient server versions

2021-10-25 Thread Tom Lane
Robert Haas  writes:
> Right. Well, we could leave it up to people who care to decide how
> much work they want to do, perhaps. But I do find it annoying that
> pg_dump is supposed to maintain compatibility with server releases
> that I can't easily build. Fortunately I don't patch pg_dump very
> often, but if I did, it'd be very difficult for me to verify that
> things work against really old versions. I know that you (Tom) do a
> lot of work of this type though. In my opinion, if you find yourself
> working on a project of this type and as part of that you do some
> fixes to an older branch to make it compile, maybe you ought to commit
> those so that the next person doesn't have the same problem.

Well, the answer to that so far is that I've never done such fixes.
I have the last released versions of old branches laying around,
and that's what I test against.  It's been sufficient so far, although
if I suddenly needed to do (say) SSL-enabled testing, that would be
a problem because I don't think I built with SSL for any of those
branches.

Because of that angle, I concur with your position that it'd really
be desirable to be able to build old versions on modern platforms.
Even if you've got an old executable, it might be misconfigured for
the purpose you have in mind.

> And maybe
> when we add support for newer versions of OpenSSL or Windows, we ought
> to consider back-patching those even to unsupported releases if
> someone's willing to do the work. If they're not, they're not, but I
> think we tend to strongly discourage commits to EOL branches, and I
> think maybe we should stop doing that. Not that people should
> routinely back-patch bug fixes, but stuff that makes it easier to
> build seems fair game.

What concerns me here is that we not get into a position where we're
effectively still maintaining EOL'd versions.  Looking at the git
history yesterday reminded me that we had such a situation back in
the early 7.x days.  I can see that I still occasionally made commits
into 7.1 and 7.2 years after the last releases of those branches,
which ended up being a complete waste of effort.  There was no policy
guiding what to back-patch into what branches, partly because we
didn't have a defined EOL policy then.  So I want to have a policy
(and a pretty tight one) before I'll go back to doing that.

Roughly speaking, I think the policy should be "no feature bug fixes,
not even security fixes, for EOL'd branches; only fixes that are
minimally necessary to make it build on newer platforms".  And
I want to have a sunset provision even for that.  Fixing every branch
forevermore doesn't scale.

There's also the question of how we get to a working state in the
first place -- as we found upthread, there's a fair-sized amount
of work to do just to restore buildability right now, for anything
that was EOL'd more than a year or two back.  I'm not volunteering
for that, but somebody would have to, to get things off the ground.

Also, I concur with Andrew's point that we'd really have to have
buildfarm support.  However, this might not be as bad as it seems.
In principle we might just need to add resurrected branches back to
the branches_to_build list.  Given my view of what the back-patching
policy ought to be, a new build in an old branch might only be
required a couple of times a year, which would not be an undue
investment of buildfarm resources.  (Hmmm ... but disk space could
become a problem, particularly on older machines with not so much
disk.  Do we really need to maintain a separate checkout for each
branch?  It seems like a fresh checkout from the repo would be
little more expensive than the current copy-a-checkout process.)
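The parenthetical above can be sketched concretely: rather than keeping a
permanent per-branch working tree, an animal could materialize a shallow,
single-branch clone on demand. A minimal illustration, using a throwaway
local repository as a stand-in for the community repo (the branch name is
just an example):

```shell
set -e
tmp=$(mktemp -d)
# Stand-in for the community repository: a tiny repo with one old branch.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=bf -c user.email=bf@example.com \
    commit -q --allow-empty -m 'initial commit'
git -C "$tmp/origin" branch REL7_4_STABLE
# A shallow single-branch clone: little more expensive than copying a
# checkout, and no per-branch tree has to be kept around between runs.
git clone -q --depth 1 --branch REL7_4_STABLE "file://$tmp/origin" "$tmp/build"
git -C "$tmp/build" rev-parse --abbrev-ref HEAD   # prints REL7_4_STABLE
```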

regards, tom lane




Re: pg_dump versus ancient server versions

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 8:29 AM Andrew Dunstan  wrote:
> But we don't need to build them on modern platforms, just run them on
> modern platforms, ISTM.

I don't really agree with this.

> Some months ago I built binaries all the way back to 7.2 that with a
> little help run on modern Fedora and Ubuntu systems. I just upgraded my
> Fedora system from 31 to 34 and they still run. See
>  One of the intended use cases
> was to test pg_dump against old versions.

That's cool, but I don't have a Fedora or Ubuntu VM handy, and it does
seem like if people are working on testing against old versions, they
might even want to be able to recompile with debugging statements
added or something. So I think actually compiling is a lot better than
being able to get working binaries from someplace, even though the
latter is better than nothing.

> I'm not opposed to us cutting off support for very old versions,
> although I think we should only do that very occasionally (no more than
> once every five years, say) unless there's a very good reason. I'm also
> not opposed to us making small adjustments to allow us to build old
> versions on modern platforms, but if we do that then we should probably
> have some buildfarm support for it.

Yeah, I think having a small number of buildfarm animals testing very
old versions would be nice. Perhaps we can call them tyrannosaurus,
brontosaurus, triceratops, etc. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Robert Haas
On Sun, Oct 24, 2021 at 5:46 PM Tom Lane  wrote:
> Hmm ... I guess the question is how much work we feel like putting
> into that, and how we'd track whether old branches still work,
> and on what platforms.  It could easily turn into a time sink
> that's not justified by the value.  I do see your point that there's
> some value in it; I'm just not sure about the cost/benefit ratio.

Right. Well, we could leave it up to people who care to decide how
much work they want to do, perhaps. But I do find it annoying that
pg_dump is supposed to maintain compatibility with server releases
that I can't easily build. Fortunately I don't patch pg_dump very
often, but if I did, it'd be very difficult for me to verify that
things work against really old versions. I know that you (Tom) do a
lot of work of this type though. In my opinion, if you find yourself
working on a project of this type and as part of that you do some
fixes to an older branch to make it compile, maybe you ought to commit
those so that the next person doesn't have the same problem. And maybe
when we add support for newer versions of OpenSSL or Windows, we ought
to consider back-patching those even to unsupported releases if
someone's willing to do the work. If they're not, they're not, but I
think we tend to strongly discourage commits to EOL branches, and I
think maybe we should stop doing that. Not that people should
routinely back-patch bug fixes, but stuff that makes it easier to
build seems fair game.

I don't think we need to worry too much about users getting the wrong
impression. People who want to know what is supported are going to
look at our web site for that information, and they are going to look
for releases. I can't rule out the possibility that someone is going
to build an updated version of 7.4 or 8.2 with whatever patches we
might choose to commit there, but they're unlikely to think that means
those are fully supported branches. And if they somehow do think that
despite all evidence to the contrary, we can just tell them that they
are mistaken.

> One thing we could do that would help circumscribe the costs is to say
> "we are not going to consider issues involving new compiler warnings
> or bugs caused by more-aggressive optimization".  We could mechanize
> that pretty effectively by changing configure shortly after a branch's
> EOL to select -O0 and no extra warning flags, so that anyone building
> from branch tip would get those switch choices.

I don't much like the idea of including -O0 because it seems like it
could be confusing. People might not realize that the build
settings have been changed. I don't think that's really the problem
anyway: anybody who hits compiler warnings in older branches could
decide to fix them (and as long as it's a committer who will be
responsible for their own work, I think that's totally fine) or enable
-O0 locally. I routinely do that when I hit problems on older
branches, and it helps a lot, but the way I see it, that's such an
easy change that there's little reason to make it in the source code.
What's a lot more annoying is if the compile fails altogether, or you
can't even get past the configure step.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-25 Thread Andrew Dunstan


On 10/22/21 19:30, Tom Lane wrote:
> "David G. Johnston"  writes:
>> On Fri, Oct 22, 2021 at 3:42 PM Tom Lane  wrote:
>>> Anyway, I think the default answer is "revert 92316a458 and keep the
>>> compatibility goalposts where they are".  But I wanted to open up a
>>> discussion to see if anyone likes the other approach better.
>> ... IMO, the bar for this kind of situation should be 10 releases at
>> most - 5 of which would be in support at the time the patch goes in.  We
>> don't have to actively drop support of older stuff but anything older
>> shouldn't be preventing new commits.
> Yeah.  I checked into when it was that we dropped pre-8.0 support
> from pg_dump, and the answer is just about five years ago (64f3524e2).
> So moving the bar forward by five releases isn't at all out of line.
> 8.4 would be eight years past EOL by the time v15 comes out.
>
> One of the arguments for the previous change was that it was getting
> very hard to build old releases on modern platforms, thus making it
> hard to do any compatibility testing.  I believe the same is starting
> to become true of the 8.x releases, though I've not tried personally
> to build any of them in some time.  (The executables I'm using for
> them date from 2014 or earlier, and have not been recompiled in
> subsequent platform upgrades ...)  Anyway it's definitely not free
> to continue to support old source server versions.


But we don't need to build them on modern platforms, just run them on
modern platforms, ISTM.

Some months ago I built binaries all the way back to 7.2 that with a
little help run on modern Fedora and Ubuntu systems. I just upgraded my
Fedora system from 31 to 34 and they still run. See
 One of the intended use cases
was to test pg_dump against old versions.

I'm not opposed to us cutting off support for very old versions,
although I think we should only do that very occasionally (no more than
once every five years, say) unless there's a very good reason. I'm also
not opposed to us making small adjustments to allow us to build old
versions on modern platforms, but if we do that then we should probably
have some buildfarm support for it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




