Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-28 Thread Michael Paquier
On Mon, Feb 28, 2022 at 07:42:17PM +0800, Julien Rouhaud wrote:
> Done in attached v2.  I did the split in a separate commit, as the diff is
> otherwise unreadable.  While at it I also fixed a few minor issues (I missed a
> MemoryContextDelete, and now avoid relying on inet_net_pton which apparently
> doesn't exist in cygwin).

Hmm.  The diffs of 0001 are really hard to read.  Do you know why this
is happening?  Is that because some code has been moved around?  I
have been doing a comparison of all the routines showing up in the
diffs, to note that the contents of load_hba(), load_ident(),
hba_getauthmethod() & friends are actually the same, but this makes
the change history harder to follow.  Moving around fill_hba_line()
and fill_hba_view() should be enough, indeed.

+#include "utils/guc.h"
+//#include "utils/tuplestore.h"

Ditto.

+   /* Build a tuple descriptor for our result type */
+   if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+   elog(ERROR, "return type must be a row type");

Worth noting that I was planning to apply a patch from Melanie
Plageman to simplify the creation of tupledesc and tuplestores for
set-returning functions like this one, so this would cut a bit of code
here.  This is not directly related to your patch, though, that's my
business :)
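
For context, that simplification replaces the manual tupledesc/tuplestore
setup with a single helper call.  A rough sketch of what such a function
could then look like, assuming a helper along the lines of what Melanie's
patch proposes (SetSingleFuncCall() in funcapi.h, as it was later
committed); the fill_hba_view() callback signature is an assumption here:

Datum
pg_hba_file_rules(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;

	/* Replaces the hand-rolled tupledesc and tuplestore creation. */
	SetSingleFuncCall(fcinfo, 0);

	/* Hypothetical callback call; the patch's signature may differ. */
	fill_hba_view(rsinfo->setResult, rsinfo->setDesc);

	return (Datum) 0;
}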

Well, as of 0002, one thing that makes things harder to follow is that
parse_ident_line() is moved to a different place in hba.c, one slight
difference being err_msg to store the error message in the token
line.  But shouldn't the extension of parse_ident_line() with its
elevel be included in 0001?  Or, well, it could just be done in its
own patch to make for a cleaner history, so 0002 could be shaped as
two commits itself.

Also, it seems to me that we'd better have some TAP tests for that to
make sure of its contents?  One place would be src/test/auth/.
Another place where we make use of user mapping is the krb5 tests but
these are run in a limited fashion in the buildfarm.  We also set some
mappings for SSPI on Windows all the time, so we'd better be careful
about that and paint some $windows_os in the tests when looking at the
output of the view.

+-- We expect no user mapping in this test
+select count(*) = 0 as ok from pg_ident_file_mappings;

It could be possible to do installcheck on an instance that has user
mappings, meaning that this had better be ">= 0", no?  Does this pass
on Windows where pg_regress sets some mappings for SSPI when creating
one or more roles?
--
Michael




Re: Allow async standbys wait for sync replication

2022-02-28 Thread Kyotaro Horiguchi
(Now I understand what "async" means here.)

At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart  
wrote in 
> On Tue, Mar 01, 2022 at 11:10:09AM +0530, Bharath Rupireddy wrote:
> > On Tue, Mar 1, 2022 at 12:27 AM Nathan Bossart  
> > wrote:
> >> My feedback is specifically about this behavior.  I don't think we should
> >> spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
> >> think we should just choose the SendRqstPtr based on what is currently
> >> synchronously replicated.
> > 
> > Do you mean something like the following?
> > 
> > /* Main loop of walsender process that streams the WAL over Copy messages. 
> > */
> > static void
> > WalSndLoop(WalSndSendDataCallback send_data)
> > {
> > /*
> >  * Loop until we reach the end of this timeline or the client requests 
> > to
> >  * stop streaming.
> >  */
> > for (;;)
> > {
> > if (am_async_walsender && there_are_sync_standbys)
> > {
> >  XLogRecPtr SendRqstLSN;
> >  XLogRecPtr SyncFlushLSN;
> > 
> > SendRqstLSN = GetFlushRecPtr(NULL);
> > LWLockAcquire(SyncRepLock, LW_SHARED);
> > SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> > LWLockRelease(SyncRepLock);
> > 
> > if (SendRqstLSN > SyncFlushLSN)
> >continue;
> > }

The current trend is energy saving. We never add a "wait for some
fixed time, then exit if the condition holds, otherwise repeat" loop
for this kind of purpose where there's no guarantee that the loop
exits reasonably quickly.  Concretely, we ought to rely on condition
variables to do that.
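
For illustration, a condition-variable wait could look roughly like the
sketch below.  This is only a sketch: the syncFlushCV member (and the
matching ConditionVariableBroadcast() wherever sync standbys advance the
flush LSN) is an assumption, not an existing field of WalSndCtlData.

	ConditionVariablePrepareToSleep(&walsndctl->syncFlushCV);
	for (;;)
	{
		XLogRecPtr	SyncFlushLSN;

		LWLockAcquire(SyncRepLock, LW_SHARED);
		SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
		LWLockRelease(SyncRepLock);

		/* Stop waiting once enough WAL is synchronously replicated. */
		if (SendRqstLSN <= SyncFlushLSN)
			break;

		/* Sleep until woken; no periodic busy-wait. */
		ConditionVariableSleep(&walsndctl->syncFlushCV, WAIT_EVENT_SYNC_REP);
	}
	ConditionVariableCancelSleep();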

> Not quite.  Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
> so that the WAL sender only sends up to the current synchronously

I'm not sure, but doesn't that make walsender falsely believe it
has caught up to the bleeding edge of WAL?

> replicated LSN.  TBH there are probably other things that need to be
> considered (e.g., how do we ensure that the WAL sender sends the rest once
> it is replicated?), but I still think we should avoid spinning in the WAL
> sender waiting for WAL to be replicated.

It seems to me it would be something similar to
SyncRepReleaseWaiters().  Or it could be possible to consolidate this
feature into that function; I'm not sure, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BufferAlloc: don't take two simultaneous locks

2022-02-28 Thread Yura Sokolov
On Fri, 25/02/2022 at 09:38 +, Simon Riggs wrote:
> On Fri, 25 Feb 2022 at 09:24, Yura Sokolov  wrote:
> 
> > > This approach is cleaner than v1, but should also perform better
> > > because there will be a 1:1 relationship between a buffer and its
> > > dynahash entry, most of the time.
> > 
> > Thank you for suggestion. Yes, it is much clearer than my initial proposal.
> > 
> > Should I incorporate it to v4 patch? Perhaps, it could be a separate
> > commit in new version.
> 
> I don't insist that you do that, but since the API changes are a few
> hours work ISTM better to include in one patch for combined perf
> testing. It would be better to put all changes in this area into PG15
> than to split it across multiple releases.
> 
> > Why there is need for this? Which way backend could be forced to abort
> > between BufTableReuse and BufTableAssign in this code path? I don't
> > see any CHECK_FOR_INTERRUPTS on the way, but may be I'm missing
> > something.
> 
> Sounds reasonable.

Ok, here is v4.
It comes as two commits: one for the BufferAlloc locking change and the
other for avoiding dynahash's freelist.

The buffer locking patch is the same as v2 with some comment changes,
i.e. it uses Lock+UnlockBufHdr.

For dynahash, HASH_REUSE and HASH_ASSIGN are added as suggested.
HASH_REUSE stores the deleted element in a per-process static variable.
HASH_ASSIGN uses this element instead of the freelist. If there's no
such stored element, it falls back to HASH_ENTER.
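
In BufferAlloc() terms, the intended call sequence is roughly the fragment
below.  HASH_REUSE and HASH_ASSIGN are the actions proposed here, not
existing dynahash flags, and the tag/hash variables are illustrative:

	/*
	 * Evicting the victim buffer: detach its entry and stash it in the
	 * per-process variable instead of returning it to the freelist.
	 */
	hash_search_with_hash_value(SharedBufHash, &oldTag, oldHash,
								HASH_REUSE, NULL);

	/* ... later, when mapping the buffer to its new tag ... */
	entry = hash_search_with_hash_value(SharedBufHash, &newTag, newHash,
										HASH_ASSIGN, &found);
	/* With no stashed element, HASH_ASSIGN falls back to HASH_ENTER. */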

I've implemented Robert Haas's suggestion to count elements in freelists
instead of nentries:

> One idea is to jigger things so that we maintain a count of the total
> number of entries that doesn't change except when we allocate, and
> then for each freelist partition we maintain the number of entries in
> that freelist partition.  So then the size of the hash table, instead
> of being sum(nentries) is totalsize - sum(nfree).

https://postgr.es/m/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com

It helps to avoid taking the freelist lock just to update counters.
I did this by replacing "nentries" with "nfree" and adding
"nalloced" to each freelist. It also makes "hash_update_hash_key" valid
for a key that migrates between partitions.
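
The entry count would then be computed along these lines (a sketch only:
"nalloced" and "nfree" are the fields proposed in this mail, and since it
touches HTAB internals it would live inside dynahash.c):

	long
	hash_get_num_entries_sketch(HTAB *hashp)
	{
		HASHHDR    *hctl = hashp->hctl;
		long		sum = 0;
		int			i;

		for (i = 0; i < NUM_FREELISTS; i++)
		{
			sum += hctl->freeList[i].nalloced;	/* total ever allocated */
			sum -= hctl->freeList[i].nfree;		/* minus currently free */
		}
		return sum;
	}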

I believe there is no need for a "nalloced" field per freelist;
instead, a single such field should be in HASHHDR. Moreover, it seems to
me the `element_alloc` function does not need to acquire the freelist
partition lock, since it is called only during initialization of a shared
hash table.  Am I right?

I didn't go this path in v4 for simplicity, but can put it to v5
if approved.

To be honest, the "reuse" patch gives little improvement, but it is still
measurable at some connection counts.

I tried to reduce the freelist partitions to 8, but it has a mixed impact.
Most of the time performance is the same, but sometimes a bit lower. I
didn't investigate the reasons. Perhaps they are not related to the buffer
manager.

I didn't introduce the new functions BufTableReuse and BufTableAssign,
since there is a single call to BufTableInsert and two calls to
BufTableDelete. So I reused these functions and just added a "reuse" flag
to BufTableDelete.

Tests are pgbench simple_select on a Xeon 8354H, with 128MB and 1GB shared
buffers, at scale 100.

1 socket:
  conns | master |   patch_v4 |  master 1G | patch_v4 1G 
--------+------------+------------+------------+-------------
  1 |  41975 |  41540 |  52898 |  52213 
  2 |  77693 |  77908 |  97571 |  98371 
  3 | 114713 | 115522 | 142709 | 145226 
  5 | 188898 | 187617 | 239322 | 237269 
  7 | 261516 | 260006 | 329119 | 329449 
 17 | 521821 | 519473 | 672390 | 662106 
 27 | 555487 | 555697 | 674630 | 672736 
 53 | 868213 | 896539 |1190734 |1202505 
 83 | 868232 | 866029 |1164997 |1158719 
107 | 850477 | 845685 |1140597 |1134502 
139 | 816311 | 816808 |1101471 |1091258 
163 | 794788 | 796517 |1078445 |1071568 
191 | 765934 | 776185 |1059497 |1041944 
211 | 738656 | 777365 |1083356 |1046422 
239 | 713124 | 841337 |1104629 |1116668 
271 | 692138 | 847803 |1094432 |1128971 
307 | 682919 | 849239 |1086306 |1127051 
353 | 679449 | 842125 |1071482 |1117471 
397 | 676217 | 844015 |1058937 |1118628 

2 sockets:
  conns | master |   patch_v4 |  master 1G | patch_v4 1G 
--------+------------+------------+------------+-------------
  1 |  44317 |  44034 |  53920 |  53583 
  2 |  81193 |  78621 |  99138 |  97968 
  3 | 120755 | 115648 | 148102 | 147423 
  5 | 190007 | 188943 | 232078 | 231029 
  7 | 258602 | 260649 | 325545 | 318567 
 17 | 551814 | 552914 | 692312 | 697518 
 27 | 787353 | 

Re: Failed transaction statistics to measure the logical replication progress

2022-02-28 Thread Peter Smith
Please see below my review comments for v25.

==

1. Commit message

Introduce cumulative columns of transactions of
logical replication subscriber to the pg_stat_subscription_stats view.

"cumulative columns of transactions" sounds a bit strange to me.

SUGGESTED
Introduce 2 new subscription statistics columns (apply_commit_count,
and apply_rollback_count) to the pg_stat_subscription_stats view for
counting cumulative transaction commits/rollbacks.

~~~

2. doc/src/sgml/monitoring.sgml - bug

The new SGML elements have been added in the wrong place!

I don't think this renders like you expect it does. Please regenerate
the help to see for yourself.

~~~

3. doc/src/sgml/monitoring.sgml - wording

+  
+   Number of transactions rollbacked in this subscription. Both
+   ROLLBACK of transaction streamed as in-progress
+   transaction and ROLLBACK PREPARED increment this
+   counter.
+  

BEFORE
Number of transactions rollbacked in this subscription.

SUGGESTED
Number of transaction rollbacks in this subscription.

~~~

4. doc/src/sgml/monitoring.sgml - wording

+  
+   Number of transactions rollbacked in this subscription. Both
+   ROLLBACK of transaction streamed as in-progress
+   transaction and ROLLBACK PREPARED increment this
+   counter.
+  

Trying to distinguish between the ROLLBACK of a transaction and of a
streamed in-progress transaction seems to have made this description
too complicated. I don’t think the user even cares/knows about this
(in-progress) distinction. So, I think this should just be written
more simply (like the COMMIT part was)

BEFORE
Both  ROLLBACK of transaction streamed as
in-progress  transaction and ROLLBACK PREPARED
increment this counter.

SUGGESTED
Both  ROLLBACK and ROLLBACK
PREPARED increment this counter.

~~~

5. Question - column names.

Just curious why the columns are called "apply_commit_count" and
"apply_rollback_count"? Specifically, what extra meaning do those
names have versus just calling them "commit_count" and
"rollback_count"?

~~~

6. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

@@ -3421,6 +3425,60 @@ pgstat_send_slru(void)
 }

 /* --
+ * pgstat_report_subscription_xact() -
+ *
+ * Send a subscription transaction stats to the collector.
+ * The statistics are cleared upon sending.
+ *
+ * 'force' is true only when the subscription worker process exits.
+ * --
+ */
+void
+pgstat_report_subscription_xact(bool force)

6a.
I think this comment should be worded more like the other
pgstat_report_subscription_XXX comments

BEFORE
Send a subscription transaction stats to the collector.

SUGGESTED
Tell the collector about subscription transaction stats.

6b.
+ * 'force' is true only when the subscription worker process exits.

I thought this comment should just describe what the 'force' param
actually does in this function; not the scenario about who calls it...

~~~

7. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

I think the entire function maybe should be relocated to be nearby the
other pgstat_report_subscription_XXX functions in the source.

~~~

8. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ /*
+ * This function can be called even if nothing at all has happened. In
+ * this case, there's no need to go forward.
+ */

Too much information. Clearly, it is possible for this function to be
called for this case otherwise this code would not exist in the first
place :)
IMO the comment can be much simpler but still say all it needs to.

BEFORE
This function can be called even if nothing at all has happened. In
this case, there's no need to go forward.
SUGGESTED
Bailout early if nothing to do.

~~~

9. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ if (subStats.subid == InvalidOid ||
+ (subStats.apply_commit_count == 0 && subStats.apply_rollback_count == 0))
+ return;

Maybe using !OidIsValid(subStats.subid) is better?

~~~

10. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one to avoid overloading the stats
+ * collector.
+ */

SUGGESTED (2 sentences instead of 1)
Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
msec since we last sent one. This is to avoid overloading the stats
collector.

~~~

11. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ if (!force)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+
+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one to avoid overloading the stats
+ * collector.
+ */
+ if (!TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+ return;
+ last_report = now;
+ }

(Yeah, I know there is similar code in this module but 2 wrongs do not
make a right)

I think logically it is better to put the 'now' and the 'last_report'
outside this if (!force) 

Re: [Proposal] Global temporary tables

2022-02-28 Thread Wenjing Zeng


> On Feb 27, 2022, at 08:21, Justin Pryzby wrote:
> 
> I read through this.
> Find attached some language fixes.  You should be able to apply each "fix"
> patch on top of your own local branch with git am, and then squish them
> together.  Let me know if you have trouble with that.
> 
> I think get_seqence_start_value() should be static.  (Or otherwise, it should
> be in lsyscache.c).
> 
> The include added to execPartition.c seems to be unused.
> 
> +#define RELATION_IS_TEMP_ON_CURRENT_SESSION(relation) \
> +#define RELATION_IS_TEMP(relation) \
> +#define RelpersistenceTsTemp(relpersistence) \
> +#define RELATION_GTT_ON_COMMIT_DELETE(relation)\
> 
> => These macros can evaluate their arguments multiple times.
> You should add a comment to warn about that.  And maybe avoid passing them a
> function argument, like: RelpersistenceTsTemp(get_rel_persistence(rte->relid))
> 
> +list_all_backend_gtt_frozenxids should return TransactionId not int.
> The function name should say "oldest" and not "all" ?
> 
> I think the GUC should have a longer name.  max_active_gtt is too short for a
> global var.
> 
> +#defineMIN_NUM_ACTIVE_GTT  0
> +#defineDEFAULT_NUM_ACTIVE_GTT  1000
> +#defineMAX_NUM_ACTIVE_GTT  100
> 
> +intmax_active_gtt = MIN_NUM_ACTIVE_GTT
> 
> It's being initialized to MIN, but then the GUC machinery sets it to DEFAULT.
> By convention, it should be initialized to default.
> 
> fout->remoteVersion >= 14
> 
> => should say 15
> 
> describe.c has gettext_noop("session"), which is a half-truth.  The data is
> per-session but the table definition is persistent..
Thanks for your advice, I will try to merge this part of the code.

> 
> You redirect stats from pg_class and pg_statistics to a local hash table.
> This is pretty hairy :(
> I guess you'd also need to handle pg_statistic_ext and ext_data.
> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to 
> look
> at pg_get_gtt_statistics.
> 
> I wonder if there's a better way to do it, like updating pg_statistic but
> forcing the changes to be rolled back when the session ends...  But I think
> that would make longrunning sessions behave badly, the same as "longrunning
> transactions".

There are three pieces of session-level GTT data that need to be managed:
1 session-level storage info like relfilenode
2 session-level xid info like relfrozenxid
3 session-level stats like relpages or column stats

I think 1 and 2 are necessary, but not the stats.
In a previous email, it was suggested that GTT statistics not be
processed.
This means that GTT statistics are not recorded in the local hash or the
catalog.
In my observation, very few users require a query plan for temporary
tables accurate enough to justify a manual ANALYZE.
Of course, doing this will also avoid catalog bloat and performance problems.


> 
> Have you looked at Gilles Darold's GTT extension ?
If you are referring to https://github.com/darold/pgtt, yes.
It is smart to use an unlogged table as a template and then use an LTT to
read and write data.
For this implementation, I want to point out two things:
1 For the first insert into a GTT in each session, CREATE TABLE or CREATE
INDEX is implicitly executed.
2 The catalog bloat caused by LTTs still exists.


Regards, Wenjing.


> <0002-f-0002-gtt-v64-doc.txt><0004-f-0003-gtt-v64-implementation.txt><0006-f-0004-gtt-v64-regress.txt>



Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2022-02-28 Thread Yugo NAGATA
Hi Horiguchi-san,

On Thu, 27 Jan 2022 17:50:25 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Tue, 16 Nov 2021 02:26:43 +0900, Yugo NAGATA  wrote 
> in 
> > Thank you for pointing it out!
> > I attached the updated patch.
> 
> I think we want a more elaborate comment for the new place of
> preparing as you mentioned in the first mail.

Thank you for your suggestion.

I added comments on the prepareCommands() call as in the updated patch.

Regards,
Yugo Nagata


Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f166a77e3a..c5893eabe4 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2832,6 +2832,30 @@ chooseScript(TState *thread)
 	return i - 1;
 }
 
+/* Prepare SQL commands in the chosen script */
+static void
+prepareCommands(CState *st)
+{
+	int			j;
+	Command   **commands = sql_script[st->use_file].commands;
+
+	for (j = 0; commands[j] != NULL; j++)
+	{
+		PGresult   *res;
+		char		name[MAX_PREPARE_NAME];
+
+		if (commands[j]->type != SQL_COMMAND)
+			continue;
+		preparedStatementName(name, st->use_file, j);
+		res = PQprepare(st->con, name,
+		commands[j]->argv[0], commands[j]->argc - 1, NULL);
+		if (PQresultStatus(res) != PGRES_COMMAND_OK)
+			pg_log_error("%s", PQerrorMessage(st->con));
+		PQclear(res);
+	}
+	st->prepared[st->use_file] = true;
+}
+
 /* Send a SQL command, using the chosen querymode */
 static bool
 sendCommand(CState *st, Command *command)
@@ -2865,42 +2889,6 @@ sendCommand(CState *st, Command *command)
 		char		name[MAX_PREPARE_NAME];
 		const char *params[MAX_ARGS];
 
-		if (!st->prepared[st->use_file])
-		{
-			int			j;
-			Command   **commands = sql_script[st->use_file].commands;
-
-			for (j = 0; commands[j] != NULL; j++)
-			{
-PGresult   *res;
-char		name[MAX_PREPARE_NAME];
-
-if (commands[j]->type != SQL_COMMAND)
-	continue;
-preparedStatementName(name, st->use_file, j);
-if (PQpipelineStatus(st->con) == PQ_PIPELINE_OFF)
-{
-	res = PQprepare(st->con, name,
-	commands[j]->argv[0], commands[j]->argc - 1, NULL);
-	if (PQresultStatus(res) != PGRES_COMMAND_OK)
-		pg_log_error("%s", PQerrorMessage(st->con));
-	PQclear(res);
-}
-else
-{
-	/*
-	 * In pipeline mode, we use asynchronous functions. If a
-	 * server-side error occurs, it will be processed later
-	 * among the other results.
-	 */
-	if (!PQsendPrepare(st->con, name,
-	   commands[j]->argv[0], commands[j]->argc - 1, NULL))
-		pg_log_error("%s", PQerrorMessage(st->con));
-}
-			}
-			st->prepared[st->use_file] = true;
-		}
-
 		getQueryParams(st, command, params);
 		preparedStatementName(name, st->use_file, st->command);
 
@@ -3172,6 +3160,20 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	memset(st->prepared, 0, sizeof(st->prepared));
 }
 
+/*
+ * Prepare SQL commands if needed.
+ *
+ * We should send Parse messages before executing meta commands,
+ * especially \startpipeline. If a Parse message is sent in
+ * pipeline mode, a transaction starts before BEGIN is sent, and
+ * that could be a problem. For example, if "BEGIN ISOLATION LEVEL
+ * SERIALIZABLE" is sent after a transaction starts, the error
+ * "ERROR:  SET TRANSACTION ISOLATION LEVEL must be called before any query"
+ * occurs.
+ */
+if (querymode == QUERY_PREPARED && !st->prepared[st->use_file])
+	prepareCommands(st);
+
 /* record transaction start time */
 st->txn_begin = now;
 
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f1341092fe..7e4ec728e8 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -837,6 +837,23 @@ select 1 \gset f
 }
 	});
 
+# Working \startpipeline in prepared query mode with serializable
+$node->pgbench(
+	'-t 1 -n -M prepared',
+	0,
+	[ qr{type: .*/001_pgbench_pipeline_serializable}, qr{actually processed: 1/1} ],
+	[],
+	'working \startpipeline with serializable',
+	{
+		'001_pgbench_pipeline_serializable' => q{
+-- test startpipeline with serializable
+\startpipeline
+BEGIN ISOLATION LEVEL SERIALIZABLE;
+} . "select 1;\n" x 10 . q{
+END;
+\endpipeline
+}
+	});
 
 # trigger many expression errors
 my @errors = (


Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-28 Thread Michael Paquier
On Mon, Feb 28, 2022 at 04:00:36PM -0500, Stephen Frost wrote:
> * Jacob Champion (pchamp...@vmware.com) wrote:
>> On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
>>> Looks to me like authn_id isn't synchronized to parallel workers right now. 
>>> So
>>> the function will return the wrong thing when executed as part of a parallel
>>> query.
>> 
>> Thanks for the catch. It looks like MyProcPort is left empty, and other
>> functions that rely on it, like inet_server_addr(), are marked parallel-
>> restricted, so I've done the same in v4.
> 
> That's probably alright.

I'd say as well that this is right as-is.  If it happens that there is
a use-case for making this parallel aware in the future, it could be
done.  Now, it may be a bit weird to make parallel workers inherit the
authn ID of the parent as these did not go through authentication, no?
Letting this function be run only by the leader feels intuitive.
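
For reference, the function under discussion is essentially a thin wrapper
over MyProcPort; a sketch along the lines of the v4 patch description
(details may differ from the actual patch):

Datum
pg_session_authn_id(PG_FUNCTION_ARGS)
{
	/* NULL if there is no client connection or no authenticated identity. */
	if (!MyProcPort || !MyProcPort->authn_id)
		PG_RETURN_NULL();

	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
}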

> Yeah, we really should make this available to trigger-based auditing
> systems too and not just through log_connections which involves a great
> deal more log parsing and work external to the database to figure out
> who did what.

Okay, I won't fight hard on that if all of you think that this is
useful for a given session.

>> Subject: [PATCH v4] Add API to retrieve authn_id from SQL
>> 
>> The authn_id field in MyProcPort is currently only accessible to the
>> backend itself.  Add a SQL function, pg_session_authn_id(), to expose
>> the field to triggers that may want to make use of it.
> 
> Only did a quick look but generally looks reasonable to me.

The function and the test are fine, pgperltidy complains a bit about
the format of the tests. 

Anyway, this function needs to be documented.  I think that you should
just add that in "Session Information Functions" in func.sgml, same
area as current_user().  The last time we talked about the authn ID,
one thing we discussed was how to describe it in a good way to
the user, which is why the section on log_connections was reworked a
bit.  And we don't yet have any references to what an authenticated
identity is in the docs.

There is no need to update catversion.h in the patch, committers
usually take care of that and that's an area of the code that
conflicts a lot.
--
Michael




RE: logical replication restrictions

2022-02-28 Thread osumi.takami...@fujitsu.com
On Tuesday, March 1, 2022 9:19 AM Euler Taveira  wrote:
> Long time, no patch. Here it is. I will provide documentation in the next
> 
> version. I would appreciate some feedback.
Hi, thank you for posting the patch !


$ git am v1-0001-Time-delayed-logical-replication-subscriber.patch

Applying: Time-delayed logical replication subscriber
error: patch failed: src/backend/catalog/system_views.sql:1261
error: src/backend/catalog/system_views.sql: patch does not apply


FYI, a recent commit (7a85073) on HEAD redesigned
pg_stat_subscription_workers.
Thus, the below change can't be applied. Could you please rebase v1?


diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 3cb69b1f87..1cc0d86f2e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1261,7 +1261,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-  substream, subtwophasestate, subslotname, subsynccommit, 
subpublications)
+  substream, subtwophasestate, subslotname, subsynccommit,
+  subapplydelay, subpublications)
 ON pg_subscription TO public;

 CREATE VIEW pg_stat_subscription_workers AS


Best Regards,
Takamichi Osumi



Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Michael Paquier
On Mon, Feb 28, 2022 at 04:42:55PM -0500, Stephen Frost wrote:
> Keeping it around will just push out the point at which everyone will
> finally be done with it, as there's really only two groups: those who
> have already moved to scram, and those who won't move until they want to
> upgrade to a release that doesn't have md5.

FWIW, I am not sure if we are at this point yet.  An extra reason to
remove it would be that it is a support burden, but I have not seen
in recent memory any problems related to it that required any deep
changes in the way to use it, and its code paths are independent.

The last time I played with this area was the recent error-handling
improvement with cryptohashes, but MD5 actually helped there in
detecting the problem, as a patched OpenSSL would complain if trying
to use MD5 as a hash function when FIPS is enabled.
--
Michael




pipeline mode and commands not allowed in a transaction block

2022-02-28 Thread Yugo NAGATA
Hi,

I found that when we use pipeline mode, we can execute commands
that are not allowed in a transaction block, for example
CREATE DATABASE, in the same transaction as other commands.

In the extended query protocol, a transaction starts when a Parse,
Bind, Execute, or Describe message is received, and is closed
when a Sync message is received, if COMMIT, ROLLBACK, or END has not
been sent. In pipeline mode, the Sync message is sent at the end of the
pipeline instead of after each query. Therefore, multiple queries
can be in the same transaction without using an explicit
transaction block.

This is similar to the implicit transaction block that starts when
multiple statements are sent in the simple query protocol, but the
server doesn't regard it as an implicit transaction block.
Therefore, problems that would not occur in implicit transactions
can occur in transactions started in pipeline mode.

For example, CREATE DATABASE or DROP DATABASE can be executed
in the same transaction as other commands, and when the
transaction fails, this causes an inconsistency between the
system catalog and the base directory.
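
To make the scenario concrete, here is a minimal libpq sketch of the
problematic sequence (error handling and result draining omitted; the
database name is just an example):

	PGconn	   *conn = PQconnectdb("dbname=postgres");

	PQenterPipelineMode(conn);

	/* Parse/Bind/Execute for both commands are queued before any Sync... */
	PQsendQueryParams(conn, "CREATE DATABASE newdb",
					  0, NULL, NULL, NULL, NULL, 0);
	PQsendQueryParams(conn, "SELECT 1",
					  0, NULL, NULL, NULL, NULL, 0);

	/* ...so both commands run in one transaction, closed by this Sync. */
	PQpipelineSync(conn);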

Do you think we should prevent such problems on the server side? Or
is it the user's responsibility to avoid such problematic use of pipelines
or protocol messages?

If we want to handle it from server side, I think a few ideas:

1. 
If the server receives more than one Execute message before
receiving Sync, start an implicit transaction block. If the first
Execute message is for a command not allowed in a transaction
(CREATE DATABASE etc.), explicitly close the transaction after the
command so that it does not share the transaction with other commands.

2.
When a pipeline is started by calling PQenterPipelineMode in libpq,
start an implicit transaction at the server. For this purpose, we
would need to add a new message to the protocol to signal the start of
pipeline mode. It is the user's responsibility to avoid the problematic
protocol use when libpq is not used.

What do you think about it?


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-28 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 11:10:09AM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 1, 2022 at 12:27 AM Nathan Bossart  
> wrote:
>> My feedback is specifically about this behavior.  I don't think we should
>> spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
>> think we should just choose the SendRqstPtr based on what is currently
>> synchronously replicated.
> 
> Do you mean something like the following?
> 
> /* Main loop of walsender process that streams the WAL over Copy messages. */
> static void
> WalSndLoop(WalSndSendDataCallback send_data)
> {
> /*
>  * Loop until we reach the end of this timeline or the client requests to
>  * stop streaming.
>  */
> for (;;)
> {
> if (am_async_walsender && there_are_sync_standbys)
> {
>  XLogRecPtr SendRqstLSN;
>  XLogRecPtr SyncFlushLSN;
> 
> SendRqstLSN = GetFlushRecPtr(NULL);
> LWLockAcquire(SyncRepLock, LW_SHARED);
> SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> LWLockRelease(SyncRepLock);
> 
> if (SendRqstLSN > SyncFlushLSN)
>continue;
> }

Not quite.  Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
so that the WAL sender only sends up to the current synchronously
replicated LSN.  TBH there are probably other things that need to be
considered (e.g., how do we ensure that the WAL sender sends the rest once
it is replicated?), but I still think we should avoid spinning in the WAL
sender waiting for WAL to be replicated.
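
In code terms, the suggestion amounts to clamping instead of looping; a
rough sketch reusing the names from the snippet above (am_async_walsender
and there_are_sync_standbys come from the proposed patch, not committed
code):

	SendRqstLSN = GetFlushRecPtr(NULL);

	if (am_async_walsender && there_are_sync_standbys)
	{
		XLogRecPtr	SyncFlushLSN;

		LWLockAcquire(SyncRepLock, LW_SHARED);
		SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
		LWLockRelease(SyncRepLock);

		/* Never send WAL past what the sync standbys have flushed. */
		if (SendRqstLSN > SyncFlushLSN)
			SendRqstLSN = SyncFlushLSN;
	}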

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-28 Thread Nathan Bossart
On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote:
> On Sat, Feb 26, 2022 at 04:48:52PM +, Chapman Flack wrote:
>> Assuming the value is false, so no error is thrown, is it practical to 
>> determine
>> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I 
>> would
>> further suggest reporting a deprecation WARNING if it was explicitly 
>> supplied,
>> with a HINT that the argument can simply be removed at the call site, and 
>> will
>> become unrecognized in some future release.
> 
> This is a good point.  I think I agree with your proposed changes.  I
> believe it is possible to add a deprecation warning only when 'exclusive'
> is specified.  If anything, we can create a separate function that accepts
> the 'exclusive' parameter and that always emits a NOTICE or WARNING.

I've spent some time looking into this, and I haven't found a clean way to
emit a WARNING only if the "exclusive" parameter is supplied (and set to
false).  AFAICT flinfo->fn_expr doesn't tell us whether the parameter was
supplied or the default value was used.  I was able to get it working by
splitting pg_start_backup() into 3 separate internal functions (i.e.,
pg_start_backup_1arg(), pg_start_backup_2arg(), and
pg_start_backup_3arg()), but this breaks calls such as
pg_start_backup('mylabel', exclusive => false), and it might complicate
privilege management for users.

Without a WARNING, I think it will be difficult to justify removing the
"exclusive" parameter in the future.  We would either need to leave it
around forever, or we would have to risk unnecessarily breaking some
working backup scripts.  I wonder if we should just remove it now and make
sure that this change is well-documented in the release notes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Separate the result of \watch for each query execution (psql)

2022-02-28 Thread Pavel Stehule
On Mon, Feb 28, 2022 at 23:46, Tom Lane wrote:

> Noboru Saito  writes:
> > I have created a patch that allows you to turn it on and off in \pset.
> > The attached patch adds the following features.
> > Formfeed can be turned on with the command line option or \pset.
> > Formfeed (\f\n) is output after the query execution result by \watch.
>
> Hmm ... I grant your use-case for this, but I think the patch
> is too narrow-minded, because it supposes that the only string
> anybody could wish to output between \watch commands is "\f\n".
> Once you open the floodgates of inserting formatting there,
> ISTM that people might want other things.
>
> Also, I'm not that thrilled with treating this as a \pset option,
> because it has nothing to do with formatting of normal query
> results.  (IMV anyway, perhaps others will disagree.)
>

pspg (and the ov pager too) supports streaming (pspg being used in a
terminal other than psql's), and for this case the marks can be useful in
all modes.

Regards

Pavel


>
> How about instead of defining fixed semantics, we invent a psql
> special variable that can contain a string to be output between
> \watch commands?  It looks like you could then set it through
> a command like
>
> \set WATCH_SEPARATOR '\f\n'
>
> (not wedded to that variable name, it's just the first idea
> that came to mind)
>
> Personally I'd not bother with inventing a specialized command-line
> option to set it, either.  There's already -v and friends.
>
> > * Is formfeed output after the result, not before?
>
> Or we could invent WATCH_BEFORE and WATCH_AFTER ...
>
> regards, tom lane
>
>
>


RE: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread osumi.takami...@fujitsu.com
On Tuesday, March 1, 2022 9:49 AM Peter Smith  wrote:
> Please see below my review comments for v22.
> 
> ==
> 
> 1. Commit message
> 
> "table sync worker" -> "tablesync worker"
Fixed.

> ~~~
> 
> 2. doc/src/sgml/catalogs.sgml
> 
> +  
> +   If true, the subscription will be disabled when subscription
> +   workers detect any errors
> +  
> 
> It felt a bit strange to say "subscription" 2x in the sentence, but I am not 
> sure
> how to improve it. Maybe like below?
> 
> BEFORE
> If true, the subscription will be disabled when subscription workers detect 
> any
> errors
> 
> SUGGESTED
> If true, the subscription will be disabled if one of its workers detects an 
> error
Fixed.


> ~~~
> 
> 3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> 
> @@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received)  }
> 
>  /*
> + * Disable the current subscription, after error recovery processing.
> + */
> +static void
> +DisableSubscriptionOnError(void)
> 
> I thought the "after error recovery processing" part was a bit generic and 
> did not
> really say what it was doing.
> 
> BEFORE
> Disable the current subscription, after error recovery processing.
> SUGGESTED
> Disable the current subscription, after logging the error that caused this
> function to be called.
Fixed.

> ~~~
> 
> 4. src/backend/replication/logical/worker.c - start_apply
> 
> + if (MySubscription->disableonerr)
> + {
> + DisableSubscriptionOnError();
> + return;
> + }
> +
> + MemoryContextSwitchTo(ecxt);
> + PG_RE_THROW();
> + }
> + PG_END_TRY();
> 
> The current code looks correct, but I felt it is a bit tricky to easily see 
> the
> execution path after the return.
> 
> Since it will effectively just exit anyhow I think it will be simpler just to 
> do that
> explicitly right here instead of the 'return'. This will also make the code
> consistent with the same 'disableonerr' logic of the start_start_sync.
> 
> SUGGESTION
> if (MySubscription->disableonerr)
> {
> DisableSubscriptionOnError();
> proc_exit(0);
> }
Fixed.

> ~~~
> 
> 5. src/bin/pg_dump/pg_dump.c
> 
> @@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
>   if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
>   appendPQExpBufferStr(query, ", two_phase = on");
> 
> + if (strcmp(subinfo->subdisableonerr, "f") != 0)
> + appendPQExpBufferStr(query, ", disable_on_error = true");
> +
> 
> Although the code is correct, I think it would be more natural to set this 
> option
> as true when the user wants it true. e.g. check for "t"
> same as 'subbinary' is doing. This way, even if there was some
> unknown/corrupted value the code would do nothing, which is the behaviour
> you want...
> 
> SUGGESTION
> if (strcmp(subinfo->subdisableonerr, "t") == 0)
Fixed.


> ~~~
> 
> 6. src/include/catalog/pg_subscription.h
> 
> @@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
> 
>   char subtwophasestate; /* Stream two-phase transactions */
> 
> + bool subdisableonerr; /* True if occurrence of apply errors
> + * should disable the subscription */
> 
> The comment seems not quite right because it's not just about apply errors. 
> E.g.
> I think any error in tablesync will cause disablement too.
> 
> BEFORE
> True if occurrence of apply errors should disable the subscription SUGGESTED
> True if a worker error should cause the subscription to be disabled
Fixed.


> ~~~
> 
> 7. src/test/regress/sql/subscription.sql - whitespace
> 
> +-- now it works
> +CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false,
> disable_on_error = false);
> +
> +\dRs+
> +
> +ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
> +
> +\dRs+
> +ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE); DROP
> +SUBSCRIPTION regress_testsub;
> +
> 
> I think should be a blank line after that last \dRs+ just like the other one,
> because it belongs logically with the code above it, not with the ALTER
> slot_name.
Fixed.


> ~~~
> 
> 8. src/test/subscription/t/028_disable_on_error.pl - filename
> 
> The 028 number needs to be bumped because there is already a TAP test
> called 028 now
This is already done in v22, so I've skipped this.

> ~~~
> 
> 9. src/test/subscription/t/028_disable_on_error.pl - missing test
> 
> There was no test case for the last combination where the user corrects the
> apply worker problem: E.g. After a previous error/disable of the subscriber,
> remove the index, publish the inserts again, and check they get applied
> properly.
Fixed.

Attached the updated version v24.


Best Regards,
Takamichi Osumi



v24-0001-Optionally-disable-subscriptions-on-error.patch
Description: v24-0001-Optionally-disable-subscriptions-on-error.patch


Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-28 Thread Bharath Rupireddy
On Tue, Mar 1, 2022 at 12:27 AM Nathan Bossart  wrote:
>
> On Mon, Feb 28, 2022 at 06:45:51PM +0530, Bharath Rupireddy wrote:
> > On Sat, Feb 26, 2022 at 9:37 PM Nathan Bossart  
> > wrote:
> >> For
> >> this feature, I think we always need to consider what the primary considers
> >> synchronously replicated.  My suggested approach doesn't change that.  I'm
> >> saying that instead of spinning in a loop waiting for the WAL to be
> >> synchronously replicated, we just immediately send WAL up to the LSN that
> >> is presently known to be synchronously replicated.
> >
> > As I said above, v1 patch does that i.e. async standbys wait until the
> > sync standbys update their flush LSN.
> >
> > Flush LSN is this - flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> > which gets updated in SyncRepReleaseWaiters.
> >
> > Async standbys with their SendRqstPtr will wait in XLogSendPhysical or
> > XLogSendLogical until SendRqstPtr <= flushLSN.
>
> My feedback is specifically about this behavior.  I don't think we should
> spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
> think we should just choose the SendRqstPtr based on what is currently
> synchronously replicated.

Do you mean something like the following?

/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
WalSndLoop(WalSndSendDataCallback send_data)
{
/*
 * Loop until we reach the end of this timeline or the client requests to
 * stop streaming.
 */
for (;;)
{
if (am_async_walsender && there_are_sync_standbys)
{
 XLogRecPtr SendRqstLSN;
 XLogRecPtr SyncFlushLSN;

SendRqstLSN = GetFlushRecPtr(NULL);
LWLockAcquire(SyncRepLock, LW_SHARED);
SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
LWLockRelease(SyncRepLock);

if (SendRqstLSN > SyncFlushLSN)
   continue;
}

if (!pq_is_send_pending())
send_data();  /* THIS IS WHERE XLogSendPhysical or
XLogSendLogical gets called */
else
WalSndCaughtUp = false;
   }

Regards,
Bharath Rupireddy.




Re: In-placre persistance change of a relation

2022-02-28 Thread Kyotaro Horiguchi
Rebased on a recent xlog refactoring.

No functional changes have been made.

- Removed the default case in smgr_desc since it seems to me we don't
 assume out-of-definition values in xlog records elsewhere.

- Simplified some code added to storage.c.

- Fixed copy-paste errors in comments in extractPageInfo().

- The previous version of smgrDoPendingCleanups() assumed that init forks
  are not loaded into shared buffers, but that is wrong
  (SetRelationBuffersPersistence assumes the opposite).  Thus we need
  to drop buffers before unlinking an init fork. But that is already
  guaranteed by the logic, so I rewrote the comment for PCOP_UNLINK_FORK.

  > * Unlink the fork file. Currently we use this only for
  > * init forks and we're sure that the init fork is not
  > * loaded on shared buffers.  For RelationDropInitFork
  > * case, the function dropped those buffers. For
  > * RelationCreateInitFork case, PCOP_SET_PERSISTENCE(true)
  > * is set and the buffers have been dropped just before.
  
  This logic has the same critical window as
  DropRelFilenodeBuffers. That is, if file deletion fails after
  the buffers have been successfully dropped, the file content of the
  init fork may theoretically be stale. However, AFAICS the init fork is
  a write-once fork, so I don't think that actually matters.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 420a9d9a0dae3bcfb1396c14997624ad67a3e557 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v18 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs a heap rewrite which causes a large amount of
file I/O.  This patch makes the command run without a heap rewrite.
In addition to that, SET LOGGED while wal_level > minimal emits WAL using
XLOG_FPI instead of a massive number of HEAP_INSERTs, which should be
smaller.

Also, this allows for the cleanup of files left behind by the crash of
the transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  49 ++
 src/backend/access/transam/README |   9 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlogrecovery.c |  18 +
 src/backend/catalog/storage.c | 548 +-
 src/backend/commands/tablecmds.c  | 266 +--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  86 
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 ++
 src/backend/storage/smgr/md.c |  94 +++-
 src/backend/storage/smgr/smgr.c   |  32 ++
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  22 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/bufmgr.h  |   2 +
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |  10 +-
 src/include/storage/smgr.h|  17 +
 23 files changed, 1459 insertions(+), 182 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 7547813254..f8908e2c0a 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,46 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+		   xlrec->rnode.spcNode,
+		   xlrec->rnode.relNode,
+		   InvalidBackendId,
+		   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+action = "CREATE";
+break;
+			case XLOG_SMGR_MARK_UNLINK:
+action = "DELETE";
+break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +95,15 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			break;
+		case XLOG_SMGR_UNLINK:
+			id = "UNLINK";
+			break;
+		case XLOG_SMGR_MARK:
+			id = "MARK";
+			break;
+		case XLOG_SMGR_BUFPERSISTENCE:
+			id = "BUFPERSISTENCE";
+			

Re: logical replication empty transactions

2022-02-28 Thread Ajin Cherian
On Fri, Feb 25, 2022 at 9:17 PM Peter Smith  wrote:
>
> Hi. Here are my review comments for the v19 patch.
>
> ==
>
> 1. Commit message
>
> The current logical replication behavior is to send every transaction to
> subscriber even though the transaction is empty (because it does not
> contain changes from the selected publications).
>
> SUGGESTION
> "to subscriber even though" --> "to the subscriber even if"

Fixed.

>
> ~~~
>
> 2. Commit message
>
> This patch addresses the above problem by postponing the BEGIN message
> until the first change. While processing a COMMIT message,
> if there is no other change for that transaction,
> do not send COMMIT message. It means that pgoutput will
> skip BEGIN/COMMIT messages for transactions that are empty.
>
> SUGGESTION
> "if there is" --> "if there was"
> "do not send COMMIT message" --> "do not send the COMMIT message"
> "It means that pgoutput" --> "This means that pgoutput"
>
> ~~~

Fixed.
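
For readers following the review, the lazy-BEGIN mechanism being discussed
amounts to something like the sketch below (based on the commit message and
these comments, not on the actual patch code):

	static void
	pgoutput_send_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
	{
		PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;

		OutputPluginPrepareWrite(ctx, true);
		logicalrep_write_begin(ctx->out, txn);
		txndata->sent_begin_txn = true;
		OutputPluginWrite(ctx, true);
	}

	/* In each change callback, before writing the first change: */
	if (txndata && !txndata->sent_begin_txn)
		pgoutput_send_begin_txn(ctx, txn);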

>
> 3. Commit message
>
> Shouldn't there be some similar description about using a lazy send
> mechanism for STREAM START?
>
> ~~~

Added.

>
> 4. src/backend/replication/pgoutput/pgoutput.c - typedef struct 
> PGOutputTxnData
>
> +/*
> + * Maintain a per-transaction level variable to track whether the
> + * transaction has sent BEGIN. BEGIN is only sent when the first
> + * change in a transaction is processed. This makes it possible
> + * to skip transactions that are empty.
> + */
> +typedef struct PGOutputTxnData
> +{
> +   bool sent_begin_txn;/* flag indicating whether BEGIN has been sent */
> +   bool sent_stream_start; /* flag indicating if stream start has been sent 
> */
> +   bool sent_any_stream;   /* flag indicating if any stream has been sent */
> +} PGOutputTxnData;
> +
>
> The struct comment looks stale because it doesn't mention anything
> about the similar lazy send mechanism for STREAM_START.
>
> ~~~

Added.

>
> 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
>
>  static void
>  pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
>  {
> + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context,
> + sizeof(PGOutputTxnData));
> +
> + txndata->sent_begin_txn = false;
> + txn->output_plugin_private = txndata;
> +}
>
> You don’t need to assign the other members 'sent_stream_start',
> 'sent_any_stream' because you are doing MemoryContextAllocZero anyway,
> but for the same reason you did not really need to assign the
> 'sent_begin_txn' flag either.
>
> I guess for consistency maybe it is better to (a)  set all of them or
> (b) set none of them. I prefer (b).
>
> ~~~

Did (b)


>
> 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin
>
> I feel the 'pgoutput_begin' function is not well named. It makes some
> of the code where they are called look quite confusing.
>
> For streaming there is:
> 1. pgoutput_stream_start (does not send)
> 2. pgoutput_send_stream_start (does send)
> so it is very clear.
>
> OTOH there are
> 3. pgoutput_begin_txn (does not send)
> 4. pgoutput_begin (does send)
>
> For consistency I think the 'pgoutput_begin' name should be changed to
> include "send" verb
> 1. pgoutput_begin_txn (does not send)
> 2. pgoutput_send_begin_txn (does send)
>
> ~~~

Changed as mentioned.

>
> 7. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema
>
> @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>   if (schema_sent)
>   return;
>
> +   /* set up txndata */
> +   txndata = toptxn->output_plugin_private;
> +
> +   /*
> +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE
> +* is sent. If not, send now.
> +*/
> +   if (in_streaming && !txndata->sent_stream_start)
> +   pgoutput_send_stream_start(ctx, toptxn);
> +   else if (txndata && !txndata->sent_begin_txn)
> +   {
> +   pgoutput_begin(ctx, toptxn);
> +   }
> +
>
> How come the in_streaming case is not checking for a NULL txndata
> before referencing it? Even if it is OK to do that, some more comments
> or assertions might help for this piece of code.
> (Stop-Press: see later comments #9, #10)
>
> ~~~

Updated.

>
> 8. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema
>
> @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>   if (schema_sent)
>   return;
>
> +   /* set up txndata */
> +   txndata = toptxn->output_plugin_private;
> +
> +   /*
> +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE
> +* is sent. If not, send now.
> +*/
>
> What part of this code is doing anything about "BEGIN PREPARE" ?
>
> ~~~

Removed that reference.

>
> 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change
>
> @@ -1183,6 +1267,15 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>   Assert(false);
>   }
>
> + /* If streaming, send STREAM START if we haven't yet */
> + if (in_streaming && (txndata && !txndata->sent_stream_start))
> + pgoutput_send_stream_start(ctx, txn);
> + /*
> + * 

Re: definition of CalculateMaxmumSafeLSN

2022-02-28 Thread Julien Rouhaud
Hi,

On Tue, Mar 01, 2022 at 10:11:03AM +0900, Kyotaro Horiguchi wrote:
> At Mon, 28 Feb 2022 17:01:10 +0300, Sergei Kornilov  wrote in 
> > Hello
> > I just spotted in src/include/access/xlog.h:
> > extern XLogRecPtr CalculateMaxmumSafeLSN(void);
> > 
> > This function doesn't seem to be used anywhere or even defined? "git grep 
> > CalculateMaxmumSafeLSN" shows only this line.
> > Was added in c6550776394e25c1620bc8258427c8f1d448080d "Allow users to limit 
> > storage reserved by replication slots"
> 
> Hmm.  I think I remember of that name. You're right, it is a function
> that was once added during development of the commit but eventually
> gone.
> 
> So it should be removed.

+1.  I'm Ccing Álvaro as he's the committer.




Re: range_agg with multirange inputs

2022-02-28 Thread Paul Jungwirth

On 2/26/22 17:13, Chapman Flack wrote:

This applies (with some fuzz) and passes installcheck-world, but a rebase
is needed, because 3 lines of context aren't enough to get the doc changes
in the right place in the aggregate function table. (I think generating
the patch with 4 lines of context would be enough to keep that from being
a recurring issue.)


Thank you for the review and the tip re 4 lines of context! Rebase attached.


One thing that seems a bit funny is this message in the new
multirange_agg_transfn:

+   if (!type_is_multirange(mltrngtypoid))
+   ereport(ERROR,
+   (errcode(ERRCODE_DATATYPE_MISMATCH),
+errmsg("range_agg must be called with a multirange")));


I agree it would be more helpful to users to let them know we can take 
either kind of argument. I changed the message to "range_agg must be 
called with a range or multirange". How does that seem?



I kind of wonder whether either message is really reachable, at least
through the aggregate machinery in the expected way. Won't that machinery
ensure that it is calling the right transfn with the right type of
argument? If that's the case, maybe the messages could only be seen
by someone calling the transfn directly ... which also seems ruled out
for these transfns because the state type is internal. Is this whole test
more of the nature of an assertion?


I don't think they are reachable, so perhaps they are more like asserts. 
Do you think I should change it? It seems like a worthwhile check in any 
case.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From a6689485aab9b1aaa6e866f2a577368c7a0e324e Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Fri, 10 Dec 2021 16:04:57 -0800
Subject: [PATCH v2] Add range_agg with multirange inputs

---
 doc/src/sgml/func.sgml|  30 ++
 src/backend/utils/adt/multirangetypes.c   |  62 ++-
 src/include/catalog/pg_aggregate.dat  |   3 +
 src/include/catalog/pg_proc.dat   |  11 ++
 src/test/regress/expected/multirangetypes.out | 100 ++
 src/test/regress/expected/opr_sanity.out  |   1 +
 src/test/regress/sql/multirangetypes.sql  |  21 
 src/test/regress/sql/opr_sanity.sql   |   1 +
 8 files changed, 228 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..6a72785327 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19959,8 +19959,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_agg
+
+range_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the union of the non-null input values.
+   
+   No
+  
+
   

 
  max
@@ -20012,8 +20027,23 @@ SELECT NULLIF(value, '(none)') ...

No
   
 
+  
+   
+
+ range_intersect_agg
+
+range_intersect_agg ( value
+ anymultirange )
+anymultirange
+   
+   
+Computes the intersection of the non-null input values.
+   
+   No
+  
+
   

 
  range_intersect_agg
diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index c474b24431..0efef8cf35 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -1346,9 +1346,9 @@ range_agg_transfn(PG_FUNCTION_ARGS)
 	rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
 	if (!type_is_range(rngtypoid))
 		ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("range_agg must be called with a range")));
+ errmsg("range_agg must be called with a range or multirange")));
 
 	if (PG_ARGISNULL(0))
 		state = initArrayResult(rngtypoid, aggContext, false);
 	else
@@ -1397,8 +1397,68 @@ range_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges));
 }
 
+/*
+ * multirange_agg_transfn: combine adjacent/overlapping multiranges.
+ *
+ * All we do here is gather the input multiranges' ranges into an array
+ * so that the finalfn can sort and combine them.
+ */
+Datum
+multirange_agg_transfn(PG_FUNCTION_ARGS)
+{
+	MemoryContext aggContext;
+	Oid			mltrngtypoid;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rngtypcache;
+	ArrayBuildState *state;
+	MultirangeType *current;
+	int32		range_count;
+	RangeType **ranges;
+	int32		i;
+
+	if (!AggCheckCallContext(fcinfo, &aggContext))
+		elog(ERROR, "multirange_agg_transfn called in non-aggregate context");
+
+	mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+	if (!type_is_multirange(mltrngtypoid))
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("range_agg must be called with a range or multirange")));
+
+	typcache = multirange_get_typcache(fcinfo, mltrngtypoid);
+	rngtypcache = typcache->rngtype;
+
+	if 

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-28 Thread Michael Paquier
On Mon, Feb 28, 2022 at 08:48:23PM +0100, Gunnar "Nick" Bluth wrote:
> So, what should we call the global "find the 'postgres' executable and
> bail out if that fails" function?
> char postgres_exec_path[MAXPGPATH] = findPostgresExec();
> ?

That would mean only a couple of lines gained, and the readability
gained is minimal, so I'd be fine to keep the code as-is.  I am not
sure about the full patch set yet, but the refactoring of the commands
to use PQExpBuffer is good by itself, so I have extracted this part of
the patch and applied that for now.
--
Michael


signature.asc
Description: PGP signature


Re: Allow root ownership of client certificate key

2022-02-28 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I'd be more eager to do that if we had some field complaints
>> about it.  Since we don't, my inclination is not to, but I'm
>> only -0.1 or so; anybody else want to vote?

> This patch was specifically developed in response to field complaints
> about it working differently, so there's that.

Hmm ... I didn't recall seeing any on the lists, but a bit of archive
searching found

https://www.postgresql.org/message-id/flat/20170213184323.6099.18278%40wrigleys.postgresql.org

wherein we'd considered the idea and rejected it, or at least decided
that we wanted finer-grained control than the server side needs.
So that's *a* field complaint.  But are we still worried about the
concerns that were raised there?

Re-reading, it looks like the submitter then wanted us to just drop the
prohibition of group-readability without tying it to root ownership,
which I feel would indeed be pretty dangerous given how many systems have
groups like "users".  But I don't think root-owned-group-readable is such
a problem: if you can create such a file then you can make one owned by
the calling user, too.

Anyway, I'd be happier about back-patching if we could document
actual requests to make it work like the server side does.

regards, tom lane




Re: Removing unneeded self joins

2022-02-28 Thread Greg Stark
I don't think the benchmarking that's needed is to check whether
pruning unnecessary joins is helpful. Obviously it's going to be hard
to measure on simple queries and small tables. But the resulting plan
is unambiguously superior and in more complex cases could save extra I/O.

The benchmarking people were looking for in the past was testing the
impact of the extra planning work in cases where it doesn't end up
being applied. I'm not sure what the worst case is, perhaps a many-way
self-join where the join clauses are not suitable for pruning?




Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

2022-02-28 Thread Masahiko Sawada
On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:
> >
> > I've attached an updated patch. Please review it.
>
> It seems that the regression tests aren't entirely stable, per cfbot:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.
>
> The failures look like:
>
> diff -U3 
> /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 
> /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
> --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out
>   2022-01-19 03:50:37.087931908 +
> +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out  
>   2022-01-19 03:57:41.013616137 +
> @@ -512,9 +512,10 @@
> I/O Timings: temp read=N.N write=N.N
> ->  Function Scan on generate_series  (cost=N.N..N.N rows=N width=N) 
> (actual time=N.N..N.N rows=N loops=N)
>   I/O Timings: temp read=N.N write=N.N
> +   I/O Timings: shared/local read=N.N
>   Planning Time: N.N ms
>   Execution Time: N.N ms
> -(6 rows)
> +(7 rows)
>
>  select explain_filter('explain (analyze, buffers, format json) select 
> count(*) from generate_series(1, 10)');
>  explain_filter
>
>
> I don't see any obvious error in the code, so I'm wondering if it's simply
> the result of populating the cache with generate_series() info.

Thank you for reporting.

You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.

I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v3-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patch
Description: Binary data


v3-0001-Track-I-O-timing-for-temp-buffers.patch
Description: Binary data


RE: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread osumi.takami...@fujitsu.com
On Friday, February 25, 2022 9:45 PM osumi.takami...@fujitsu.com 
 wrote:
> Kindly have a look at attached the v22.
> It has incorporated other improvements of TAP test, refinement of the
> DisableSubscriptionOnError function and so on.
The recent commit (7a85073) changed the subscription workers'
error handling, so I rebased my disable_on_error patch first
for anyone who is interested in reviewing it.

I'll incorporate incoming comments for v22 in my next version.

Best Regards,
Takamichi Osumi



v23-0001-Optionally-disable-subscriptions-on-error.patch
Description: v23-0001-Optionally-disable-subscriptions-on-error.patch


RE: Failed transaction statistics to measure the logical replication progress

2022-02-28 Thread osumi.takami...@fujitsu.com
On Friday, February 25, 2022 7:58 AM osumi.takami...@fujitsu.com 
 wrote:
> Kindly have a look at v24.
Hi.

The recent commit(7a85073) has redesigned the view pg_stat_subscription_workers
and now we have pg_stat_subscription_stats. Therefore, I rebased my patch
so that my statistics patch can be applied on top of the HEAD.

In the process of this rebase, I had to drop one column that stored
the error count for unique errors (it was incremented only after
confirming that the error was not the same as the previous one),
because the commit removes that same-error check mechanism.

Therefore, this patch has apply_commit_count and apply_rollback_count
only. I also made some minor adjustments so that the code is better
aligned.

Kindly please have a look at the patch.

Best Regards,
Takamichi Osumi



v25-0001-Extend-pg_stat_subscription_stats-to-include-gen.patch
Description:  v25-0001-Extend-pg_stat_subscription_stats-to-include-gen.patch


Re: Kerberos delegation support in libpq and postgres_fdw

2022-02-28 Thread Stephen Frost
Greetings,

(Dropping the original poster as their email address apparently no
longer works)

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 22.07.21 10:39, Peifeng Qiu wrote:
> >I've slightly modified the patch to support "gssencmode" and added TAP
> >tests.
> 
> For the TAP tests, please put then under src/test/kerberos/, instead of
> copying the whole infrastructure to contrib/postgres_fdw/.  Just make a new
> file, for example t/002_postgres_fdw_proxy.pl, and put your tests there.

I've incorporated the tests into the existing kerberos/001_auth.pl as
there didn't seem any need to create another file.

> Also, you can put code and tests in one patch, no need to separate.

Done.  Also rebased and updated for the changes in the TAP testing
infrastructure and other changes.  Also added code to track if
credentials were forwarded or not and to log that information.

> I wonder if this feature would also work in dblink.  Since there is no
> substantial code changes in postgres_fdw itself as part of this patch, I
> would suspect yes.  Can you check?

Yup, this should work fine.  I didn't include any explicit testing of
postgres_fdw or dblink in this, yet.  Instead, for the moment at least,
I've added to the connection log message an indication of whether
credentials were passed along with the connection, along with tests of
both the negative case and the positive case.  Not sure if that's useful
information to have in pg_stat_gssapi, but if so, then we could add it
there pretty easily.

I'm happy to try and get testing with postgres_fdw and dblink working
soon though, assuming there aren't any particular objections to moving
this forward.

Will add to the CF for consideration.

Thanks,

Stephen
From c8aff4ae7595647fb5c82ea2f726c2d5b866765c Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Feb 2022 20:17:55 -0500
Subject: [PATCH] Add support for Kerberos credential delegation

Accept GSSAPI/Kerberos delegated credentials.  With this, a user could
authenticate to PostgreSQL using Kerberos credentials, delegate
credentials to the PostgreSQL server, and then the PostgreSQL server
could use those credentials to connect to another service, such as with
postgres_fdw or dblink or theoretically any other authenticated
connection which is able to use delegated credentials.

Original patch by: Peifeng Qiu, whacked around some by me.
---
 .../postgres_fdw/expected/postgres_fdw.out|  2 +-
 contrib/postgres_fdw/option.c |  3 ++
 src/backend/libpq/auth.c  | 12 -
 src/backend/libpq/be-gssapi-common.c  | 44 +++
 src/backend/libpq/be-secure-gssapi.c  | 26 ++-
 src/backend/utils/init/postinit.c |  8 ++--
 src/include/libpq/be-gssapi-common.h  |  3 ++
 src/include/libpq/libpq-be.h  |  2 +
 src/interfaces/libpq/fe-auth.c|  9 +++-
 src/interfaces/libpq/fe-secure-gssapi.c   |  2 +-
 src/test/kerberos/t/001_auth.pl   | 25 +++
 src/test/perl/PostgreSQL/Test/Utils.pm| 27 
 12 files changed, 146 insertions(+), 17 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f210f91188..a73a468d89 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -188,7 +188,7 @@ ALTER USER MAPPING FOR public SERVER testserver1
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (ADD sslmode 'require');
 ERROR:  invalid option "sslmode"
-HINT:  Valid options in this context are: user, password, sslpassword, password_required, sslcert, sslkey
+HINT:  Valid options in this context are: user, password, sslpassword, password_required, sslcert, sslkey, gssencmode
 -- But we can add valid ones fine
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (ADD sslpassword 'dummy');
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 572591a558..05922cfe6d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -262,6 +262,9 @@ InitPgFdwOptions(void)
 		{"sslcert", UserMappingRelationId, true},
 		{"sslkey", UserMappingRelationId, true},
 
+		/* gssencmode is also a libpq option, same as above. */
+		{"gssencmode", UserMappingRelationId, true},
+
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index efc53f3135..6f820a34f1 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -920,6 +920,7 @@ pg_GSS_recvauth(Port *port)
 	int			mtype;
 	StringInfoData buf;
 	gss_buffer_desc gbuf;
+	gss_cred_id_t proxy;
 
 	/*
 	 * Use the configured keytab, if there is one.  Unfortunately, Heimdal
@@ -949,6 +950,9 @@ pg_GSS_recvauth(Port *port)
 	 */
 	port->gss->ctx = GSS_C_NO_CONTEXT;
 
+	proxy = NULL;
+	port->gss->proxy_creds = false;
+
 	/*
 	 * Loop through GSSAPI message exchange. This exchange can consist 

Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages

2022-02-28 Thread Kyotaro Horiguchi
At Mon, 28 Feb 2022 21:03:07 +0530, Bharath Rupireddy 
 wrote in 
> Hi,
> 
> It looks like we use "log segment" in various user-facing messages.
> The term "log" can mean server logs as well. The "WAL segment" suits
> well here and it is consistently used across the other user-facing
> messages [1].
> 
> Here's a small patch attempting to consistently use the "WAL segment".
> 
> Thoughts?

I tend to agree with this.  I also see "log record(s)" (without the
"write-ahead" prefix) in many places, especially in the documentation.
I'm not sure how we should treat "WAL log", though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: definition of CalculateMaxmumSafeLSN

2022-02-28 Thread Kyotaro Horiguchi
At Mon, 28 Feb 2022 17:01:10 +0300, Sergei Kornilov  wrote in 
> Hello
> I just spotted in src/include/access/xlog.h:
> extern XLogRecPtr CalculateMaxmumSafeLSN(void);
> 
> This function doesn't seem to be used anywhere or even defined? "git grep 
> CalculateMaxmumSafeLSN" shows only this line.
> Was added in c6550776394e25c1620bc8258427c8f1d448080d "Allow users to limit 
> storage reserved by replication slots"

Hmm.  I think I remember that name.  You're right, it is a function
that was once added during development of the commit but was
eventually dropped.

So it should be removed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread Peter Smith
Please see below my review comments for v22.

==

1. Commit message

"table sync worker" -> "tablesync worker"

~~~

2. doc/src/sgml/catalogs.sgml

+  
+   If true, the subscription will be disabled when subscription
+   workers detect any errors
+  

It felt a bit strange to say "subscription" 2x in the sentence, but I
am not sure how to improve it. Maybe like below?

BEFORE
If true, the subscription will be disabled when subscription workers
detect any errors

SUGGESTED
If true, the subscription will be disabled if one of its workers
detects an error

~~~

3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError

@@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 }

 /*
+ * Disable the current subscription, after error recovery processing.
+ */
+static void
+DisableSubscriptionOnError(void)

I thought the "after error recovery processing" part was a bit generic
and did not really say what it was doing.

BEFORE
Disable the current subscription, after error recovery processing.
SUGGESTED
Disable the current subscription, after logging the error that caused
this function to be called.

~~~

4. src/backend/replication/logical/worker.c - start_apply

+ if (MySubscription->disableonerr)
+ {
+ DisableSubscriptionOnError();
+ return;
+ }
+
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();

The current code looks correct, but I felt it is a bit tricky to
easily see the execution path after the return.

Since it will effectively just exit anyhow, I think it will be simpler
to do that explicitly right here instead of the 'return'. This
will also make the code consistent with the same 'disableonerr' logic
in start_table_sync.

SUGGESTION
if (MySubscription->disableonerr)
{
DisableSubscriptionOnError();
proc_exit(0);
}

~~~

5. src/bin/pg_dump/pg_dump.c

@@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
  if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
  appendPQExpBufferStr(query, ", two_phase = on");

+ if (strcmp(subinfo->subdisableonerr, "f") != 0)
+ appendPQExpBufferStr(query, ", disable_on_error = true");
+

Although the code is correct, I think it would be more natural to set
this option to true only when the user wants it true, e.g. check for "t"
the same way 'subbinary' does. This way, even if there were some
unknown/corrupted value, the code would do nothing, which is the
behaviour you want...

SUGGESTION
if (strcmp(subinfo->subdisableonerr, "t") == 0)

~~~

6. src/include/catalog/pg_subscription.h

@@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW

  char subtwophasestate; /* Stream two-phase transactions */

+ bool subdisableonerr; /* True if occurrence of apply errors
+ * should disable the subscription */

The comment seems not quite right because it's not just about apply
errors. E.g. I think any error in tablesync will cause disablement
too.

BEFORE
True if occurrence of apply errors should disable the subscription
SUGGESTED
True if a worker error should cause the subscription to be disabled

~~~

7. src/test/regress/sql/subscription.sql - whitespace

+-- now it works
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, disable_on_error = false);
+
+\dRs+
+
+ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
+
+\dRs+
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+

I think there should be a blank line after that last \dRs+ just like the
other one, because it belongs logically with the code above it, not
with the ALTER slot_name.

~~~

8. src/test/subscription/t/028_disable_on_error.pl - filename

The 028 number needs to be bumped because there is already a TAP test
called 028 now

~~~

9. src/test/subscription/t/028_disable_on_error.pl - missing test

There was no test case for the last combination, where the user corrects
the apply worker problem: e.g. after a previous error/disable of the
subscription, remove the index, publish the inserts again, and check
they get applied properly.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

2022-02-28 Thread Masahiko Sawada
Hi,

Sorry for the late reply.

On Fri, Nov 19, 2021 at 7:24 AM Melanie Plageman
 wrote:
>
> On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada  wrote:
> >
> > On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela  wrote:
> > >
> > > Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada 
> > >  escreveu:
> >
> > > The presentation seems a little confusing, wouldn't it be better?
> > >
> > > I/O Timings: shared/local read=xxx write=xxx temp read=0.487 write=2.073
> >
> > Yeah, it looks better to add "shared/local".
>
>  Using the patch, I do feel like the EXPLAIN format of
>
>  shared/local xxx, temp xxx
>
>  is a bit confusing. If temp is going to be its own EXPLAIN IO timing
>  category (as opposed to being summed with relation data block IO from
>  local and shared buffers), then it seems like local and shared should
>  be separated as well.
>
>  shared xxx, local xxx, temp xxx

I think the I/O timing shown as shared/local is the time spent on disk
I/O, so it doesn't matter whether that disk I/O is for shared buffers or
local buffers.

>
>  With your patch applied, below is the top of the EXPLAIN output for a
>  query joining a temporary table (so using local buffers) to a regular
>  table (using shared buffers) and spilling the hash join (temp files).
>
> Aggregate (actual rows=1 loops=1)
>   Buffers: shared read=4425, local read=4425 dirtied=4425
> written=4423, temp read=5963 written=5963
>   I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198
>
>  I found that using the same terminology as the "EXPLAIN BUFFERS" output
>  but not using the same categories was kind of confusing.
>
>  If it is only meaningful to distinguish between relation data IO and
>  query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
>  should be "rel data" and "temp" or something like that.

But if we do that, we end up using different terminology in "I/O
Timing" and "Buffers". I think it's better to use consistent wording
between them.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Allow root ownership of client certificate key

2022-02-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> David Steele  writes:
> > Any thoughts on back-patching at least the client portion of this? 
> > Probably hard to argue that it's a bug, but it is certainly painful.
> 
> I'd be more eager to do that if we had some field complaints
> about it.  Since we don't, my inclination is not to, but I'm
> only -0.1 or so; anybody else want to vote?

This patch was specifically developed in response to field complaints
about it working differently, so there's that.  Currently it's being
worked around in the container environments by copying the key from the
secret that's provided to a temporary space where we can modify the
privileges, but that's pretty terrible.  Would be great to be able to
get rid of that in favor of being able to use it directly.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: logical replication restrictions

2022-02-28 Thread Euler Taveira
On Wed, Sep 22, 2021, at 1:57 PM, Euler Taveira wrote:
> On Wed, Sep 22, 2021, at 1:18 AM, Amit Kapila wrote:
>> On Tue, Sep 21, 2021 at 4:21 PM Marcos Pegoraro  wrote:
>>> No, I'm talking about that configuration you can have on standby servers
>>> recovery_min_apply_delay = '8h'
>>> 
>> 
>> oh okay, I think this can be useful in some cases where we want to avoid
>> data loss, similar to its use for physical standby. For example, if the user
>> has by mistake truncated the table (or deleted some required data) on the
>> publisher, we can always recover it from the subscriber if we have such a feature.
>> 
>> Having said that, I am not sure if we can call it a restriction. It is more 
>> of a TODO kind of thing. It doesn't sound advisable to me to keep growing 
>> the current Restrictions page [1].
> It is a new feature. pglogical supports it, and it is useful for a delayed
> secondary server or if, for some business reason, you have to delay when data
> is available. There might be other use cases, but these are the ones I
> regularly heard from customers.
> 
> BTW, I have a WIP patch for this feature. I didn't have enough time to post it
> because it lacks documentation and tests. I'm planning to do it as soon as 
> this
> CF ends.
Long time, no patch. Here it is. I will provide documentation in the next
version. I would appreciate some feedback.
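
To make the core idea concrete for reviewers, here is a rough sketch of
the delay logic (illustrative only: the variable names are mine, and the
patch's actual code may differ):

    /*
     * Sketch: before applying a remote commit, wait until
     * commit time + apply_delay has elapsed, staying responsive
     * to latch events and interrupts while we sleep.
     */
    TimestampTz until = TimestampTzPlusMilliseconds(commit_data->committime,
                                                    MySubscription->applydelay);
    long        diffms;

    while ((diffms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
                                                     until)) > 0)
    {
        WaitLatch(MyLatch,
                  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                  diffms,
                  WAIT_EVENT_RECOVERY_APPLY_DELAY);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }

Delaying at commit time, rather than when the transaction starts, mirrors
what recovery_min_apply_delay does for physical replication.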


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 9635fec1a031b82ec5d67cdfe16aa1f553ffa936 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Sat, 6 Nov 2021 11:31:10 -0300
Subject: [PATCH v1] Time-delayed logical replication subscriber

Similar to physical replication, a time-delayed copy of the data is
useful for some scenarios (specially to fix errors that might cause data
loss).

If the subscriber sets apply_delay parameter, the logical replication
worker will delay the transaction commit for apply_delay milliseconds.

Discussion: https://postgr.es/m/CAB-JLwYOYwL=xtyaxkih5ctm_vm8kjkh7aaitckvmch4rzr...@mail.gmail.com
---
 src/backend/catalog/pg_subscription.c  |  1 +
 src/backend/catalog/system_views.sql   |  3 +-
 src/backend/commands/subscriptioncmds.c| 44 +-
 src/backend/replication/logical/worker.c   | 48 +++
 src/backend/utils/adt/timestamp.c  |  8 ++
 src/bin/pg_dump/pg_dump.c  | 16 +++-
 src/bin/pg_dump/pg_dump.h  |  1 +
 src/bin/psql/describe.c|  8 +-
 src/include/catalog/pg_subscription.h  |  3 +
 src/include/utils/timestamp.h  |  2 +
 src/test/regress/expected/subscription.out | 96 +++---
 src/test/subscription/t/029_apply_delay.pl | 71 
 12 files changed, 248 insertions(+), 53 deletions(-)
 create mode 100644 src/test/subscription/t/029_apply_delay.pl

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index ca65a8bd20..0788384579 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -69,6 +69,7 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub->binary = subform->subbinary;
 	sub->stream = subform->substream;
 	sub->twophasestate = subform->subtwophasestate;
+	sub->applydelay = subform->subapplydelay;
 
 	/* Get conninfo */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3cb69b1f87..1cc0d86f2e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1261,7 +1261,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-  substream, subtwophasestate, subslotname, subsynccommit, subpublications)
+  substream, subtwophasestate, subslotname, subsynccommit,
+  subapplydelay, subpublications)
 ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_workers AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 3ef6607d24..19916f04a8 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -46,6 +46,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 
 /*
  * Options that can be specified by the user in CREATE/ALTER SUBSCRIPTION
@@ -61,6 +62,7 @@
 #define SUBOPT_BINARY0x0080
 #define SUBOPT_STREAMING			0x0100
 #define SUBOPT_TWOPHASE_COMMIT		0x0200
+#define SUBOPT_APPLY_DELAY			0x0400
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -82,6 +84,7 @@ typedef struct SubOpts
 	bool		binary;
 	bool		streaming;
 	bool		twophase;
+	int64		apply_delay;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List 

Re: Allow root ownership of client certificate key

2022-02-28 Thread Tom Lane
David Steele  writes:
> Any thoughts on back-patching at least the client portion of this? 
> Probably hard to argue that it's a bug, but it is certainly painful.

I'd be more eager to do that if we had some field complaints
about it.  Since we don't, my inclination is not to, but I'm
only -0.1 or so; anybody else want to vote?

regards, tom lane




Re: Removing unneeded self joins

2022-02-28 Thread Jaime Casanova
On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on the number of relations in the query. For a JOIN of partitioned
> tables, for example, the overhead could grow. But in the context of overall
> planning time this overhead will be small until the number of relations
> is large.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
> 

I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.

This is what I found:

+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| test                  | mode     |    master |   enabled |      % |  disabled |     % |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| pgbench read only     | standard |  64418.13 |  63942.94 |  -0.74 |  62231.38 | -3.39 |
| pgbench read only     | prepared | 108463.51 | 107002.13 |  -1.35 | 100960.83 | -6.92 |
| pgbench read only     | extended |  55409.65 |  56427.63 |   1.84 |  55927.62 |  0.93 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| pgbench read/write    | standard |   9374.91 |   9135.21 |  -2.56 |   8840.68 | -5.70 |
| pgbench read/write    | prepared |  11849.86 |  11672.23 |  -1.50 |  11393.39 | -3.85 |
| pgbench read/write    | extended |   7976.80 |   7947.07 |  -0.37 |   7788.99 | -2.35 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| select non optimize 1 | standard |     80.97 |     81.29 |   0.40 |     81.30 |  0.41 |
| select non optimize 1 | prepared |     81.29 |     81.28 |  -0.01 |     80.89 | -0.49 |
| select non optimize 1 | extended |     81.07 |     80.81 |  -0.32 |     80.98 | -0.11 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| select optimized 1    | standard |     15.84 |     13.90 | -12.25 |     15.80 | -0.25 |
| select optimized 1    | prepared |     15.24 |     13.82 |  -9.32 |     15.55 |  2.03 |
| select optimized 1    | extended |     15.38 |     13.89 |  -9.69 |     15.59 |  1.37 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| select optimized 2    | standard |  10204.91 |  10818.39 |   6.01 |  10261.07 |  0.55 |
| select optimized 2    | prepared |  13284.06 |  15579.33 |  17.28 |  13116.22 | -1.26 |
| select optimized 2    | extended |  10143.43 |  10645.23 |   4.95 |  10142.77 | -0.01 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+
| select shoe           | standard |   5645.28 |   5661.71 |   0.29 |   6180.60 |  9.48 |
| select shoe           | prepared |   9660.45 |   9602.37 |  -0.60 |   9894.82 |  2.43 |
| select shoe           | extended |   5666.47 |   5634.10 |  -0.57 |   5757.26 |  1.60 |
+-----------------------+----------+-----------+-----------+--------+-----------+-------+

Obviously the pgbench runs are from the standard script. The numbers are
not clear to me: I can see improvements with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiously an improvement in one case.

I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:

insert into users 
select seq, case when random() < 0.2 then null else random() * 1000 end, 
   random() * 1 
  from generate_series(1, 100) seq 
  on conflict (nullable_int) do nothing;

for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.

After these tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL


select_nonoptimize1.sql
Description: application/sql


select_optimize1.sql
Description: application/sql


select_optimize2.sql
Description: application/sql


select_shoe.sql
Description: application/sql


Re: Allow root ownership of client certificate key

2022-02-28 Thread David Steele

On 2/28/22 13:20, Tom Lane wrote:

David Steele  writes:

[ client-key-perm-003.patch ]


Pushed with a bit of copy-editing of the comments.


Thank you!

Any thoughts on back-patching at least the client portion of this? 
Probably hard to argue that it's a bug, but it is certainly painful.


Regards,
-David




Re: Overflow of attmissingval is not handled gracefully

2022-02-28 Thread Tom Lane
Andrew Dunstan  writes:
> On 2/28/22 18:21, Tom Lane wrote:
>> regression=# create table foo(f1 int);
>> CREATE TABLE
>> regression=# alter table foo add column bar text default repeat('xyzzy', 
>> 100);
>> ERROR:  row is too big: size 57416, maximum size 8160
>> 
>> I think the simplest answer, and likely the only feasible one for
>> the back branches, is to disable the attmissingval optimization
>> if the proposed value is "too large".  Not sure exactly where the
>> threshold for that ought to be, but maybe BLCKSZ/8 could be a
>> starting offer.

> WFM. After all, it's taken several years for this to surface. Is it
> based on actual field experience?

No, it was an experiment that occurred to me while thinking about
the nearby proposal to add a TOAST table to pg_attribute [1].
If we do that, this restriction could be dropped.  But I agree that
there's hardly any practical use-case for such default values,
so I wouldn't mind living with the de-optimization either.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/1643112264.186902...@f325.i.mail.ru




Re: Overflow of attmissingval is not handled gracefully

2022-02-28 Thread Andrew Dunstan


On 2/28/22 18:21, Tom Lane wrote:
> Consider this admittedly-rather-contrived example:
>
> regression=# create table foo(f1 int);
> CREATE TABLE
> regression=# alter table foo add column bar text default repeat('xyzzy', 
> 100);
> ERROR:  row is too big: size 57416, maximum size 8160
>
> Since the table contains no rows at all, this is a surprising
> failure.  The reason for it of course is that pg_attribute
> has no TOAST table, so it can't store indefinitely large
> attmissingval fields.
>
> I think the simplest answer, and likely the only feasible one for
> the back branches, is to disable the attmissingval optimization
> if the proposed value is "too large".  Not sure exactly where the
> threshold for that ought to be, but maybe BLCKSZ/8 could be a
> starting offer.
>
>   


WFM. After all, it's taken several years for this to surface. Is it
based on actual field experience?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Overflow of attmissingval is not handled gracefully

2022-02-28 Thread Tom Lane
Consider this admittedly-rather-contrived example:

regression=# create table foo(f1 int);
CREATE TABLE
regression=# alter table foo add column bar text default repeat('xyzzy', 
100);
ERROR:  row is too big: size 57416, maximum size 8160

Since the table contains no rows at all, this is a surprising
failure.  The reason for it of course is that pg_attribute
has no TOAST table, so it can't store indefinitely large
attmissingval fields.

I think the simplest answer, and likely the only feasible one for
the back branches, is to disable the attmissingval optimization
if the proposed value is "too large".  Not sure exactly where the
threshold for that ought to be, but maybe BLCKSZ/8 could be a
starting offer.
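
For concreteness, a rough sketch of such a guard (illustrative only:
the function name and placement are assumptions, not a patch):

    #include "postgres.h"
    #include "utils/datum.h"

    /*
     * Sketch: report whether a proposed attmissingval datum is small
     * enough to store in pg_attribute, which has no TOAST table.
     * BLCKSZ/8 is just the provisional threshold suggested above.
     */
    static bool
    missing_value_fits(Datum value, bool typbyval, int16 typlen)
    {
        Size    sz = datumGetSize(value, typbyval, typlen);

        return sz <= BLCKSZ / 8;
    }

If the value doesn't fit, ALTER TABLE would just fall back to the
pre-v11 behavior of rewriting the table.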

regards, tom lane




Re: [PATCH] Add TOAST support for several system tables

2022-02-28 Thread Tom Lane
=?UTF-8?B?U29maWEgS29waWtvdmE=?=  writes:
> ACL lists in tables may potentially be large in size. I suggest adding TOAST 
> support for system tables, namely pg_class, pg_attribute and 
> pg_largeobject_metadata, for they include ACL columns.

TBH, the idea of adding a toast table to pg_class scares the daylights
out of me.  I do not for one minute believe that you've fixed every
problem that will cause, nor do I think "allow wider ACLs" is a
compelling enough reason to take the risk.

I wonder if it'd be practical to move the ACLs for relations
and attributes into some new table, indexed like pg_depend or
pg_description, so that we could dodge that whole problem.
pg_attrdef is a prototype for how this could work.
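
For illustration, such a catalog might be keyed the way pg_description
is (a hypothetical sketch, not a worked-out design):

    /* Hypothetical: key ACL entries like pg_description keys comments. */
    typedef struct FormData_pg_acl
    {
        Oid     objoid;     /* OID of the object itself */
        Oid     classoid;   /* OID of the catalog the object is in */
        int32   objsubid;   /* column number, or 0 for the object itself */
        /* variable-length part: aclitem objacl[] (the ACL proper) */
    } FormData_pg_acl;

with a unique index on (objoid, classoid, objsubid), as pg_description
has.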

(Obviously, once we had such a table the ACLs for other things
could be put in it, but I'm not sure that the ensuing breakage
would be justified for other sorts of objects.)

regards, tom lane




MSVC build system installs extra executables

2022-02-28 Thread Andrew Dunstan


I don't know if this has been discussed before, but I mentioned recently
()
that I think the MSVC build system is too eager about installing
executables it builds. In particular, it installs these binaries for
which the analogs are not installed by the makefile system:


isolationtester.exe

libpq_pipeline.exe

pg_isolation_regress.exe

pg_regress_ecpg.exe

pg_regress.exe

zic.exe


Do we want to do anything about that? ISTM we should be installing
identical sets of binaries as far as possible. The installation of
libpq_pipeline.exe is apparently what has led Justin Pryzby to think
it's OK to undo the effect of commit f4ce6c4d3a on vcregress.pl.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-02-28 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi

I am glad to find this patch here because it helps with my current development
work, which involves a lot of debugging of WAL records, and this patch
definitely makes that much easier than using grep externally.

I have tried all of the new options added by the patch, and every combination
seems to produce correct results.

The only comment I would have is in the documentation, where I would replace:
"Display only records touching the given block" with "Display only records 
associated with the given block" 
"Display only records touching the given relation" with " Display only records 
associated with the given relation"

just to make it sound more formal. :)

best

Cary Huang
--
HighGo Software Canada
www.highgo.ca

stale statistics on postgres 14

2022-02-28 Thread Jaime Casanova
Hi everyone,

I have been doing some tests on a little server (at least compared to
some others around). It has 128 cores (64 physical), 128GB of RAM and,
against my will, a 4-disk (10k RPM) RAID5.

I have been using hammerdb to test from 9.6 to 14, currently on Windows
10. Obviously, I have been recording changes in performance.

Hammerdb results show two numbers, NOPM and TPM; the second one is
calculated using statistics.

Sadly, on 14 I saw a lower TPM number, while NOPM stayed around the
average (at least, the average since 11).

The reason the TPM number dropped on 14 is that it's based
on the statistics [1], which of course are stale [2].

I consider this a regression because no other postgres version had 
this problem on the same machine and the same OS. Anything I can do to
track what caused this regression?

[1] Query from hammerdb to get TPM number:
select sum(xact_commit + xact_rollback) 
  from pg_stat_database

[2] Message from the log, saying what is obvious
LOG:  0: using stale statistics instead of current ones because
stats collector is not responding

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: real/float example for testlibpq3

2022-02-28 Thread Tom Lane
Chapman Flack  writes:
> This stimulates a question for me.

> Not just libpq, but all drivers implemented in whatever language, if they
> wish to support binary protocol, depend on knowing what the committed
> send/recv wire formats are for whichever types they mean to support.

> In the current state of affairs, what's considered the ur-source of that
> information?

The source code for the type's send/receive functions :-(.  One could
wish for something better, but no one has stepped up to produce such
documentation.

regards, tom lane




Re: Separate the result of \watch for each query execution (psql)

2022-02-28 Thread Tom Lane
Noboru Saito  writes:
> I have created a patch that allows you to turn it on and off in \pset.
> The attached patch adds the following features.
Formfeed can be turned on with a command-line option or \pset.
> Formfeed (\f\n) is output after the query execution result by \watch.

Hmm ... I grant your use-case for this, but I think the patch
is too narrow-minded, because it supposes that the only string
anybody could wish to output between \watch commands is "\f\n".
Once you open the floodgates of inserting formatting there,
ISTM that people might want other things.

Also, I'm not that thrilled with treating this as a \pset option,
because it has nothing to do with formatting of normal query
results.  (IMV anyway, perhaps others will disagree.)

How about instead of defining fixed semantics, we invent a psql
special variable that can contain a string to be output between
\watch commands?  It looks like you could then set it through
a command like

\set WATCH_SEPARATOR '\f\n'

(not wedded to that variable name, it's just the first idea
that came to mind)

Personally I'd not bother with inventing a specialized command-line
option to set it, either.  There's already -v and friends.

> * Is formfeed output after the result, not before?

Or we could invent WATCH_BEFORE and WATCH_AFTER ...

regards, tom lane




Re: real/float example for testlibpq3

2022-02-28 Thread Chapman Flack
On 02/28/22 10:19, Tom Lane wrote:
>> That will standardize the
>> way to fetch real typed values in libpq. That leads to the next
>> question. Do we need to introduce different PQget*value() for standard
>> C/SQL data types.
> 
> ... I do not want to go here.  Where would you stop?  How would you
> deal with cross-machine inconsistencies in integer widths?

This stimulates a question for me.

Not just libpq, but all drivers implemented in whatever language, if they
wish to support binary protocol, depend on knowing what the committed
send/recv wire formats are for whichever types they mean to support.

In the current state of affairs, what's considered the ur-source of that
information?

I have often seen those formats documented in code comments, usually above
the recv function in the .c file for a given adt.

Have we got any more, well, machine-readable collection of that knowledge?

Regards,
-Chap




Re: Removing unneeded self joins

2022-02-28 Thread Greg Stark
On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau  wrote:
>
> Well in some cases they can't, when the query is not emitting redundant
> predicates by itself but they are added by something else like a view or a RLS
> policy.
> Maybe it would be worth it to allow spending a bit more time planning for
> those cases ?

Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
  from t
   join t2 as b on (b.id = t.id)
  join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?

That's the usual pattern in which I've seen this arise. Not so much that
people write self-joins explicitly, but that they add a join to check some
column, and that happens in some isolated piece of code that
doesn't know the join is already in the query. You can easily
end up with a lot of joins against the same table this way.

It's not far different from the old chestnut

select (select x from t2 where id = t.id) as x,
   (select y from t2 where id = t.id) as y
  from t

which is actually pretty hard to avoid sometimes.

-- 
greg




Re: Patch: Code comments: why some text-handling functions are leakproof

2022-02-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 11, 2022 at 2:07 AM Gurjeet Singh  wrote:
>> This is more or less a verbatim copy of Tom's comment in email thread at [1].
>> 
>> I could not find an appropriate spot to place these comments, so I placed 
>> them on bttextcmp() function, The only other place that I could see we can 
>> place these comments is in the file src/backend/optimizer/README, because 
>> there is some consideration given to leakproof functions in optimizer docs. 
>> But these comments seem quite out of place in optimizer docs.

> It doesn't seem particularly likely that someone who is thinking about
> changing this in the future would notice the comment in the place
> where you propose to put it, nor that they would read the optimizer
> README.

Agreed.  I think if we wanted to make an upgrade in the way function
leakproofness is documented, we ought to add a section about it in
xfunc.sgml, adjacent to the one about function volatility categories.
This could perhaps consolidate some of the existing documentation mentions
of leakproofness, as well as adding text similar to what Gurjeet suggests.

> Furthermore, I don't know that everyone agrees with Tom about this. I
> do agree that it's more important to mark relational operators
> leakproof than other things, and I also agree that conservatism is
> warranted. But that does not mean that someone could not make a
> compelling argument for marking other functions leakproof.

ISTM the proposed text does a reasonable job of explaining why
we made the decisions currently embedded in pg_proc.proleakproof.
If we make some other decisions in future, updating the rationale
in the docs would be an appropriate part of that.

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-28 Thread Jacob Champion
On Mon, 2022-02-28 at 16:00 -0500, Stephen Frost wrote:
> > commit efec9f040843d1de2fc52f5ce0d020478a5bc75d
> > Author: Jacob Champion 
> > Date:   Mon Feb 28 10:28:51 2022 -0800
> > 
> > squash! Add API to retrieve authn_id from SQL
> 
> Bleh. :)  Squash indeed.

Ha, I wasn't sure if anyone read the since-diffs :) I'll start
wordsmithing them more in the future.

> > Subject: [PATCH v4] Add API to retrieve authn_id from SQL
> > 
> > The authn_id field in MyProcPort is currently only accessible to the
> > backend itself.  Add a SQL function, pg_session_authn_id(), to expose
> > the field to triggers that may want to make use of it.
> 
> Only did a quick look but generally looks reasonable to me.

Thanks!

--Jacob


Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > md5 should be removed.
> 
> Really? I've always thought that the arguments against it were
> overblown for our use-case.  At any rate, it's likely to be
> years before we could practically do that, since it's the best
> that older client libraries can manage.

Yes, really, it's a known-broken system which suffers from such an old
and well known attack that it's been given a name: pass-the-hash.  As
was discussed on this thread even, just the fact that it's not trivial
to break on the wire doesn't make it not-broken, particularly when we
use the username (which is rather commonly the same one used across
multiple systems..) as the salt.  Worse, md5 isn't exactly the pinnacle
of hashing techniques around these days.

The wiki page goes over it in some detail regarding LM/NTLM which
suffers the same problem (and also uses a challenge-response for the
over-the-network bits): https://en.wikipedia.org/wiki/Pass_the_hash

Further, a whole bunch of effort was put in to get scram support added
to the different libraries and language bindings and such, specifically
to allow us to get to a point where we can drop md5.  Even after it's
removed, folks will have 5 years before the release that removes it is
the oldest supported release.  I don't think we'll somehow get agreement
to remove it for v15, so it'll be 5 major versions of overlap (11->15)
by the time v16 comes out, and a total of 10 years of support for scram
before md5 is gone.

That's plenty, it's time to move on.

Keeping it around will just push out the point at which everyone will
finally be done with it, as there's really only two groups: those who
have already moved to scram, and those who won't move until they want to
upgrade to a release that doesn't have md5.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Tom Lane
Stephen Frost  writes:
> md5 should be removed.

Really? I've always thought that the arguments against it were
overblown for our use-case.  At any rate, it's likely to be
years before we could practically do that, since it's the best
that older client libraries can manage.

regards, tom lane




Re: Missed condition-variable wakeups on FreeBSD

2022-02-28 Thread Thomas Munro
On Sun, Feb 27, 2022 at 9:44 AM Andres Freund  wrote:
> > (gdb) p debug_query_string
> > $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> > (gdb) bt
> > #0  _poll () at _poll.S:4
> > #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at 
> > /usr/src/lib/libthr/thread/thr_syscalls.c:338
> > #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at 
> > /usr/src/lib/libc/sys/poll.c:47
> > #3  0x0097b0fd in WaitEventSetWaitBlock (set=, 
> > cur_timeout=-1, occurred_events=, nevents=) 
> > at latch.c:1171
>
> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?

As a basic sanity check I'd like to see pfd[0], pfd[1] and the output
of fstat -p PID to see unconsumed bytes in the pipe.  For example
"echo hello | sleep 60" shows "0* pipe ... 6 rw" for sleep, but "echo
hello | more" shows "0* pipe ... 0 rw".




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-28 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
> > Looks to me like authn_id isn't synchronized to parallel workers right now. 
> > So
> > the function will return the wrong thing when executed as part of a parallel
> > query.
> 
Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on it, like inet_server_addr(), are marked parallel-
restricted, so I've done the same in v4.

That's probably alright.

> On Sat, 2022-02-26 at 14:39 +0900, Michael Paquier wrote:
> > FWIW, I am not completely sure what's the use case for being able to
> > see the authn of the current session through a trigger.  We expose
> > that when log_connections is enabled, for audit purposes.  I can also
> > get behind something more central so as one can get a full picture of
> > the authn used by a bunch of session, particularly with complex HBA
> > policies, but this looks rather limited to me in usability.  Perhaps
> > that's not enough to stand as an objection, though, and the patch is
> > dead simple.
> 
I'm primarily motivated by the linked thread -- if the gap between
builtin roles and authn_id is going to be used as ammo against other
security features, then let's close that gap. But I think it's fair to
> say that if someone is already using triggers to exhaustively audit a
> table, it'd be nice to have this info in the same place too.

Yeah, we really should make this available to trigger-based auditing
systems too and not just through log_connections which involves a great
deal more log parsing and work external to the database to figure out
who did what.

> > > I don't think we should add further functions not prefixed with pg_.
> > 
> > Yep.
> 
> Fixed.

That's fine.

> > > Perhaps a few tests for less trivial authn_ids could be worthwhile?
> > > E.g. certificate DNs.
> > 
> > Yes, src/test/ssl would handle that just fine.  Now, this stuff
> > already looks after authn results with log_connections=on, so that
> > feels like a duplicate.
> 
> It was easy enough to add, so I added it. I suppose it does protect
> against any reimplementations of pg_session_authn_id() that can't
> handle longer ID strings, though I admit that's a stretch.
> 
> Thanks,
> --Jacob

> commit efec9f040843d1de2fc52f5ce0d020478a5bc75d
> Author: Jacob Champion 
> Date:   Mon Feb 28 10:28:51 2022 -0800
> 
> squash! Add API to retrieve authn_id from SQL

Bleh. :)  Squash indeed.

> Subject: [PATCH v4] Add API to retrieve authn_id from SQL
> 
> The authn_id field in MyProcPort is currently only accessible to the
> backend itself.  Add a SQL function, pg_session_authn_id(), to expose
> the field to triggers that may want to make use of it.

Only did a quick look but generally looks reasonable to me.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Adding CI to our tree

2022-02-28 Thread Justin Pryzby
On Sat, Feb 26, 2022 at 08:08:38PM -0800, Andres Freund wrote:
> On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> > If someone renames or removes an xref target, shouldn't CI fail on its next
> > run for a patch which tries to reference it ?
> 
> Why wouldn't it?

I suppose you're right - I was thinking that cirrus was checking whether the
*patch* had changed any matching files, but it probably checks (as it should)
whether "the sources" have changed.

Hmm, it's behaving strangely...if there's a single argument ('docs/**'), then
it will skip the docs task if I resubmit it after git commit --amend --no-edit.
But with multiple args ('.cirrus.yaml', 'docs/**') it reruns it...
I tried it as skip: !changesInclude() and by adding it to the existing only_if:
(.. || ..) && changesInclude().

> > Are you sure about cfbot ?  AIUI cirrus would see that docs didn't change
> > relative to the previous run for branch: commitfest/NN/.
> 
> Not entirely sure, but it's what I remember observing when ammending commits
> in a repo using changesInclues. If I were to build a feature like it I'd look
> at the list of files of
>   git diff $(git merge-base last_green new_commit)..new_commit
> 
> or such.  cfbot doesn't commit incrementally but replaces the prior commit, so
> I suspect it'll always be viewn as new. But who knows, shouldn't be hard to
> figure out.

Anyway...

I still think that if "Build Docs" is a separate cirrus task, it should rebuild
docs on every CI run, even if they haven't changed, for any patch that touches
docs/.  It'll be confusing if cfbot shows 5 green circles and 4 of them were
built 1 day ago, and 1 was built 3 weeks ago.  Docs are the task that runs
quickest, so I don't think it's worth doing anything special there (especially
without understanding the behavior of changesInclude()).

Also, to allow users to view the built HTML docs, cfbot would need to 1) keep
track of previous CI runs; and 2) have logic to handle "skipped" CI runs, to allow
showing artifacts from the previous run.  If it's not already done, I think the
first half is a good idea on its own.  But the 2nd part doesn't seem desirable.

However, I realized that we can filter on cfbot with either of these:
| $CIRRUS_CHANGE_TITLE =~ '^\[CF...'
| git log -1 |grep '^Author: Commitfest Bot '
If we can assume that cfbot will continue submitting branches as a single
patch, this resolves the question of a "base branch", for cfbot.

(Actually, I'd prefer if it preserved the original patches as separate commits,
but that isn't what it does).  

These patches implement that idea, and make "code coverage" and "HTML diffs"
stuff only run for cfbot commits.  This still needs another round of testing,
though.

-- 
Justin

PS. I've just done this.  I'm unsure whether to say that it's wonderful or
terrible.  This would certainly be better if each branch preserved the original
set of patches.

$ git remote add cfbot https://github.com/postgresql-cfbot/postgresql
$ git fetch cfbot
$ git branch -a |grep -c cfbot
5417
>From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/6] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9f..1b7c36283e9 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,11 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +181,12 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_os_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +240,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_os_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -384,6 +387,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_os_script: |
+REM choco install -y ...
+
   configure_script:
 # copy errors out when using forward slashes

Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Fri, 2022-02-25 at 14:10 -0500, Tom Lane wrote:
> > I also have in mind here that there has been discussion of giving
> > libpq a feature to refuse, on the client side, to send cleartext
> > passwords.
> 
> I am generally in favor of that idea, but I'm not sure that will
> completely resolve the issue. For instance, should it also refuse MD5?

md5 should be removed.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Stephen Frost
Greetings,

* Jonathan S. Katz (jk...@postgresql.org) wrote:
> On 2/25/22 12:39 PM, Tom Lane wrote:
> >Jeff Davis  writes:
> >>On Thu, 2022-02-24 at 20:47 -0500, Tom Lane wrote:
> >>>... and, since we can't readily enforce that the client only sends
> >>>those cleartext passwords over suitably-encrypted connections, this
> >>>could easily be a net negative for security.  Not sure that I think
> >>>it's a good idea.
> >
> >>I don't understand your point. Can't you just use "hostssl" rather than
> >>"host"?
> >
> >My point is that sending cleartext passwords over the wire is an
> >insecure-by-definition protocol that we shouldn't be encouraging
> >more use of.
> 
> This is my general feeling as well. We just spent a bunch of effort adding,
> refining, and making SCRAM the default method. I think doing anything that
> would drive more use of sending plaintext passwords, even over TLS, is
> counter to that.

Agreed.

> I do understand arguments for (e.g. systems that require checking password
> complexity), but I wonder if it's better for us to delegate that to an
> external auth system. Regardless, I can get behind Andres' point to "check
> Port->ssl_in_use before sendAuthRequest(AUTH_REQ_PASSWORD)".

Password complexity only needs to be checked at the time of a password
change, though, which is not on every login, and should happen after
confirmed mutual authentication between the client and the server.
That's a very different situation.

> I'm generally in favor of being able to support additional authentication
> methods, the first one coming to mind is supporting OIDC. Having a pluggable
> auth infrastructure could possibly make such efforts easier. I'm definitely
> intrigued.

I'm not thrilled with the idea, for my part.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Fix overflow in justify_interval related functions

2022-02-28 Thread Tom Lane
Joseph Koshakow  writes:
> [ v4-0001-Check-for-overflow-in-justify_interval-functions.patch ]

Pushed.  I added a comment explaining why the one addition in
interval_justify_interval doesn't require an overflow check.

regards, tom lane




Re: [PATCH] Enable SSL library detection via PQsslAttribute

2022-02-28 Thread Jacob Champion
On Wed, 2022-02-23 at 23:20 +, Jacob Champion wrote:
> First stab in v2-0002. Though I see that Andres is overhauling the
> tests in this folder today [1], so I'll need to watch that thread. :)

v3 rebases over Andres' changes and actually adds the Perl driver that
I missed the git-add for.

--Jacob
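
[Illustration — a minimal sketch, not part of the patch, of how a client
could use the behavior added below, guarded by the new feature macro:]

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
#ifdef LIBPQ_HAS_SSL_LIBRARY_DETECTION
	/* No connection needed; NULL means libpq was built without SSL. */
	const char *lib = PQsslAttribute(NULL, "library");

	printf("SSL library: %s\n", lib ? lib : "(none)");
#else
	/* Older libpq: cannot tell without an established connection. */
	printf("SSL library detection not available\n");
#endif
	return 0;
}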
From f5f3d39dbcdfd0c53ab9dce6c7a9dbdd94b2584c Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 29 Nov 2021 14:36:38 -0800
Subject: [PATCH v3] Enable SSL library detection via PQsslAttribute()

Currently, libpq client code must have a connection handle before it can
query the "library" SSL attribute. This poses problems if the client
needs to know what SSL library is in use before constructing a
connection string. (For example, with the NSS proposal, a client would
have to decide whether to use the "ssldatabase" connection setting
rather than "sslcert" et al.)

Allow PQsslAttribute(NULL, "library") to return the library in use --
currently, just "OpenSSL" or NULL. The new behavior is announced with
the LIBPQ_HAS_SSL_LIBRARY_DETECTION feature macro, allowing clients to
differentiate between a libpq that was compiled without SSL support and
a libpq that's just too old to tell.
---
 src/interfaces/libpq/Makefile|  1 +
 src/interfaces/libpq/fe-secure-openssl.c |  6 ++--
 src/interfaces/libpq/libpq-fe.h  |  2 ++
 src/interfaces/libpq/t/002_api.pl| 23 +++
 src/interfaces/libpq/test/.gitignore |  1 +
 src/interfaces/libpq/test/Makefile   |  2 +-
 src/interfaces/libpq/test/testclient.c   | 37 
 7 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100644 src/interfaces/libpq/t/002_api.pl
 create mode 100644 src/interfaces/libpq/test/testclient.c

diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 3c53393fa4..89bf5e0126 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -13,6 +13,7 @@ subdir = src/interfaces/libpq
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
+export with_ssl
 
 PGFILEDESC = "PostgreSQL Access Library"
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f6e563a2e5..e095a0f538 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1597,14 +1597,14 @@ PQsslAttributeNames(PGconn *conn)
 const char *
 PQsslAttribute(PGconn *conn, const char *attribute_name)
 {
+	if (strcmp(attribute_name, "library") == 0)
+		return "OpenSSL";
+
 	if (!conn)
 		return NULL;
 	if (conn->ssl == NULL)
 		return NULL;
 
-	if (strcmp(attribute_name, "library") == 0)
-		return "OpenSSL";
-
 	if (strcmp(attribute_name, "key_bits") == 0)
 	{
 		static char sslbits_str[12];
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 20eb855abc..7986445f1a 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -36,6 +36,8 @@ extern "C"
 #define LIBPQ_HAS_PIPELINING 1
 /* Indicates presence of PQsetTraceFlags; also new PQtrace output format */
 #define LIBPQ_HAS_TRACE_FLAGS 1
+/* Indicates that PQsslAttribute(NULL, "library") is useful */
+#define LIBPQ_HAS_SSL_LIBRARY_DETECTION 1
 
 /*
  * Option flags for PQcopyResult
diff --git a/src/interfaces/libpq/t/002_api.pl b/src/interfaces/libpq/t/002_api.pl
new file mode 100644
index 00..bcf91bc558
--- /dev/null
+++ b/src/interfaces/libpq/t/002_api.pl
@@ -0,0 +1,23 @@
+#!/usr/bin/perl
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Test PQsslAttribute(NULL, "library")
+my ($out, $err) = run_command(['testclient', '--ssl']);
+
+if ($ENV{with_ssl} eq 'openssl')
+{
+	is($out, 'OpenSSL', 'PQsslAttribute(NULL, "library") returns "OpenSSL"');
+}
+else
+{
+	is($err, 'SSL is not enabled', 'PQsslAttribute(NULL, "library") returns NULL');
+}
+
+done_testing();
diff --git a/src/interfaces/libpq/test/.gitignore b/src/interfaces/libpq/test/.gitignore
index 5e803d8816..4b17210483 100644
--- a/src/interfaces/libpq/test/.gitignore
+++ b/src/interfaces/libpq/test/.gitignore
@@ -1 +1,2 @@
+/testclient
 /uri-regress
diff --git a/src/interfaces/libpq/test/Makefile b/src/interfaces/libpq/test/Makefile
index 5421215906..1d45be0c37 100644
--- a/src/interfaces/libpq/test/Makefile
+++ b/src/interfaces/libpq/test/Makefile
@@ -11,7 +11,7 @@ endif
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += $(libpq_pgport)
 
-PROGS = uri-regress
+PROGS = testclient uri-regress
 
 all: $(PROGS)
 
diff --git a/src/interfaces/libpq/test/testclient.c b/src/interfaces/libpq/test/testclient.c
new file mode 100644
index 00..2c730d83fa
--- /dev/null
+++ b/src/interfaces/libpq/test/testclient.c
@@ -0,0 +1,37 @@
+/*
+ * testclient.c
+ *		A test program for the libpq public API
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		

Re: Add id's to various elements in protocol.sgml

2022-02-28 Thread Chapman Flack
On 02/28/22 14:41, Brar Piening wrote:
> Attached is an extended version of the patch that changes the XSL and
> CSS stylesheets to add links to the ids that are visible when hovering.

That works nicely over here.

I think that in other recent examples I've seen, there might be
(something like a) consensus forming around the Unicode LINK SYMBOL
(U+1F517) rather than # as the symbol for such things.

... and now that the concept is proven, how hard would it be to broaden
that template's pattern to apply to all the other DocBook constructs
(such as section headings) that emit anchors?

Regards,
-Chap




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-28 Thread Gunnar "Nick" Bluth

On 28.02.22 at 12:56, Michael Paquier wrote:

On Sun, Feb 27, 2022 at 02:30:33PM +0100, Gunnar "Nick" Bluth wrote:

That's universally true ;-)


-# Internal routine to enable archive recovery command on a standby node
+# Internal routine to enable archive recovery command on a standby node.
+# Returns generated restore_command.
  sub enable_restoring
  {
my ($self, $root_node, $standby) = @_;
@@ -1103,7 +1104,7 @@ restore_command = '$copy_command'
{
$self->set_recovery_mode();
}
-   return;
+   return $copy_command;

I don't like this change.  This makes an existing routine do some
extra work for something it is not designed for.  You could get this
information with a postgres -C command, for example.  Now, you cannot
use postgres -C because of..  Reasons (1a9d802).  But you could use a
pg_ctl command for the same purpose.  I guess that we'd better
refactor this into a new routine of Cluster.pm where we execute a
pg_ctl command and return both stdout and stderr back to the caller?


Turns out this change isn't needed anyway (it was copied from the other 
patch). After all, the patch is about extracting the restore_command from 
the config, so...



The TAP part could be refactored on its own.


Agreed. I've struggled quite a bit even understanding what's happening 
in there ;-)




+In case the postgresql.conf of your target cluster is not in the
+target pgdata and you want to use the -c / --restore-target-wal option,
+provide a (relative or absolute) path to the postgresql.conf using this option.

This could do a better job of explaining in which cases this option can
be used.  You could say something like this:
"This option specifies a path to a configuration file to be used when
starting the target cluster.  This makes it easier to use
-c/--restore-target-wal by setting restore_command in the extra
configuration file specified via this option."


I reviewed the wording and think it is pretty universal now.



Hmm.  What about the target cluster started in --single mode as of
ensureCleanShutdown()?  Should we try to apply this option also in
this case?


I'd say so; as far as I can tell, "postgres" would deny starting a 
(Debian/non-standard) cluster the way the code is right now.


Which led me to realize we have

/* find postgres executable */
rc = find_other_exec(argv0, "postgres",
 PG_BACKEND_VERSIONSTR,
 postgres_exec_path);
in getRestoreCommand _and_

/* locate postgres binary */
if ((ret = find_other_exec(argv0, "postgres",
                           PG_BACKEND_VERSIONSTR,
                           exec_path)) < 0)
in ensureCleanShutdown.
Both with the same error messages, AFAICS. D'oh... can of worms.

So, what should we call the global "find the 'postgres' executable
and bail out if that fails" function?

char postgres_exec_path[MAXPGPATH] = findPostgresExec();
?


Anyway, v4 attached so we can settle on the tests and wording...
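
[Illustration — a hypothetical invocation of the proposed option on a
Debian-style cluster where postgresql.conf lives outside the data
directory; all paths are made up:]

pg_rewind --target-pgdata=/var/lib/postgresql/15/main \
          --source-server='host=primary port=5432' \
          --restore-target-wal \
          --config-file=/etc/postgresql/15/main/postgresql.conf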

--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..d0278e9b46 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+When using the -c / --restore-target-wal option, the restore_command
+is extracted from the configuration of the target cluster. If the postgresql.conf
+of that cluster is not in the target pgdata directory (or if you want to use an alternative config),
+use this option to provide a (relative or absolute) path to the postgresql.conf to be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..b8c92c5590 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -23,6 +23,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/string_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "getopt_long.h"
@@ -60,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +88,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve 

Re: Add id's to various elements in protocol.sgml

2022-02-28 Thread Brar Piening

On Feb 25, 2022 at 06:36, Brar Piening wrote:

The major problem in that regard would probably be my lack of
XSLT/docbook skills but if no one can jump in here, I can see if I can
make it work.


Ok, I've figured it out.

Attached is an extended version of the patch that changes the XSL and
CSS stylesheets to add links to the ids that are visible when hovering.

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..cb138b53ad 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
[diff body: adds an id attribute to the varlistentry of each replication
command (IDENTIFY_SYSTEM, SHOW, TIMELINE_HISTORY, CREATE_REPLICATION_SLOT,
READ_REPLICATION_SLOT, START_REPLICATION, DROP_REPLICATION_SLOT), each
streaming-replication sub-message (XLogData, Primary keepalive message,
Standby status update, Hot Standby feedback message), and each protocol
message format (AuthenticationOk through Execute, where the diff breaks
off); the SGML markup itself was stripped from the archived copy]

Re: Allow root ownership of client certificate key

2022-02-28 Thread Tom Lane
David Steele  writes:
> [ client-key-perm-003.patch ]

Pushed with a bit of copy-editing of the comments.

> So, to test the new functionality, just add this snippet on line 57 of 
> 001_ssltests.pl:
> chmod 0640, "$cert_tempdir/client.key"
>   or die "failed to change permissions on $cert_tempdir/client.key: $!";
> system_or_bail("sudo chown root $cert_tempdir/client.key");
> If you can think of a way to add this to the tests I'm all ears. Perhaps 
> we could add these lines commented out and explain what they are for?

I believe we have some precedents for invoking this sort of test
optionally if an appropriate environment variable is set.  However,
I'm having a pretty hard time seeing that there's any real use-case
for a test set up like this.  The TAP tests are meant for automatic
testing, and nobody is going to run automatic tests in an environment
where they'd be allowed to sudo.  (Or at least I sure hope nobody
working on this project is that naive.)

If somebody wants to put this in despite that, I'd merely suggest
that the server-side logic ought to get exercised too.

regards, tom lane




Re: CREATEROLE and role ownership hierarchies

2022-02-28 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> 1. Don't allow a CREATEROLE user to give out membership in groups
> which that user does not possess. Leaving aside the details of any
> previously-proposed patches and just speaking theoretically, how can
> this be implemented? I can think of a few ideas. We could (1A) just
> change CREATEROLE to work that way, but IIUC that would break the use
> case you outline here, so I guess that's off the table unless I am
> misunderstanding the situation. We could also (1B) add a second role
> attribute with a different name, like, err, CREATEROLEWEAKLY, that
> behaves in that way, leaving the existing one untouched. But we could
> also take it a lot further, because someone might want to let an
> account hand out a set of privileges which corresponds neither to the
> privileges of that account nor to the full set of available
> privileges. That leads to another idea: (1C) implement an in-database
> system that lets you specify which privileges an account has, and,
> separately, which ones it can assign to others. I am skeptical of that
> idea because it seems really, really complicated, not only from an
> implementation standpoint but even just from a user-experience
> standpoint. Suppose user 'userbot' has rights to grant a suitable set
> of groups to the new users that it creates -- but then someone creates
> a new group. Should that also be added to the things 'userbot' can
> grant or not? What if we have 'userbot1' through 'userbot6' and each
> of them can grant a different set of roles? I wouldn't mind (1D)
> providing a hook that allows the system administrator to install a
> loadable module that can enforce any rules it likes, but it seems way
> too complicated to me to expose all of this configuration as SQL,
> especially because for what I want to do, either (1A) or (1B) is
> adequate, and (1B) is a LOT simpler than (1C). It also caters to what
> I believe to be a common thing to want, without prejudice to the
> possibility that other people want other things.

I'm generally in support of changing CREATEROLE to only allow roles that
the role with CREATEROLE is an admin of to be allowed as part of the
command (throwing an error in other cases).  That doesn't solve other
use-cases which would certainly be nice to solve but it would at least
reduce the shock people have when they discover how CREATEROLE actually
works (that is, the way we document it to work, but that's ultimately
not what people expect).

If that's all this was about then that would be one thing, but folks are
interested in doing more here and that's good because there's a lot here
that could be (and I'd say should be..) done.

I'm not a fan of 1B.  In general, I'm in support of 1C but I don't feel
that absolutely everything must be done for 1C right from the start-
rather, I would argue that we'd be better off building a way for 1C to
be improved upon in the future, akin to our existing privilege system
where we've added things like the ability to GRANT TRUNCATE rights which
didn't originally exist.  I don't think 1D is a reasonable way to
accomplish that though, particularly as this involves storing
information about roles which needs to be cleaned up if those roles are
removed or modified.  I also don't really agree with the statement that
this ends up being too complicated for SQL.

> 2. Only allow a CREATEROLE user to drop users which that account
> created, and not just any role that isn't a superuser. Again leaving
> aside previous proposals, this cannot be implemented without providing
> some means by which we know which CREATEROLE user created which other
> user. I believe there are a variety of words we could use to describe
> that linkage, and I don't deeply care which ones we pick, although I
> have my own preferences. We could speak of the CREATEROLE user being
> the owner, manager, or administrator of the created role. We could
> speak of a new kind of object, a TENANT, of which the CREATEROLE user
> is the administrator and to which the created user is linked. I
> proposed this previously and it's still my favorite idea. There are no
> doubt other options as well. But it's axiomatic that we cannot
> restrict the rights of a CREATEROLE user to drop other roles to a
> subset of roles without having some way to define which subset is at
> issue.

I don't think it's a great plan to limit who is allowed to DROP roles to
be just those that a given role created.  I also don't like the idea of
introducing a single field for owner/manager/tenant/whatever to the role
system- instead we should add other ways that roles can be associated to
each other by extending the existing system that we have for that, which
is role membership.  Role membership today is pretty limited but I don't
see any reason why we couldn't improve on that in a way that's flexible
and allows us to define new associations in the future.  The biggest
difference between a 'tenant' or such as proposed vs. a 

Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-02-28 Thread Jacob Champion
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
> Looks to me like authn_id isn't synchronized to parallel workers right now. So
> the function will return the wrong thing when executed as part of a parallel
> query.

Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on it, like inet_server_addr(), are marked
parallel-restricted, so I've done the same in v4.

On Sat, 2022-02-26 at 14:39 +0900, Michael Paquier wrote:
> FWIW, I am not completely sure what's the use case for being able to
> see the authn of the current session through a trigger.  We expose
> that when log_connections is enabled, for audit purposes.  I can also
> get behind something more central so as one can get a full picture of
> the authn used by a bunch of session, particularly with complex HBA
> policies, but this looks rather limited to me in usability.  Perhaps
> that's not enough to stand as an objection, though, and the patch is
> dead simple.

I'm primarily motivated by the linked thread -- if the gap between
builtin roles and authn_id are going to be used as ammo against other
security features, then let's close that gap. But I think it's fair to
say that if someone is already using triggers to exhaustively audit a
table, it'd be nice to have this info in the same place too.
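
[Illustration — a sketch of that audit-trigger use case, using the function
name as it stands in this patch; the table and trigger names are made up:]

CREATE TABLE audit_log (tab text, usr text, authn text, logged_at timestamptz);

CREATE FUNCTION log_authn() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO audit_log
        VALUES (TG_TABLE_NAME, current_user, pg_session_authn_id(), now());
    RETURN NEW;
END;
$$;

CREATE TRIGGER audit_some_table AFTER INSERT OR UPDATE ON some_table
    FOR EACH ROW EXECUTE FUNCTION log_authn();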

> > I don't think we should add further functions not prefixed with pg_.
> 
> Yep.

Fixed.

> > Perhaps a few tests for less trivial authn_ids could be worthwhile?
> > E.g. certificate DNs.
> 
> Yes, src/test/ssl would handle that just fine.  Now, this stuff
> already looks after authn results with log_connections=on, so that
> feels like a duplicate.

It was easy enough to add, so I added it. I suppose it does protect
against any reimplementations of pg_session_authn_id() that can't
handle longer ID strings, though I admit that's a stretch.

Thanks,
--Jacob
commit efec9f040843d1de2fc52f5ce0d020478a5bc75d
Author: Jacob Champion 
Date:   Mon Feb 28 10:28:51 2022 -0800

squash! Add API to retrieve authn_id from SQL

diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index b892d25c29..662a7943ed 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -258,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER, SESSION_AUTHN_ID
+ * SQL-functions CURRENT_USER, SESSION_USER, PG_SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -273,7 +273,7 @@ session_user(PG_FUNCTION_ARGS)
 }
 
 Datum
-session_authn_id(PG_FUNCTION_ARGS)
+pg_session_authn_id(PG_FUNCTION_ARGS)
 {
if (!MyProcPort || !MyProcPort->authn_id)
PG_RETURN_NULL();
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 14194afe1c..3787b8edaf 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /* mmddN */
-#define CATALOG_VERSION_NO 202202251
+#define CATALOG_VERSION_NO 202202281
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 51e0d24f01..45326a2fe5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1509,8 +1509,8 @@
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
 { oid => '9774', descr => 'session authenticated identity',
-  proname => 'session_authn_id', provolatile => 's', prorettype => 'text',
-  proargtypes => '', prosrc => 'session_authn_id' },
+  proname => 'pg_session_authn_id', provolatile => 's', proparallel => 'r',
+  prorettype => 'text', proargtypes => '', prosrc => 'pg_session_authn_id' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/test/authentication/t/001_password.pl 
b/src/test/authentication/t/001_password.pl
index 2aa28ed547..1edac8d588 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -83,7 +83,7 @@ test_role($node, 'md5_role', 'trust', 0,
log_unlike => [qr/connection authenticated:/]);
 
 my $res = $node->safe_psql('postgres',
-   "SELECT session_authn_id() IS NULL;"
+   "SELECT pg_session_authn_id() IS NULL;"
 );
 is($res, 't', "users with trust authentication have NULL authn_id");
 
@@ -97,7 +97,7 @@ test_role($node, 'md5_role', 'password', 0,
  [qr/connection authenticated: identity="md5_role" method=password/]);
 
 $res = $node->safe_psql('postgres',
-   "SELECT session_authn_id();",
+   "SELECT pg_session_authn_id();",
connstr  => "user=md5_role"
 );
 is($res, 'md5_role', "users with md5 authentication have authn_id matching 
role name");
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 5c5b16fbe7..79ef7b46f1 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -443,6 +443,13 @@ 

Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-28 Thread Nathan Bossart
On Mon, Feb 28, 2022 at 06:45:51PM +0530, Bharath Rupireddy wrote:
> On Sat, Feb 26, 2022 at 9:37 PM Nathan Bossart  
> wrote:
>> For
>> this feature, I think we always need to consider what the primary considers
>> synchronously replicated.  My suggested approach doesn't change that.  I'm
>> saying that instead of spinning in a loop waiting for the WAL to be
>> synchronously replicated, we just immediately send WAL up to the LSN that
>> is presently known to be synchronously replicated.
> 
> As I said above, v1 patch does that i.e. async standbys wait until the
> sync standbys update their flush LSN.
> 
> Flush LSN is this - flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> which gets updated in SyncRepReleaseWaiters.
> 
> Async standbys with their SendRqstPtr will wait in XLogSendPhysical or
> XLogSendLogical until SendRqstPtr <= flushLSN.

My feedback is specifically about this behavior.  I don't think we should
spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
think we should just choose the SendRqstPtr based on what is currently
synchronously replicated.
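
[Illustration — a rough sketch of that idea, assuming it runs inside an
async walsender and reads the same walsndctl->lsn[] array discussed later
in this thread; not actual PostgreSQL code:]

static XLogRecPtr
ClampSendRqstPtr(XLogRecPtr sendRqstPtr)
{
	XLogRecPtr	syncFlushPtr;

	/* LSN the primary currently considers synchronously flushed */
	LWLockAcquire(SyncRepLock, LW_SHARED);
	syncFlushPtr = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
	LWLockRelease(SyncRepLock);

	/* send no further than what the sync standbys have confirmed */
	return Min(sendRqstPtr, syncFlushPtr);
}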

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: parse/analyze API refactoring

2022-02-28 Thread Nathan Bossart
On Mon, Feb 28, 2022 at 07:46:40AM +0100, Peter Eisentraut wrote:
> You set this commit fest entry to Waiting on Author, but there were no
> reviews posted and the patch still applies and builds AFAICT, so I don't
> know what you meant by that.

Apologies for the lack of clarity.  I believe my only feedback was around
deduplicating the pg_analyze_and_rewrite_*() functions.  Would you rather
handle that in a separate patch?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




CommitFest begins tomorrow... Get your patches in

2022-02-28 Thread Greg Stark
28 days has February ... So the March commitfest begins tomorrow. Meet
your friendly neighbourhood Commitfest Manager for March.  Greetings!

If you have a patch you're hoping to get feedback on or you're
expecting to get committed this month make sure it's in the commitfest
at https://commitfest.postgresql.org/37/

Currently there are 199 patches marked Needs Review and 25 Ready for Committer.

This is the last commitfest of the cycle so we'll be concentrating on
patches that are likely to be committed this commitfest. I'll move any
patches that are unlikely to be committed to target Postgres 16 and
the July commitfest shortly.

If you think your patch is really viable for Postgres 15 despite
needing feedback it can be helpful to post describing exactly what
you're blocked on and what remains to be done. It also sometimes helps
get attention if you make clear that you have the time available this
month to work on it in response to feedback.

The "round robin reviewers" system is no longer. Generally people can
just grab any unassigned patch from the commitfest and review it but
if for some reason you're reluctant feel free to email me -- please
put "commitfest" in the subject line.

If you've submitted a patch and are feeling blocked waiting on
feedback please update the thread on the list. If you're still not
getting responses please let me know -- again, please put "commitfest"
in the subject.

-- 
greg




Re: Expose JIT counters/timing in pg_stat_statements

2022-02-28 Thread Julien Rouhaud
On Mon, Feb 28, 2022 at 05:00:05PM +0100, Peter Eisentraut wrote:
> On 25.02.22 14:06, Magnus Hagander wrote:
> > +OUT jit_generation_time float8,
> > +OUT jit_inlining_time float8,
> > +OUT jit_optimization_time float8,
> > +OUT jit_emission_time float8
> 
> Perhaps those should be of type interval?

The rest of the "time" fields are declared as float8, so I think it's better to
keep things consistent.  And changing the type of existing fields, even in a
major version, seems like a bit too much trouble.




Re: Readd use of TAP subtests

2022-02-28 Thread Peter Eisentraut

On 25.02.22 17:26, Andres Freund wrote:

Ok that's good to know.  What exactly happens when it tries to parse them?
Does it not count them or does it fail somehow?  The way the output is
structured

Says that it can't parse a line of the TAP output:


Ok, then I suppose I'm withdrawing this.

Perhaps in another 7 years or so this will be resolved and we can make 
another attempt at this. ;-)





Re: Expose JIT counters/timing in pg_stat_statements

2022-02-28 Thread Peter Eisentraut

On 25.02.22 14:06, Magnus Hagander wrote:

+OUT jit_generation_time float8,
+OUT jit_inlining_time float8,
+OUT jit_optimization_time float8,
+OUT jit_emission_time float8


Perhaps those should be of type interval?




Use "WAL segment" instead of "log segment" consistently in user-facing messages

2022-02-28 Thread Bharath Rupireddy
Hi,

It looks like we use "log segment" in various user-facing messages.
The term "log" can also mean the server logs. "WAL segment" suits
better here, and it is already used consistently across other
user-facing messages [1].

Here's a small patch attempting to consistently use the "WAL segment".

Thoughts?

[1]
pg_log_error("could not fetch WAL segment size: got %d rows and %d
fields, expected %d rows and %d or more fields",
pg_log_error("WAL segment size could not be parsed");
pg_log_error(ngettext("WAL segment size must be a power of two between
1 MB and 1 GB, but the remote server reported a value of %d byte",
printf(_("WARNING: invalid WAL segment size\n"));
printf(_("Bytes per WAL segment:%u\n"),
fatal_error(ngettext("WAL segment size must be a power of two between
1 MB and 1 GB, but the WAL file \"%s\" header specifies %d byte",
errmsg("requested WAL segment %s has already been removed",
elog(DEBUG2, "removed temporary WAL segment \"%s\"", path);

Regards,
Bharath Rupireddy.


v1-0001-Use-WAL-segment-instead-of-log-segment.patch
Description: Binary data


Re: real/float example for testlibpq3

2022-02-28 Thread Tom Lane
Ashutosh Bapat  writes:
> I am wondering whether we should provide PQgetfloatvalue() which does
> what that example shows but transparently. It will return a float
> value of the given field instead of char *.

I'm against that, precisely because ...

> That will standardize the
> way to fetch real typed values in libpq. That leads to the next
> question. Do we need to introduce different PQget*value() for standard
> C/SQL data types.

... I do not want to go here.  Where would you stop?  How would you
deal with cross-machine inconsistencies in integer widths?

regards, tom lane




Re: psql: Make SSL info display more compact

2022-02-28 Thread Tom Lane
Daniel Gustafsson  writes:
> On 28 Feb 2022, at 12:56, Peter Eisentraut 
>  wrote:
>> On 28.02.22 11:50, Dagfinn Ilmari Mannsåker wrote:
>>> How about making it show "compression: on" if compression is on, but
>>> nothing in the common "off" case?

>> That would work for me.

> On POLA grounds I would prefer to never to show it.  If we ever get libpq
> compression and want to show that, I'd prefer that we didn't end up
> "compression" meaning one thing except when it means two things.

Well, any such output would presumably be on a different line and
thus distinguishable from the SSL property; plus, it'd be impossible
for both forms to show up in the same connection.

However, how about writing "SSL compression: on" versus writing
nothing?  That avoids doubt about what it means.  I don't buy
Michael's argument that this is ambiguous, either.

regards, tom lane




definition of CalculateMaxmumSafeLSN

2022-02-28 Thread Sergei Kornilov
Hello
I just spotted in src/include/access/xlog.h:
extern XLogRecPtr CalculateMaxmumSafeLSN(void);

This function doesn't seem to be used anywhere or even defined? "git grep 
CalculateMaxmumSafeLSN" shows only this line.
Was added in c6550776394e25c1620bc8258427c8f1d448080d "Allow users to limit 
storage reserved by replication slots"

regards, Sergei




Re: create_index test fails when synchronous_commit = off @ master

2022-02-28 Thread Aleksander Alekseev
Hi Andres,

> So it might be reasonable to use synchronous_commit=on in test_setup.sql?

> It's not super satisfying, but I don't immediately see what else could prevent
> all-visible to be set in this case?

I don't see what else can be done either. Here is the corresponding patch.
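
[Presumably along these lines — the attachment itself is not reproduced in
the archive, so this is a guess at its content:]

-- at the top of src/test/regress/sql/test_setup.sql
SET synchronous_commit = on;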

-- 
Best regards,
Aleksander Alekseev


fix-flacky-tests.patch
Description: Binary data


Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-02-28 Thread Amit Kapila
On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  wrote:
>
> We've added some information such as the command and the timestamp to
> the error context message by commit abc0910e2. This patch adds further
> information to it: replication origin name and commit-LSN.
>
> This will be helpful for users to set the origin name and LSN to
> pg_replication_origin_advance().
>

+1. This will make the use of pg_replication_origin_advance() relatively easy.

-- 
With Regards,
Amit Kapila.




Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

2022-02-28 Thread Bharath Rupireddy
On Sat, Feb 26, 2022 at 9:37 PM Nathan Bossart  wrote:
>
> On Sat, Feb 26, 2022 at 02:17:50PM +0530, Bharath Rupireddy wrote:
> > A global min LSN of SendRqstPtr of all the sync standbys can be
> > calculated and the async standbys can send WAL up to global min LSN.
> > This is unlike what the v1 patch does i.e. async standbys will wait
> > until the sync standbys report flush LSN back to the primary. Problem
> > with the global min LSN approach is that there can still be a small
> > window where async standbys can get ahead of sync standbys. Imagine
> > async standbys being closer to the primary than sync standbys and if
> > the failover has to happen while the WAL at SendRqstPtr isn't received
> > by the sync standbys, but the async standbys can receive them as they
> > are closer. We hit the same problem that we are trying to solve with
> > this patch. This is the reason, we are waiting till the sync flush LSN
> > as it guarantees more transactional protection.
>
> Do you mean that the application of WAL gets ahead on your async standbys
> or that the writing/flushing of WAL gets ahead?  If synchronous_commit is
> set to 'remote_write' or 'on', I think either approach can lead to
> situations where the async standbys are ahead of the sync standbys with WAL
> application.  For example, a conflict between WAL replay and a query on
> your sync standby could delay WAL replay, but the primary will not wait for
> this conflict to resolve before considering a transaction synchronously
> replicated and sending it to the async standbys.
>
> If writing/flushing WAL gets ahead on async standbys, I think something is
> wrong with the patch.  If you aren't sending WAL to async standbys until
> it is synchronously replicated to the sync standbys, it should by
> definition be impossible for this to happen.

With the v1 patch [1], the async standbys will never get WAL ahead of
sync standbys. That is guaranteed because the walsenders serving async
standbys are allowed to send WAL only after the walsenders serving
sync standbys receive the synchronous flush LSN.

> > Do you think allowing async standbys optionally wait for either remote
> > write or flush or apply or global min LSN of SendRqstPtr so that users
> > can choose what they want?
>
> I'm not sure I follow the difference between "global min LSN of
> SendRqstPtr" and remote write/flush/apply.  IIUC you are saying that we
> could use the LSN of what is being sent to sync standbys instead of the LSN
> of what the primary considers synchronously replicated.  I don't think we
> should do that because it provides no guarantee that the WAL has even been
> sent to the sync standbys before it is sent to the async standbys.

Correct.

> For
> this feature, I think we always need to consider what the primary considers
> synchronously replicated.  My suggested approach doesn't change that.  I'm
> saying that instead of spinning in a loop waiting for the WAL to be
> synchronously replicated, we just immediately send WAL up to the LSN that
> is presently known to be synchronously replicated.

As I said above, v1 patch does that i.e. async standbys wait until the
sync standbys update their flush LSN.

Flush LSN is this - flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
which gets updated in SyncRepReleaseWaiters.

Async standbys with their SendRqstPtr will wait in XLogSendPhysical or
XLogSendLogical until SendRqstPtr <= flushLSN.

I will address review comments raised by Hsu, John and send the
updated patch for further review. Thanks.

[1] 
https://www.postgresql.org/message-id/CALj2ACVUa8WddVDS20QmVKNwTbeOQqy4zy59NPzh8NnLipYZGw%40mail.gmail.com

Regards,
Bharath Rupireddy.




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-28 Thread Matthias van de Meent
On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy
 wrote:
> 3) Why do we need this extra calculation for start_lsn?
> Do you ever see a negative LSN or something here?
> +('0/0'::pg_lsn + (
> +CASE
> +WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> +ELSE (0)::numeric
> +END + (s.param3)::numeric)) AS start_lsn,

Yes: an LSN can take up all of a uint64, whereas the pgstat column is a
bigint type; thus the signed int64. This cast is OK as it wraps
around, but that means we have to take care to correctly display the
LSN when it is > 0x7FFF_FFFF_FFFF_FFFF, which is what we do here using
the special-casing for negative values.

As to whether it is reasonable: Generating 16GB of wal every second
(2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth
has been > 20GB/sec for a while); and that leaves you 2^29 seconds of
database runtime; or about 17 years. Seeing that a cluster can be
`pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at
least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)),
we can assume that clusters hitting LSN=2^63 will be a reasonable
possibility within the next few years. As the lifespan of a PG release
is about 5 years, it doesn't seem impossible that there will be actual
clusters that are going to hit this naturally in the lifespan of PG15.

It is also possible that someone fat-fingers pg_resetwal; and creates
a cluster with LSN >= 2^63; resulting in negative values in the
s.param3 field. Not likely, but we can force such situations; and as
such we should handle that gracefully.
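
[Illustration — the wraparound handling in isolation; -1 must decode to the
largest LSN rather than come out negative (this relies on the pg_lsn +
numeric operator, as the view SQL above does):]

SELECT '0/0'::pg_lsn
     + (CASE WHEN p < 0 THEN pow(2::numeric, 64::numeric) ELSE 0::numeric END
        + p::numeric) AS decoded_lsn
FROM (VALUES (0::bigint), (-1::bigint)) AS t(p);
-- decoded_lsn: 0/0 and FFFFFFFF/FFFFFFFF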

> 4) Can't you use timestamptz_in(to_char(s.param4))  instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function and it's named as *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?

I hadn't thought of using the types' inout functions, but it looks
like timestamp IO functions use a formatted timestring, which won't
work with the epoch-based timestamp stored in the view.

> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eatup the function oids IMO. Hence, I suggest
> let's try to do without extra functions.

I agree that (1) could be simplified, or at least fully expressed in
SQL without exposing too many internals. If we're fine with exposing
internals like flags and type layouts, then (2), and arguably (4), can
be expressed in SQL as well.

-Matthias




Re: real/float example for testlibpq3

2022-02-28 Thread Ashutosh Bapat
Hi Mark,
Fetching a "real" type field from libpq doesn't look that intuitive.
So this example is super helpful. Thanks.

I am wondering whether we should provide PQgetfloatvalue() which does
what that example shows but transparently. It will return a float
value of the given field instead of char *. That will standardize the
way to fetch real typed values in libpq. That leads to the next
question. Do we need to introduce different PQget*value() for standard
C/SQL data types.
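
[Illustration — a sketch of what such a helper could look like;
PQgetfloatvalue() does not exist in libpq today, and this assumes the
query requested binary result format and the field is a 4-byte "real":]

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <libpq-fe.h>

static float
PQgetfloatvalue(const PGresult *res, int row_number, int column_number)
{
	uint32_t	netval;
	float		result;

	memcpy(&netval, PQgetvalue(res, row_number, column_number),
		   sizeof(netval));
	netval = ntohl(netval);		/* big-endian wire format to host order */
	memcpy(&result, &netval, sizeof(result));
	return result;
}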

-- 
Best Wishes,
Ashutosh Bapat

On Fri, Feb 25, 2022 at 3:12 AM Mark Wong  wrote:
>
> Hi everyone,
>
> Would adding additional examples to testlibpq3.c on handling more data
> types be helpful?  I've attached a patch adding how to handle a REAL to
> current example.
>
> Regards,
> Mark




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-28 Thread Julien Rouhaud
On Mon, Feb 28, 2022 at 06:03:54PM +0530, Bharath Rupireddy wrote:
> On Mon, Feb 28, 2022 at 12:02 PM Julien Rouhaud  wrote:
> >
> > I suggested upthread to store the starting timeline instead.  This way you 
> > can
> > deduce whether it's a restartpoint or a checkpoint, but you can also deduce
> > other information, like what was the starting WAL.
> 
> I don't understand why we need the timeline here to just determine
> whether it's a restartpoint or checkpoint.

I'm not saying it's necessary, I'm saying that for the same space usage we can
store something a bit more useful.  If no one cares about having the starting
timeline available for no extra cost then sure, let's just store the kind
directly.

> Can't the checkpoint start LSN be deduced from
> PROGRESS_CHECKPOINT_LSN, checkPoint.redo?

I'm not sure I'm following, isn't checkPoint.redo the checkpoint start LSN?

> > As mentioned upthread, there can be multiple backends that request a
> > checkpoint, so unless we want to store an array of pid we should store a 
> > number
> > of backend that are waiting for a new checkpoint.
> 
> Yeah, you are right. Let's not go that path and store an array of
> pids. I don't see a strong use-case with the pid of the process
> requesting checkpoint. If required, we can add it later once the
> pg_stat_progress_checkpoint view gets in.

I don't think that's really necessary to give the pid list.

If you requested a new checkpoint, it doesn't matter if it's only your backend
that triggered it, another backend or a few other dozen, the result will be the
same and you have the information that the request has been seen.  We could
store just a bool for that but having a number instead also gives a bit more
information and may allow you to detect some broken logic on your client code
if it keeps increasing.




Re: psql: Make SSL info display more compact

2022-02-28 Thread Daniel Gustafsson
> On 28 Feb 2022, at 12:56, Peter Eisentraut 
>  wrote:
> On 28.02.22 11:50, Dagfinn Ilmari Mannsåker wrote:
>> Daniel Gustafsson  writes:

>>> This was originally done, but all client side changes reverted as there 
>>> still
>>> are server versions in production which allow compression.
>> How about making it show "compression: on" if compression is on, but
>> nothing in the common "off" case?
> 
> That would work for me.

On POLA grounds I would prefer to never to show it.  If we ever get libpq
compression and want to show that, I'd prefer that we didn't end up
"compression" meaning one thing except when it means two things.

--
Daniel Gustafsson   https://vmware.com/





Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-28 Thread Bharath Rupireddy
On Mon, Feb 28, 2022 at 12:02 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Mon, Feb 28, 2022 at 10:21:23AM +0530, Bharath Rupireddy wrote:
> >
> > Another thought for my review comment:
> > > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> > > or checkpoint instead of having a new function
> > > pg_stat_get_progress_checkpoint_type?
> >
> > I don't think using pg_is_in_recovery work here as it is taken after
> > the checkpoint has started. So, I think the right way here is to send
> > 1 in CreateCheckPoint  and 2 in CreateRestartPoint and use
> > CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint".
>
> I suggested upthread to store the starting timeline instead.  This way you can
> deduce whether it's a restartpoint or a checkpoint, but you can also deduce
> other information, like what was the starting WAL.

I don't understand why we need the timeline here to just determine
whether it's a restartpoint or checkpoint. I know that the
InsertTimeLineID is 0 during recovery. IMO, emitting 1 for checkpoint
and 2 for restartpoint in CreateCheckPoint and CreateRestartPoint
respectively and using CASE-WHEN-ELSE-END to show it in readable
format is the easiest way.
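
[Illustration — the kind of mapping being suggested, as a fragment of the
view definition; the parameter number is hypothetical:]

CASE s.param5
    WHEN 1 THEN 'checkpoint'
    WHEN 2 THEN 'restartpoint'
END AS type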

Can't the checkpoint start LSN be deduced from
PROGRESS_CHECKPOINT_LSN, checkPoint.redo?

I'm completely against these pg_stat_get_progress_checkpoint_{type,
kind, start_time} functions unless there's a strong case. IMO, we can
achieve what we want without these functions as well.

> > 11) I think it's discussed, are we going to add the pid of the
> > checkpoint requestor?
>
> As mentioned upthread, there can be multiple backends that request a
> checkpoint, so unless we want to store an array of pid we should store a 
> number
> of backend that are waiting for a new checkpoint.

Yeah, you are right. Let's not go that path and store an array of
pids. I don't see a strong use-case with the pid of the process
requesting checkpoint. If required, we can add it later once the
pg_stat_progress_checkpoint view gets in.

Regards,
Bharath Rupireddy.




Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-02-28 Thread Masahiko Sawada
Hi,

We've added some information such as the command and the timestamp to
the error context message by commit abc0910e2. This patch adds further
information to it: replication origin name and commit-LSN.

This will be helpful for users to set the origin name and LSN to
pg_replication_origin_advance().

The errcontext message would become like follows:

*Before
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (c)=(1) already exists.
CONTEXT:  processing remote data during "INSERT" for replication
target relation "public.test" in transaction 726 at 2022-02-28
20:59:56.005909+09

* After
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (c)=(1) already exists.
CONTEXT:  processing remote data during "INSERT" for replication
target relation "public.test" in transaction 726 committed at LSN
0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
origin "pg_16395"

I'm a bit concerned that the message may be too long.
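
[Illustration — the workflow this patch documents, using the values from
the example above; the subscription would be disabled first:]

SELECT pg_replication_origin_advance('pg_16395', '0/14BFA89'::pg_lsn);
-- i.e., the LSN just past the reported commit LSN 0/14BFA88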

I've attached two patches: the first one changes
apply_error_callback() so that it uses complete sentences with if-else
blocks in order to have a translation work, the second patch adds the
origin name and commit-LSN to the errcontext message.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
From dc6d97c71394c7c216920b9aa1d55bf33c5ac472 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 24 Feb 2022 16:56:58 +0900
Subject: [PATCH 2/2] Add the origin name and remote commit-LSN to logical
 replication worker errcontext.

This commits adds both the commit-LSN and replication origin name to
the existing error context message.

This will help users in specifying the origin name and commit-LSN to
pg_replication_origin_advance() SQL function to skip the particular transaction.
---
 doc/src/sgml/logical-replication.sgml| 19 +--
 src/backend/replication/logical/worker.c | 71 ++--
 2 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 96b4886e08..a96cc21a1c 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -354,12 +354,21 @@
   
The resolution can be done either by changing data or permissions on the subscriber so
that it does not conflict with the incoming change or by skipping the
-   transaction that conflicts with the existing data.  The transaction can be
-   skipped by calling the 
+   transaction that conflicts with the existing data.  When a conflict produces
+   an error, it is shown in the subscriber's server logs as follows:
+
+ERROR:  duplicate key value violates unique constraint "test_pkey"
+DETAIL:  Key (c)=(1) already exists.
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 725 committed at LSN 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+00 from replication origin "pg_16395"
+
+   The LSN of the transaction that contains the change violating the constraint and
+   the replication origin name can be found from those outputs (LSN 0/14BFA88 and
+   replication origin pg_16395 in the above case).  The transaction
+   can be skipped by calling the 
pg_replication_origin_advance() function with
-   a node_name corresponding to the subscription name,
-   and a position.  The current position of origins can be seen in the
-   
+   the node_name and the next LSN of the commit LSN
+   (i.e., 0/14BFA89) from those outputs.  The current position of origins can be
+   seen in the 
pg_replication_origin_status system view.
   
  
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ac49e73b45..3fe5f50806 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -226,6 +226,8 @@ typedef struct ApplyErrorCallbackArg
 	/* Remote node information */
 	int			remote_attnum;	/* -1 if invalid */
 	TransactionId remote_xid;
+	XLogRecPtr	commit_lsn;
+	char	   *origin_name;
 	TimestampTz ts;/* commit, rollback, or prepare timestamp */
 } ApplyErrorCallbackArg;
 
@@ -235,6 +237,8 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
 	.rel = NULL,
 	.remote_attnum = -1,
 	.remote_xid = InvalidTransactionId,
+	.commit_lsn = InvalidXLogRecPtr,
+	.origin_name = NULL,
 	.ts = 0,
 };
 
@@ -334,7 +338,8 @@ static void apply_spooled_messages(TransactionId xid, XLogRecPtr lsn);
 
 /* Functions for apply error callback */
 static void apply_error_callback(void *arg);
-static inline void set_apply_error_context_xact(TransactionId xid, TimestampTz ts);
+static inline void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn,
+TimestampTz ts);
 static inline void reset_apply_error_context_info(void);
 
 /*
@@ -787,7 +792,8 @@ apply_handle_begin(StringInfo s)
 	LogicalRepBeginData begin_data;
 
 	logicalrep_read_begin(s, &begin_data);
-	

Re: [Proposal] Global temporary tables

2022-02-28 Thread Wenjing Zeng


> On Feb 25, 2022, at 15:45, Andres Freund  wrote:
> 
> Hi,
> 
> 
> This is a huge thread. Realistically reviewers and committers can't reread
> it. I think there needs to be more of a description of how this works included
> in the patchset and *why* it works that way. The readme does a bit of that,
> but not particularly well.
Thank you for your review of the design and code.
I'm always trying to improve it. If you are confused or need clarification on 
something, please point it out.


> 
> 
> On 2022-02-25 14:26:47 +0800, Wenjing Zeng wrote:
>> +++ b/README.gtt.txt
>> @@ -0,0 +1,172 @@
>> +Global Temporary Table(GTT)
>> +=
>> +
>> +Feature description
>> +-
>> +
>> +Previously, temporary tables are defined once and automatically
>> +exist (starting with empty contents) in every session before using them.
> 
> I think for a README "previously" etc isn't good language - if it were
> commited, it'd not be understandable anymore. It makes more sense for commit
> messages etc.
Thanks for pointing it out. I will adjust the description.

> 
> 
>> +Main design ideas
>> +-
>> +In general, GTT and LTT use the same storage and buffer design and
>> +implementation. The storage files for both types of temporary tables are 
>> named
>> +as t_backendid_relfilenode, and the local buffer is used to cache the data.
> 
> What does "named ast_backendid_relfilenode" mean?
This is the storage file naming format for describing temporary tables.
It starts with 't', followed by backendid and relfilenode, connected by an 
underscore.
File naming rules are the same as LTT.
The data in the file is no different from regular tables and LTT.

> 
> 
>> +The schema of GTTs is shared among sessions while their data are not. We 
>> build
>> +a new mechanisms to manage those non-shared data and their statistics.
>> +Here is the summary of changes:
>> +
>> +1) CATALOG
>> +GTTs store session-specific data. The storage information of GTTs'data, 
>> their
>> +transaction information, and their statistics are not stored in the catalog.
>> +
>> +2) STORAGE INFO & STATISTICS INFO & TRANSACTION INFO
>> +In order to maintain durability and availability of GTTs'session-specific 
>> data,
>> +their storage information, statistics, and transaction information is 
>> managed
>> +in a local hash table tt_storage_local_hash.
> 
> "maintain durability"? Durable across what? In the context of databases it's
> typically about crash safety, but that can't be the case here.
It means that the transaction information (relfrozenxid/relminmxid),
storage information (relfilenode), and statistics (relpages) of a GTT
are maintained in a hashtable, not in pg_class.
This allows a GTT to store its own local data in different sessions and
avoids frequent catalog changes.

> 
> 
>> +3) DDL
>> +Currently, GTT supports almost all table'DDL except CLUSTER/VACUUM FULL.
>> +Part of the DDL behavior is limited by shared definitions and multiple 
>> copies of
>> +local data, and we added some structures to handle this.
> 
>> +A shared hash table active_gtt_shared_hash is added to track the state of 
>> the
>> +GTT in a different session. This information is recorded in the hash table
>> +during the DDL execution of the GTT.
> 
>> +The data stored in a GTT can only be modified or accessed by owning session.
>> +The statements that only modify data in a GTT do not need a high level of
>> +table locking. The operations making those changes include truncate GTT,
>> +reindex GTT, and lock GTT.
> 
> I think you need to introduce a bit more terminology for any of this to make
> sense. Sometimes GTT means the global catalog entity, sometimes, like here, it
> appears to mean the session specific contents of a GTT.
> 
> What state of a GTT in another session?
> 
> 
> How do GTTs handle something like BEGIN; TRUNCATE some_gtt_table; ROLLBACK;?

A GTT behaves exactly like a regular table here.
Specifically, the latest relfilenode for the current session is stored in the
hash table, and TRUNCATE may replace it.
If the transaction rolls back, the old relfilenode becomes active again, just
as it would via pg_class.
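
To make this concrete, here is a minimal sketch, assuming the patch's
proposed GLOBAL TEMPORARY syntax (the table name and values are made up
for illustration):

    CREATE GLOBAL TEMPORARY TABLE gtt_demo (id int) ON COMMIT PRESERVE ROWS;
    INSERT INTO gtt_demo VALUES (1), (2);
    BEGIN;
    TRUNCATE gtt_demo;   -- the session's hash table entry now points at a new relfilenode
    ROLLBACK;            -- the old relfilenode becomes active again
    SELECT count(*) FROM gtt_demo;   -- returns 2: the pre-TRUNCATE data is back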

> 
> 
>> +1.2 on commit clause
>> +LTT's status associated with on commit DELETE ROWS and on commit PRESERVE 
>> ROWS
>> +is not stored in catalog. Instead, GTTs need a bool value 
>> on_commit_delete_rows
>> +in reloptions which is shared among sessions.
> 
> Why?
An LTT is always created and used in the current session, so its ON COMMIT
clause does not need to be shared with other sessions; this is why LTTs do
not record the ON COMMIT clause in the catalog.
A GTT's table definition, however, is shared between sessions, including the
ON COMMIT clause, so it needs to be saved in the catalog.


> 
> 
> 
>> +2.3 statistics info
>> +1) relpages reltuples relallvisible relfilenode
> 
> ?
It was mentioned above.

> 
>> +3 DDL
>> +3.1. active_gtt_shared_hash
>> +This is the hash table 

Re: psql: Make SSL info display more compact

2022-02-28 Thread Peter Eisentraut

On 28.02.22 12:21, Michael Paquier wrote:

What about making the information shown version-aware?  I would choose
to show the compression part only for server versions where it is
settable.


That might lead to confusing results if you are not connecting to 
something that is a stock PostgreSQL server.





Re: psql: Make SSL info display more compact

2022-02-28 Thread Peter Eisentraut

On 28.02.22 11:50, Dagfinn Ilmari Mannsåker wrote:

Daniel Gustafsson  writes:


On 28 Feb 2022, at 10:02, Peter Eisentraut  
wrote:



Since support for SSL compression has been removed from PostgreSQL, it
doesn't seem sensible to display it anymore.


This was originally done, but all client-side changes were reverted as there
are still server versions in production which allow compression.


How about making it show "compression: on" if compression is on, but
nothing in the common "off" case?


That would work for me.




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-28 Thread Michael Paquier
On Sun, Feb 27, 2022 at 02:30:33PM +0100, Gunnar "Nick" Bluth wrote:
> That's universally true ;-)

-# Internal routine to enable archive recovery command on a standby node
+# Internal routine to enable archive recovery command on a standby node.
+# Returns generated restore_command.
 sub enable_restoring
 {
my ($self, $root_node, $standby) = @_;
@@ -1103,7 +1104,7 @@ restore_command = '$copy_command'
{
$self->set_recovery_mode();
}
-   return;
+   return $copy_command;

I don't like this change.  This makes an existing routine do some
extra work for something it is not designed for.  You could get this
information with a postgres -C command, for example.  Now, you cannot
use postgres -C because of..  Reasons (1a9d802).  But you could use a
pg_ctl command for the same purpose.  I guess that we'd better
refactor this into a new routine of Cluster.pm where we execute a
pg_ctl command and return both stdout and stderr back to the caller?

The TAP part could be refactored on its own.

+In case the postgresql.conf of your target cluster is not in the
+target pgdata and you want to use the -c / --restore-target-wal option,
+provide a (relative or absolute) path to the postgresql.conf using this option.

This could do a better job of explaining the cases in which this option
can be used.  You could say something like this:
"This option specifies a path to a configuration file to be used when
starting the target cluster.  This makes it easier to use
-c/--restore-target-wal by setting restore_command in the extra
configuration file specified via this option."

Hmm.  What about the target cluster started in --single mode as of
ensureCleanShutdown()?  Should we try to apply this option also in
this case?
--
Michael


signature.asc
Description: PGP signature


Re: logical decoding and replication of sequences

2022-02-28 Thread Amit Kapila
On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
 wrote:
>
> On 2/10/22 19:17, Tomas Vondra wrote:
> > I've polished & pushed the first part adding sequence decoding
> > infrastructure etc. Attached are the two remaining parts.
> >
> > I plan to wait a day or two and then push the test_decoding part. The
> > last part (for built-in replication) will need more work and maybe
> > rethinking the grammar etc.
> >
>
> I've pushed the second part, adding sequences to test_decoding.
>

The test_decoding tests have been failing randomly in the last few days.
I am not completely sure, but the failures might be related to this work.
Two of them appear to be due to the same reason:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07

TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
"reorderbuffer.c", Line: 1173, PID: 35013)
0   postgres0x00593de0 ExceptionalCondition + 160\\0

Another:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48

--- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out	2022-02-14 20:19:14.0 +0000
+++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out	2022-02-16 07:42:18.0 +0000
@@ -126,6 +126,7 @@
   table public.replication_example: INSERT: id[integer]:4 somedata[integer]:3 text[character varying]:null testcolumn1[integer]:null
   table public.replication_example: INSERT: id[integer]:5 somedata[integer]:4 text[character varying]:null testcolumn1[integer]:2 testcolumn2[integer]:1
   COMMIT
+  sequence public.replication_example_id_seq: transactional:0 last_value: 38 log_cnt: 0 is_called:1
   BEGIN
   table public.replication_example: INSERT: id[integer]:6 somedata[integer]:5 text[character varying]:null testcolumn1[integer]:3 testcolumn2[integer]:null
   COMMIT
@@ -133,7 +134,7 @@
   table public.replication_example: INSERT: id[integer]:7 somedata[integer]:6 text[character varying]:null testcolumn1[integer]:4 testcolumn2[integer]:null
   table public.replication_example: INSERT: id[integer]:8 somedata[integer]:7 text[character varying]:null testcolumn1[integer]:5 testcolumn2[integer]:null testcolumn3[integer]:1
   COMMIT
- (15 rows)
+ (16 rows)

-- 
With Regards,
Amit Kapila.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-02-28 Thread Julien Rouhaud
Hi,

On Wed, Feb 23, 2022 at 09:44:58AM -0800, Nathan Bossart wrote:
>
> > Finally I also added 0003, which is a POC for a new pg_hba_matches()
> > function, that can help DBAs understand why their configuration isn't
> > working as they expect.  This is only to start the discussion on that
> > topic; the code is for now really hackish, as I don't know how much
> > this is wanted and/or if some other behavior would be better, and
> > there's also no documentation or test.  The function for now only
> > takes an optional inet (null means unix socket), the target role and
> > an optional ssl flag, and returns the file, line and raw line matching
> > if any, or null.  For instance:
>
> I think another use-case for this is testing updates to your configuration
> files.  For example, I could ensure that hba_forbid_non_ssl.conf wasn't
> accidentally reverted as part of an unrelated change.

Indeed, that function could really be helpful in many scenarios.  Note that
this isn't my idea but Magnus', which he mentioned quite a long time ago.
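
For illustration, a call could look like the sketch below; the signature and
output columns are assumptions based on the description quoted above, not a
final form:

    -- hypothetical invocation of the POC function
    SELECT * FROM pg_hba_matches('192.168.0.10'::inet, 'app_user', true);
    -- returns the matching file, line number and raw line, or null if no
    -- entry matches; passing null for the inet would mean a unix socket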




Re: psql: Make SSL info display more compact

2022-02-28 Thread Michael Paquier
On Mon, Feb 28, 2022 at 10:50:01AM +, Dagfinn Ilmari Mannsåker wrote:
> Daniel Gustafsson  writes:
>> On 28 Feb 2022, at 10:02, Peter Eisentraut 
>>  wrote:
>> This was originally done, but all client-side changes were reverted as there
>> are still server versions in production which allow compression.
> 
> How about making it show "compression: on" if compression is on, but
> nothing in the common "off" case?

Hm, no, as it can be useful to know that compression is disabled when
connecting to an old server.  What about making the information shown
version-aware?  I would choose to show the compression part only for
server versions where it is settable.
--
Michael


signature.asc
Description: PGP signature


Re: Add id's to various elements in protocol.sgml

2022-02-28 Thread Brar Piening

On 28.02.2022 at 10:24, Peter Eisentraut wrote:

On 28.02.22 09:41, Brar Piening wrote:

On Feb 25, 2022 at 14:31, Peter Eisentraut wrote:

I think that kind of stuff would be added in via the web site
stylesheets, so you wouldn't have to deal with XSLT at all.


True for the CSS but  adding the HTML (#)
will need either XSLT or Javascript.


That is already done by your proposed patch, isn't it?


No it isn't. My proposed patch performs the simple task of adding ids to
the dt elements (e.g. ).

This makes them usable as targets for links but they remain invisible to
users of the docs who don't know about them, and unusable to users who
don't know how to extract them from the HTML source code.

The links (e.g. #) aren't added by the current XSLT transformations
from DocBook to HTML.

Adding them would create a visible element (I propose a hash '#') next
to the description term (inside the dt element, after the text) that you
can click on to put the link into the address bar, from where it can be
copied for further use.






Re: Missed condition-variable wakeups on FreeBSD

2022-02-28 Thread Thomas Munro
On Sun, Feb 27, 2022 at 11:18 AM Melanie Plageman
 wrote:
> How could it be that worker 2 is waiting on the build barrier in
> PHJ_BUILD_HASHING_INNER and worker 1 and the leader are waiting on it
> with it supposedly in PHJ_BUILD_HASHING_OUTER?

That'd be consistent with a wakeup going missing, somehow.  W2 was
supposed to be released but somehow didn't get the message (until a
random debugger-induced EINTR releases it and it sees all the state it
needs to proceed in shared memory), meanwhile the other two got to the
end of the next phase and are waiting for it.




RE: Logical replication timeout problem

2022-02-28 Thread kuroda.hay...@fujitsu.com
Dear Wang,

> Attached a new patch that addresses following improvements I have got so far 
> as
> comments:
> 1. Consider other changes that need to be skipped(truncate, DDL and function
> calls pg_logical_emit_message). [suggestion by Ajin, Amit]
> (Add a new function SendKeepaliveIfNecessary for trying to send keepalive
> message.)
> 2. Set the threshold conservatively to a static value of 1.[suggestion by 
> Amit,
> Kuroda-San]
> 3. Reset sendTime in function WalSndUpdateProgress when send_keep_alive is
> false. [suggestion by Amit]

Thank you for the patch! I'll check it in more detail later, but it
applies on top of my code and passes check-world.
Here are some minor comments:

```
+ * Try to send keepalive message
```

Maybe missing "a"?

```
+   /*
+   * After continuously skipping SKIPPED_CHANGES_THRESHOLD changes, try to send a
+   * keepalive message.
+   */
```

This comment does not follow the preferred style:
https://www.postgresql.org/docs/devel/source-format.html

```
@@ -683,12 +683,12 @@ OutputPluginWrite(struct LogicalDecodingContext *ctx, bool last_write)
  * Update progress tracking (if supported).
  */
 void
-OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx)
+OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx, bool send_keep_alive)
```

This function no longer does just progress tracking.
Could you update the comment above it?

```
if (!is_publishable_relation(relation))
return;
```

I'm not sure, but it seems that the function exits immediately if the relation
is a sequence, view, temporary table, and so on. Is that OK? Can that never happen?

```
+   SendKeepaliveIfNecessary(ctx, false);
```

I think a comment is needed above this call to clarify that it may send a
keepalive message.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: psql: Make SSL info display more compact

2022-02-28 Thread Dagfinn Ilmari Mannsåker
Daniel Gustafsson  writes:

>> On 28 Feb 2022, at 10:02, Peter Eisentraut 
>>  wrote:
>
>> Since support for SSL compression has been removed from PostgreSQL, it
>> doesn't seem sensible to display it anymore.
>
> This was originally done, but all client-side changes were reverted as there
> are still server versions in production which allow compression.

How about making it show "compression: on" if compression is on, but
nothing in the common "off" case?

- ilmari




Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Peter Eisentraut

On 17.02.22 20:25, samay sharma wrote:
A use case where this is useful is environments where you want 
authentication to be centrally managed across different services. This 
is a common deployment model for cloud providers where customers like to 
use single sign on and authenticate across different services including 
Postgres. Implementing this now is tricky as it requires syncing that 
authentication method's credentials with Postgres (and that gets 
trickier with TTL/expiry etc.). With these hooks, you can implement an 
extension to check credentials directly using the 
authentication provider's APIs.


We already have a variety of authentication mechanisms that support 
central management: LDAP, PAM, Kerberos, Radius.  What other mechanisms 
are people thinking about implementing using these hooks?  Maybe there 
are a bunch of them, in which case a hook system might be sensible, but 
if there are only one or two plausible ones, we could also just make 
them built in.






Re: Proposal: Support custom authentication methods using hooks

2022-02-28 Thread Peter Eisentraut

On 28.02.22 00:17, Jeff Davis wrote:

I also have in mind here that there has been discussion of giving
libpq a feature to refuse, on the client side, to send cleartext
passwords.

I am generally in favor of that idea, but I'm not sure that will
completely resolve the issue. For instance, should it also refuse MD5?


Presumably that feature could be more generally "refuse these 
authentication mechanisms" rather than only one hardcoded one.





Re: Support for grabbing multiple consecutive values with nextval()

2022-02-28 Thread Peter Eisentraut

On 27.02.22 10:42, Jille Timmermans wrote:
I wanted to be able to allocate a bunch of numbers from a sequence at 
once. Multiple people seem to be struggling with this 
(https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).


I propose to add an extra argument to nextval() that specifies how many 
numbers you want to allocate (default 1).


What is the use of this?

I note that the stackoverflow question wanted to return multiple 
sequence values as a result set, whereas your implementation just skips 
a number of values and returns the last one.  At least we should be 
clear about what we are trying to achieve.
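
To make the distinction concrete, a sketch (the two-argument nextval() is
the patch's proposed signature, not something that exists today):

    CREATE SEQUENCE s;

    -- the patch as described: reserve 5 values, return only the last one
    SELECT nextval('s', 5);    -- returns 5; the caller infers the range 1..5

    -- what the Stack Overflow question asked for: each value as a row
    SELECT nextval('s') FROM generate_series(1, 5);    -- returns 6 through 10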






Re: unlogged sequences

2022-02-28 Thread Peter Eisentraut

rebased patch, no functional changes

On 11.02.22 10:12, Peter Eisentraut wrote:

On 25.06.19 20:37, Andres Freund wrote:

I.e. I think it'd be better if we just added a fork argument to
fill_seq_with_data(), and then do something like

smgrcreate(srel, INIT_FORKNUM, false);
log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
fill_seq_with_data(rel, tuple, INIT_FORKNUM);

and add a FlushBuffer() to the end of fill_seq_with_data() if writing
INIT_FORKNUM. The if (RelationNeedsWAL(rel)) would need an || forkNum ==
INIT_FORKNUM.


Now that logical replication of sequences is nearing completion, I 
figured it would be suitable to dust off this old discussion on unlogged 
sequences, mainly so that sequences attached to unlogged tables can be 
excluded from replication.


Attached is a new patch that incorporates the above suggestions, with 
some slight refactoring.  The only thing I didn't/couldn't do was to 
call FlushBuffers(), since that is not an exported function.  So this 
still calls FlushRelationBuffers(), which was previously not liked. 
Ideas welcome.


I have also re-tested the crashes reported by Michael Paquier in the old 
discussion and added test cases that catch them.


The rest of the patch is just documentation, DDL support, client 
support, etc.


What is not done yet is support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED.  This is a bit of a problem because:


1. The new behavior is that a serial/identity sequence of a new unlogged 
table is now also unlogged.
2. There is also a new restriction that changing a table to logged is 
not allowed if it is linked to an unlogged sequence.  (This is IMO 
similar to the existing restriction on linking mixed logged/unlogged 
tables via foreign keys.)
3. Thus, currently, you can't create an unlogged table with a 
serial/identity column and then change it to logged.  This is reflected 
in some of the test changes I had to make in alter_table.sql to work 
around this.  These should eventually go away.
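
For reference, a short sketch of the intended behavior, assuming the patch's
new syntax (the names are made up; per point 3 above, the last command is
currently expected to fail):

    CREATE UNLOGGED SEQUENCE useq;    -- new syntax added by the patch
    CREATE UNLOGGED TABLE ut (id int GENERATED ALWAYS AS IDENTITY);
    -- per (1), ut's identity sequence is created unlogged as well
    ALTER TABLE ut SET LOGGED;    -- per (2) and (3), rejected while linked
                                  -- to an unlogged sequence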


Interestingly, there is grammar support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED because there is this:


     |   ALTER SEQUENCE qualified_name alter_table_cmds
     {
     AlterTableStmt *n = makeNode(AlterTableStmt);
     n->relation = $3;
     n->cmds = $4;
     n->objtype = OBJECT_SEQUENCE;
     n->missing_ok = false;
     $$ = (Node *)n;
     }

But it is rejected later in tablecmds.c.  In fact, it appears that this 
piece of grammar is currently useless because there are no 
alter_table_cmds that actually work for sequences.  (This used to be 
different because things like OWNER TO also went through here.)


I tried to make tablecmds.c handle sequences as well, but that became 
messy.  So I'm thinking about making ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED an entirely separate code path and rip out the above 
grammar, but that needs some further pondering.


But all that is a bit of a separate effort, so in the meantime some 
review of the changes in and around fill_seq_with_data() would be useful.
From 039eee052c10803fed01b2dbd83de8e06bea5303 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 11 Feb 2022 09:44:37 +0100
Subject: [PATCH v3] Unlogged sequences

Add support for unlogged sequences.  Unlike for unlogged tables, this
is not a performance feature.  It allows sequences associated with
unlogged tables to be excluded from replication.

An identity/serial sequence is automatically made unlogged when its
owning table is.  (See also discussion in bug #15631.)

TODO:
- ALTER SEQUENCE ... SET LOGGED/UNLOGGED

Discussion: 
https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com
---
 doc/src/sgml/ref/create_sequence.sgml  | 23 +-
 doc/src/sgml/ref/pg_dump.sgml  |  6 +--
 src/backend/commands/sequence.c| 36 
 src/backend/commands/tablecmds.c   | 22 ++
 src/backend/parser/parse_utilcmd.c |  1 +
 src/bin/pg_dump/pg_dump.c  |  4 +-
 src/bin/psql/describe.c|  8 +++-
 src/test/recovery/t/014_unlogged_reinit.pl | 49 +++---
 src/test/regress/expected/alter_table.out  | 40 +-
 src/test/regress/expected/sequence.out | 12 +-
 src/test/regress/sql/alter_table.sql   |  9 ++--
 src/test/regress/sql/sequence.sql  |  8 +++-
 12 files changed, 172 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/create_sequence.sgml 
b/doc/src/sgml/ref/create_sequence.sgml
index 20bdbc002f..a84aa5bf56 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -21,7 +21,7 @@
 
  
 
-CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] name
+CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] 
name
 [ AS data_type ]
 [ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO 

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-28 Thread Etsuro Fujita
On Thu, Feb 24, 2022 at 2:49 PM Etsuro Fujita  wrote:
> I think the 0003 patch needs rebase.
> I'll update the patch.

Here is an updated version.  I added to the 0003 patch a macro for
defining the milliseconds to wait, as proposed by David upthread.

Best regards,
Etsuro Fujita


v5-0002-postgres_fdw-Minor-cleanup-for-pgfdw_abort_cleanup.patch
Description: Binary data


v5-0003-postgres-fdw-Add-support-for-parallel-abort.patch
Description: Binary data


Optimize external TOAST storage

2022-02-28 Thread davinder singh
Hi,

For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
and then divided into chunks. The problem with storing the compressed value
is that, if compression does not save enough space to reduce the number of
chunks, we end up paying an extra decompression cost on every read.
Based on the discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if that saves at
least one chunk (2KB by default) of disk storage; otherwise, store the
uncompressed one. This way we save decompression costs without losing much
on storage.
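
As a back-of-the-envelope illustration of the rule, using the Test 1 sizes
below (2000 bytes stands in for the roughly-2KB default chunk size):

    -- illustrative arithmetic only, not part of the patch
    SELECT ceil(7845 / 2000.0) AS chunks_if_compressed,    -- 4
           ceil(7925 / 2000.0) AS chunks_if_uncompressed;  -- 4
    -- compression saves no chunks here, so under the proposed rule the
    -- value is stored uncompressed and every read skips decompression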

In our tests, we have observed improvement in the read performance up to
28% by
giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes for storage on disk. The
gain is more on large attributes (size > 4KB) because
compression/decompression
cost increases with size.
However, We have found, this assumption is not true when the data
compression
ratio is more and the size is barely big enough to cause TOASTing. For
example,
in the following test 4. b, we didn't get any performance advantage but the
table
size grew by 42% by storing uncompressed values.
Test Setup.

Create table t1_lz4 ( a text compression lz4, b text compression lz4);
-- Generate random data
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
   value text;
begin
   -- each md5() string is 32 characters plus an array separator, about 33
   -- bytes, so len_info/33 iterations yield roughly len_info bytes of text
   select array_agg(md5(g::text))
   into value
   from generate_series(1, round(len_info/33)::int) g;
   return value;
end;
$$;

--Test
Select b from t1_lz4;

Test 1:
Data: rows 20
insert into t1_lz4(a, b) select generate_att_data(364), repeat
(generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42  sec, patch: 37 sec, *Performance Gain: 11%*
table size: Head 1662 MB, Patch: 1662 MB

Test 2:
Data: rows 10
insert into t1_lz4(a, b) select generate_att_data(364),
generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6  sec, patch: 30 sec, *Performance Gain: 14%*
table size: Head 1636 MB, Patch: 1688 MB

Test 3:
Data: rows 5
insert into t1_lz4(a, b) select generate_att_data(364),
generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4  sec, patch: 25.5 sec, *Performance Gain: 28%*
table size: Head 1601 MB, Patch: 1601 MB

Test 4.a:
Data: rows 20
insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250)
|| generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, *Performance Gain: 7%*
*table size: Head 872 MB, Patch: 872 MB*

Test 4.b (High compression):
Data: rows 20
insert into t1_lz4(a, b) select generate_att_data(364), repeat
(generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27  sec, patch: 26 sec, *Performance Gain: 0%*
*table size: Head 612 MB, Patch: 872 MB*

This is the worst case for this optimization, for two reasons.

First, the table size can grow by up to 50%. That happens when the compressed
size is half of the original size and yet barely large enough for TOASTing.
The table can't grow more than that, because if compression reduced the size
any further it would reduce the number of chunks as well, and the compressed
value would be stored in the table.

Second, there is not much performance gain because of the small attribute
size: almost twice as many attributes fit on a page, so each page access
covers twice the number of rows. A small value also means low
compression/decompression costs.


We have avoided such cases by applying the optimization only when the
attribute size is greater than 4KB.

Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com


v1_0001_optimize_external_toast_storage.patch
Description: Binary data


Re: Add id's to various elements in protocol.sgml

2022-02-28 Thread Peter Eisentraut

On 28.02.22 09:41, Brar Piening wrote:

On Feb 25, 2022 at 14:31, Peter Eisentraut wrote:

I think that kind of stuff would be added in via the web site
stylesheets, so you wouldn't have to deal with XSLT at all.


True for the CSS but  adding the HTML (#)
will need either XSLT or Javascript.


That is already done by your proposed patch, isn't it?




