Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-24 06:27:48 +0100, Laurenz Albe wrote:
> On Wed, 2022-03-23 at 17:55 -0700, Andres Freund wrote:
> > Starting with the below commit, pg_stat_reset_single_function_counters,
> > pg_stat_reset_single_table_counters don't just reset the stats for the
> > individual function, but also set pg_stat_database.stats_reset.
> 
> I see the point in the fine-grained reset, but I am -1 on having that
> reset "pg_stat_database.stats_reset".  That would make the timestamp
> mostly useless.

Just to be clear - that's the current and long-time behaviour.

Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-23 Thread Dilip Kumar
On Wed, Mar 23, 2022 at 10:50 PM Dilip Kumar  wrote:
>
> On Wed, Mar 23, 2022 at 10:37 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2022-03-23 22:29:40 +0530, Dilip Kumar wrote:
> > > I could not see any reason for it to fail, and I could not reproduce
> > > it either.  Is it possible to access the server log for this cfbot
> > > failure?
> >
> > Yes, near the top, below the cpu / memory graphs, there's a file
> > navigator. Should have all files ending with *.log or starting with
> > regress_log_*.
>
> Okay, I think I have found the reason for this failure: if we look at
> the logs below, the second statement fails with "foobar5 already
> exists", because one of the earlier test cases conditionally generates
> the same name.  So the fix is to use a different name.

In the latest version I have fixed this issue by using a non-conflicting
name; when compiled with ICU, the name foobar5 was already in use and we
were seeing the failure.  Apart from this, I have fixed the duplicate
cleanup problem by passing an extra parameter to RelationCreateStorage,
which decides whether to register the relation for on-abort delete, and
added comments for the same.  IMHO this is the cleanest way to do it;
please check the patch and let me know your thoughts.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From d9821c93b8d5b4a5707943d23f7beae6826627f0 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Tue, 22 Mar 2022 11:22:26 -0400
Subject: [PATCH v5] Add new block-by-block strategy for CREATE DATABASE.

Because this strategy logs changes on a block-by-block basis, it
avoids the need to checkpoint before and after the operation.
However, because it logs each changed block individually, it might
generate a lot of extra write-ahead logging if the template database
is large. Therefore, the older strategy remains available via a new
STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy
option to createdb.

Somewhat controversially, this patch assembles the list of relations
to be copied to the new database by reading the pg_class relation of
the template database. Cross-database access like this isn't normally
possible, but it can be made to work here because there can't be any
connections to the database being copied, nor can it contain any
in-doubt transactions. Even so, we have to use lower-level interfaces
than normal, since the table scan and relcache interfaces will not
work for a database to which we're not connected. The advantage of
this approach is that we do not need to rely on the filesystem to
determine what ought to be copied, but instead on PostgreSQL's own
knowledge of the database structure. This avoids, for example,
copying stray files that happen to be located in the source database
directory.

Dilip Kumar, with a fairly large number of cosmetic changes by me.
---
 contrib/bloom/blinsert.c |   2 +-
 doc/src/sgml/ref/create_database.sgml|  22 +
 doc/src/sgml/ref/createdb.sgml   |  11 +
 src/backend/access/heap/heapam_handler.c |   6 +-
 src/backend/access/nbtree/nbtree.c   |   2 +-
 src/backend/access/rmgrdesc/dbasedesc.c  |  20 +-
 src/backend/access/transam/xlogutils.c   |   6 +-
 src/backend/catalog/heap.c   |   2 +-
 src/backend/catalog/storage.c|  34 +-
 src/backend/commands/dbcommands.c| 761 ++-
 src/backend/commands/tablecmds.c |   2 +-
 src/backend/storage/buffer/bufmgr.c  | 172 ++-
 src/backend/storage/lmgr/lmgr.c  |  28 ++
 src/backend/utils/activity/wait_event.c  |   3 +
 src/backend/utils/cache/relcache.c   |   2 +-
 src/backend/utils/cache/relmapper.c  |  64 +++
 src/bin/pg_rewind/parsexlog.c|   9 +-
 src/bin/psql/tab-complete.c  |   4 +-
 src/bin/scripts/createdb.c   |  10 +-
 src/bin/scripts/t/020_createdb.pl|  20 +
 src/include/catalog/storage.h|   4 +-
 src/include/commands/dbcommands_xlog.h   |  25 +-
 src/include/storage/bufmgr.h |   6 +-
 src/include/storage/lmgr.h   |   1 +
 src/include/utils/relmapper.h|   4 +-
 src/include/utils/wait_event.h   |   1 +
 src/tools/pgindent/typedefs.list |   5 +-
 27 files changed, 1069 insertions(+), 157 deletions(-)

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index c94cf34..82378db 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -173,7 +173,7 @@ blbuildempty(Relation index)
 	 * Write the page and log it.  It might seem that an immediate sync would
 	 * be sufficient to guarantee that the file exists on disk, but recovery
 	 * itself might remove it while replaying, for example, an
-	 * XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record.  Therefore, we need
+	 * XLOG_DBASE_CREATE* or XLOG_TBLSPC_CREATE record.  Therefore, we need
 	 * this even when wal_level=minimal.
 	 */
 	PageSetChecksumInplace(metapage, 

Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread Laurenz Albe
On Wed, 2022-03-23 at 17:55 -0700, Andres Freund wrote:
> Starting with the below commit, pg_stat_reset_single_function_counters,
> pg_stat_reset_single_table_counters don't just reset the stats for the
> individual function, but also set pg_stat_database.stats_reset.

I see the point in the fine-grained reset, but I am -1 on having that
reset "pg_stat_database.stats_reset".  That would make the timestamp
mostly useless.

One could argue that resetting a single counter and *not* resetting
"pg_stat_database.stats_reset" would also be a lie, but at least it is
a smaller lie.
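For anyone following along, the behaviour in question is easy to observe from psql (a sketch; the table name is a placeholder):

```sql
-- Check the database-wide reset timestamp, reset a single table's
-- counters, then check again.  With the behaviour under discussion,
-- the second SELECT shows a new stats_reset timestamp even though
-- only one table's counters were reset.
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();
SELECT pg_stat_reset_single_table_counters('some_table'::regclass);
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();
```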

Yours,
Laurenz Albe





Re: Removing unneeded self joins

2022-03-23 Thread Andrey V. Lepikhov

On 3/22/22 05:58, Andres Freund wrote:

Hi,

On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:

Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).


This CF entry currently fails tests: 
https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938

Looks like you're missing an adjustment of postgresql.conf.sample

Marked as waiting-on-author.

Thanks, I fixed it.

--
regards,
Andrey Lepikhov
Postgres Professional
From 620dea31ce19965beefe545f08dcc5c8b319c434 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if we can prove that the
join can be replaced with a scan.  We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Check innerrel_is_unique() for the qual list from (1).  If it
returns true, then each outer row matches only the same row from the
inner relation.  This proves that the join is a self-join and can be
replaced by a scan.
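As an illustration (a hypothetical table; the approach relies on the
uniqueness of the join column), a join that this logic can remove:

```sql
-- With t.id unique (primary key), each outer row joins only to
-- itself, so the join can be replaced with a single scan of t plus
-- the combined filter conditions.
CREATE TABLE t (id int PRIMARY KEY, payload text);
EXPLAIN SELECT a.payload FROM t a JOIN t b ON a.id = b.id WHERE b.id > 10;
```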

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-  Relids joinrelids);
+  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
    RelOptInfo *innerrel,
    JoinType jointype,
    List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
 		remove_rel_from_query(root, innerrelid,
 			  bms_union(sjinfo->min_lefthand,
-		sjinfo->min_righthand));
+		sjinfo->min_righthand), 0);
 
 		/* We verify that exactly one reference gets removed from joinlist */
 		nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query, or to
+ * replace it with another relid.
+ * For reusability, this routine can work in two modes: delete a relid from
+ * the plan, or replace it.  Replace mode is used by self-join removal.
  *
  * We are not terribly thorough here.  We must make sure that the rel is
  * no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
  * lists, but only if they belong to the outer join identified by joinrelids.
  */
 static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+	  int 

Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-23 Thread Kyotaro Horiguchi
At Wed, 23 Mar 2022 17:39:52 +0100, Peter Eisentraut 
 wrote in 
> On 23.03.22 17:33, Bharath Rupireddy wrote:
> > It looks like the following errmsg_plural in dependency.c is
> > unnecessary as numReportedClient > 1 always and numNotReportedClient
> > can never be < 0. Therefore plural version of the error message is
> > sufficient. Attached a patch to fix it.
> 
> Some languages have more than two forms, so we still need to keep this
> to handle those.

The point seems to be that numReportedClient + numNotReportedClient >=
2 (not 1) there, so the singular form is never used.  It does no harm
as-is, but fixing it would reduce the translation burden a bit.

By the way, it has a translator note, as follows.

>   else if (numReportedClient > 1)
>   {
>   ereport(msglevel,
>   /* translator: %d always has a value larger than 1 */
>   (errmsg_plural("drop cascades to %d other object",
>  "drop cascades to %d other objects",
>  numReportedClient + numNotReportedClient,
>  numReportedClient + numNotReportedClient),

The comment and errmsg_plural don't seem to be consistent.  When the
code was added by c4f2a0458d, it had only the singular form and already
had the comment.  After that, 8032d76b5 turned it into errmsg_plural,
ignoring the comment.  It seems like a thinko in 8032d76b5.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PoC] Let libpq reject unexpected authentication requests

2022-03-23 Thread Laurenz Albe
On Wed, 2022-03-23 at 21:31 +, Jacob Champion wrote:
> On Mon, 2022-03-07 at 11:44 +0100, Laurenz Albe wrote:
> > I am all for the idea, but you implemented the reverse of proposal 2.
> >
> > Wouldn't it be better to list the *rejected* authentication methods?
> > Then we could have "password" on there by default.
> 
> Specifying the allowed list rather than the denied list tends to have
> better security properties.
> 
> In the case I'm pursuing (the attack vector from the CVE), the end user
> expects certificates to be used. Any other authentication method --
> plaintext, hashed, SCRAM, Kerberos -- is unacceptable;

That makes sense.

> But that doesn't help your case; you want to choose a good default, and
> I agree that's important. Since there are arguments already for
> accepting an OR in the list, and -- if we couldn't find a good
> orthogonal method for certs, like Tom suggested -- an AND, maybe it
> wouldn't be so bad to accept a NOT as well?
> 
>     require_auth=cert    # certs only
>     require_auth=cert+scram-sha-256  # SCRAM wrapped by certs
>     require_auth=cert,scram-sha-256  # SCRAM or certs (or both)
>     require_auth=!password   # anything but plaintext
>     require_auth=!password,!md5  # no plaintext or MD5

Great, if there is a !something syntax, then I have nothing left to wish for.
It may not be the most secure way to do it, but it sure is convenient.

Yours,
Laurenz Albe





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-23 Thread Kyotaro Horiguchi
At Wed, 23 Mar 2022 21:36:09 +0530, Bharath Rupireddy 
 wrote in 
> Here's a refactoring patch that basically moves the pg_waldump's
> functions and stats structures to xlogreader.h/.c so that the
> pg_walinspect can reuse them. If it looks okay, I will send the
> pg_walinspect patches based on it.


+void
+XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
+  uint32 *fpi_len, bool detailed_format,
+  StringInfo buf)
...
+   if (detailed_format && delimiter)
+   appendStringInfoChar(buf, '\n');

It is odd that the variable "delimiter" is used as a bool in the
function even though it is a "char *"; I would expect it to be used as
an actual delimiter string (assuming that you might want to insert ", "
between two blkref descriptions).


+get_forkname(ForkNumber num)

forkNames[] is public and used in reinit.c.  I think we don't need
this function.


+#define MAX_XLINFO_TYPES 16
...
+   XLogRecStatsrmgr_stats[RM_NEXT_ID];
+   XLogRecStatsrecord_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+} XLogStats;
+

This doesn't seem to be a part of xlogreader.  Couldn't we add a new
module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me to
be a part of xlogreader; xlogstats looks like a better place for it.


+#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
+ tot_len_pct, total_count, \

It doesn't need to be a macro.  However, in the first place I don't
think it is useful to have.  Rather, it may be harmful, since it doesn't
reduce complexity much but instead just hides details.  If we want to
avoid tedious repetitions of the same statements, a macro like the
following may work.

#define CALC_PCT(num, denom) ((denom) == 0 ? 0.0 : 100.0 * (num) / (denom))
...
>   n_pct = CALC_PCT(n, total_count);
>   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
>   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
>   tot_len_pct = CALC_PCT(tot_len, total_len);

But it does not seem that different if we directly write out the details.

>   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
>   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
>   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
>   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Tatsuo Ishii
>> My hoary animal prairiedog doesn't like this [1]:
>> 
>> #   Failed test 'concurrent update with retrying stderr /(?s-xim:client 
>> (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not 
>> serialize access due to concurrent update\\b.*\\g1)/'
>> #   at t/001_pgbench_with_server.pl line 1229.
>> #   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 
>> nclients: 2 nxacts: 1 dbName: postgres
>> ...
>> # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  
>> could not serialize access due to concurrent update
>> ...
>> # '
>> # doesn't match '(?s-xim:client (0|1) got an error in command 3 
>> \\(SQL\\) of script 0; ERROR:  could not serialize access due to concurrent 
>> update\\b.*\\g1)'
>> # Looks like you failed 1 test of 425.
>> 
>> I'm not sure what the "\\b.*\\g1" part of this regex is meant to
>> accomplish, but it seems to be assuming more than it should
>> about the output format of TAP messages.
> 
> I edited the test code from the original patch by mistake, but I did
> not notice it because the test somehow passes on my machine without
> any errors.
> 
> I attached a patch to fix the test as it was in the original patch,
> where backreferences are used to check that the same query is retried.

My machine (Ubuntu 20) did not complain either.  Maybe a Perl version
difference?  Anyway, the fix has been pushed.  Let's see how prairiedog feels.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Extensible Rmgr for Table AMs

2022-03-23 Thread Jeff Davis
On Fri, 2022-02-04 at 22:56 +0800, Julien Rouhaud wrote:
> > Right, which I guess raises another question: if the maximum is
> > UINT8_MAX, which BTW I find perfectly reasonable, why are we not
> > just
> > defining this as an array of size 256? There's no point in adding
> > code
> > complexity to save a few kB of memory.
> 
> Agreed, especially if combined with your suggested approach 3 (array
> of
> pointers).

Implemented and attached. I also updated pg_waldump and pg_rewind to do
something reasonable.

Additionally, I have a reasonably complete implementation of a
custom resource manager now:

https://github.com/citusdata/citus/tree/custom-rmgr-15

(Not committed or intended to actually be used right now -- just a
POC.)

Offline, Andres mentioned that I should test recovery performance if we
take your approach, because making the RmgrTable non-const could impact
optimizations. Not sure if that would be a practical concern compared
to all the other work done at REDO time.

Regards,
Jeff Davis

From faa7eb847fab30c0fbe47e703d1c2d64f84ba51d Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 6 Nov 2021 13:01:38 -0700
Subject: [PATCH] Extensible rmgr.

Allow extensions to specify a new custom rmgr, which allows
specialized WAL. This is meant to be used by a custom Table Access
Method, which would not otherwise be able to offer logical
decoding/replication. It may also be used by new Index Access Methods.

Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
---
 src/backend/access/transam/rmgr.c | 87 ++-
 src/backend/access/transam/xlogreader.c   |  2 +-
 src/backend/access/transam/xlogrecovery.c | 29 +++-
 src/backend/replication/logical/decode.c  |  4 +-
 src/backend/utils/misc/guc.c  |  6 +-
 src/bin/pg_rewind/parsexlog.c | 11 +--
 src/bin/pg_waldump/pg_waldump.c   | 27 +++
 src/bin/pg_waldump/rmgrdesc.c | 32 -
 src/bin/pg_waldump/rmgrdesc.h |  2 +-
 src/include/access/rmgr.h | 11 ++-
 src/include/access/xlog_internal.h|  5 +-
 11 files changed, 168 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index f8847d5aebf..1805596071e 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "miscadmin.h"
 #include "replication/decode.h"
 #include "replication/message.h"
 #include "replication/origin.h"
@@ -32,8 +33,90 @@
 
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
 #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
-	{ name, redo, desc, identify, startup, cleanup, mask, decode },
+	&(struct RmgrData){ name, redo, desc, identify, startup, cleanup, mask, decode },
 
-const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+static RmgrData *RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
 };
+
+/*
+ * Start up all resource managers.
+ */
+void
+StartupResourceManagers(void)
+{
+	for (int i = 0; i <= RM_MAX_ID; i++)
+	{
+		if (RmgrTable[i] == NULL)
+			continue;
+
+		if (RmgrTable[i]->rm_startup != NULL)
+			RmgrTable[i]->rm_startup();
+	}
+}
+
+/*
+ * Clean up all resource managers.
+ */
+void
+CleanupResourceManagers(void)
+{
+	for (int i = 0; i <= RM_MAX_ID; i++)
+	{
+		if (RmgrTable[i] == NULL)
+			continue;
+
+		if (RmgrTable[i]->rm_cleanup != NULL)
+			RmgrTable[i]->rm_cleanup();
+	}
+}
+
+/*
+ * Register a new custom rmgr.
+ *
+ * Refer to https://wiki.postgresql.org/wiki/ExtensibleRmgr to reserve a
+ * unique RmgrId for your extension, to avoid conflicts. During development,
+ * use RM_EXPERIMENTAL_ID.
+ */
+void
+RegisterCustomRmgr(RmgrId rmid, RmgrData *rmgr)
+{
+	if (rmid < RM_MIN_CUSTOM_ID)
+		ereport(PANIC, errmsg("custom rmgr id %d is out of range", rmid));
+
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+(errmsg("custom rmgr must be registered while initializing modules in shared_preload_libraries")));
+
+	ereport(LOG,
+			(errmsg("registering custom rmgr \"%s\" with ID %d",
+	rmgr->rm_name, rmid)));
+
+	if (RmgrTable[rmid] != NULL)
+		ereport(PANIC,
+(errmsg("custom rmgr ID %d already registered with name \"%s\"",
+		rmid, RmgrTable[rmid]->rm_name)));
+
+	/* check for existing rmgr with the same name */
+	for (int i = 0; i <= RM_MAX_ID; i++)
+	{
+		const RmgrData *existing_rmgr = RmgrTable[i];
+
+		if (existing_rmgr == NULL)
+			continue;
+
+		if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
+			ereport(PANIC,
+	(errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
+			existing_rmgr->rm_name)));
+	}
+
+	/* register it */
+	

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-03-23 Thread Etsuro Fujita
On Sat, Mar 5, 2022 at 7:32 PM Etsuro Fujita  wrote:
> Attached is an updated version.

In the 0002 patch I introduced a macro for building an abort command
in preparation for the parallel abort patch (0003), but I moved it to
0003.  Attached is a new patch set.  The new version of 0002 is just a
cleanup patch (see the commit message in 0002), and I think it's
committable, so I'm planning to commit it if there are no objections.

Best regards,
Etsuro Fujita


v7-0002-postgres_fdw-Minor-cleanup-for-pgfdw_abort_cleanup.patch
Description: Binary data


v7-0003-postgres-fdw-Add-support-for-parallel-abort.patch
Description: Binary data


Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-23 Thread Thomas Munro
On Tue, Mar 8, 2022 at 12:58 AM Michael Paquier  wrote:
> Junction points are directories, no?  Are you sure that this works
> correctly on WIN32?  It seems to me that we'd better use readlink()
> only for entries in pg_tlbspc/ that are PGFILETYPE_LNK on non-WIN32
> and pgwin32_is_junction() on WIN32.

Hmm.  So the code we finished up with in the tree looks like this:

#ifdef WIN32
if (!pgwin32_is_junction(fullpath))
continue;
#else
if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK)
continue;
#endif

As mentioned, I was unhappy with the lack of error checking for that
interface, and I've started a new thread about that, but then I
started wondering if we missed a trick here: get_dirent_type() contains
code that wants to return PGFILETYPE_LNK for reparse points.  Clearly
it's not working, based on the results reported in this thread.  Is that
explained by your comment above, "junction points _are_ directories" --
are we testing the attribute flags in the wrong order here?

if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
d->ret.d_type = DT_DIR;
/* For reparse points dwReserved0 field will contain the ReparseTag */
else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
 (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
d->ret.d_type = DT_LNK;
else
d->ret.d_type = DT_REG;




RE: logical replication empty transactions

2022-03-23 Thread shiy.f...@fujitsu.com
On Thursday, March 24, 2022 11:19 AM Hou, Zhijie/侯 志杰  
wrote:
> 
> Attached is a new version of the patch, which includes the following changes:
> 
> - Fix a typo
> - Change the requestreply flag of the newly added WalSndKeepalive to false,
>   because the subscriber can judge whether it's necessary to post a reply
>   based on the received LSN.
> - Add a testcase to make sure there is no data on the subscriber side when
>   the transaction is skipped.
> - Change the name of the flag skipped_empty_xact to skipped_xact, which
>   seems more understandable.
> - Merge Amit's suggested changes.
> 

Hi,

This patch skips sending BEGIN/COMMIT messages for empty transactions,
saving network bandwidth, so I ran a test to see how it affects bandwidth.

This test follows Peter's earlier test[1].  I temporarily modified the
code in worker.c to log the length of the data received by the subscriber
(after calling walrcv_receive()).  At the end of the test run, the logs
are processed to extract the numbers.

[1] 
https://www.postgresql.org/message-id/CAHut%2BPuyqcDJO0X2BxY%2B9ycF%2Bew3x77FiCbTJQGnLDbNmMASZQ%40mail.gmail.com

The number of transactions is fixed (1000), and I tested different mixes
of empty and non-empty transactions - 0%, 25%, 50%, 75%, 100%.  The patch
sends a keepalive message when skipping an empty transaction in synchronous
replication mode, so I tested both synchronous and asynchronous replication.

The results are as follows; the bar chart is attached.

Sync replication - size of sending data

            0%      25%     50%     75%     100%
HEAD        335211  281655  223661  170271  115108
patched     335217  256617  173878  98095   18108

Async replication - size of sending data

            0%      25%     50%     75%     100%
HEAD        339379  285835  236343  184227  115000
patched     335077  260953  180022  11  18126


The details of the test are also attached.

Summary of results:
In both synchronous and asynchronous replication mode, the more empty
transactions there are, the more obvious the improvement.  Even when there
are no empty transactions, I can't see any overhead.

Regards,
Shi yu
-- create table 
CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), 
d bigint DEFAULT 999);
CREATE TABLE test_tab_nopub (a int primary key, b text, c timestamptz DEFAULT 
now(), d bigint DEFAULT 999);

test_empty_not_published.sql:
BEGIN;
INSERT INTO test_tab_nopub VALUES(1, 'foo');
UPDATE test_tab_nopub SET b = 'bar' WHERE a = 1;
DELETE FROM test_tab_nopub WHERE a = 1;
COMMIT;

test_empty_published.sql:
BEGIN;
INSERT INTO test_tab VALUES(1, 'foo');
UPDATE test_tab SET b = 'bar' WHERE a = 1;
DELETE FROM test_tab WHERE a = 1;
COMMIT;

-- create publication
create publication pub for table test_tab;

-- create subscription
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' 
PUBLICATION pub;


-- empty transaction: 0%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_published.sql postgres
-- empty transaction: 25%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@5 -f test_empty_published.sql@15 postgres
-- empty transaction: 50%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@10 -f test_empty_published.sql@10 postgres
-- empty transaction: 75%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@15 -f test_empty_published.sql@5 postgres
-- empty transaction: 100%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql postgres


Checking pgwin32_is_junction() errors

2022-03-23 Thread Thomas Munro
Hi,

The comment for pgwin32_is_junction() says "Assumes the file exists,
so will return false if it doesn't (since a nonexistent file is not a
junction)".  In fact that's the behaviour for any kind of error, and
although we set errno in that case, no caller ever checks it.

I think it'd be better to add missing_ok and elevel parameters,
following existing patterns.  Unfortunately, it can't use the generic
frontend logging to implement elevel in frontend code from its current
location, because pgport can't call pgcommon.  For now I came up with
a kludge to work around that problem, but I don't like it, and would
need to come up with something better...

Sketch code attached.
From 8e0e51359c0191cf673c3ddc87a3262b13f530a4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 16 Mar 2022 23:05:39 +1300
Subject: [PATCH] Raise errors in pgwin32_is_junction().

XXX It's not very nice to use elevel like this in frontend code
---
 src/backend/access/transam/xlog.c|  2 +-
 src/backend/replication/basebackup.c |  4 ++--
 src/backend/storage/file/fd.c|  2 +-
 src/backend/utils/adt/misc.c |  2 +-
 src/bin/pg_checksums/pg_checksums.c  |  2 +-
 src/bin/pg_rewind/file_ops.c |  2 +-
 src/common/file_utils.c  |  5 +++-
 src/include/port.h   |  2 +-
 src/include/port/win32_port.h|  2 +-
 src/port/dirmod.c| 36 +---
 10 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4ac3871c74..819b05177d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8314,7 +8314,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 			 * directories directly under pg_tblspc, which would fail below.
 			 */
 #ifdef WIN32
-			if (!pgwin32_is_junction(fullpath))
+			if (!pgwin32_is_junction(fullpath, false, ERROR))
 continue;
 #else
 			if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6884cad2c0..4be9090445 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1352,7 +1352,7 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
 #ifndef WIN32
 			S_ISLNK(statbuf.st_mode)
 #else
-			pgwin32_is_junction(pathbuf)
+			pgwin32_is_junction(pathbuf, true, ERROR)
 #endif
 			)
 		{
@@ -1840,7 +1840,7 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
 #ifndef WIN32
 	if (S_ISLNK(statbuf->st_mode))
 #else
-	if (pgwin32_is_junction(pathbuf))
+	if (pgwin32_is_junction(pathbuf, true, ERROR))
 #endif
 		statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 14b77f2861..ec2f49fefa 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3453,7 +3453,7 @@ SyncDataDirectory(void)
 			xlog_is_symlink = true;
 	}
 #else
-	if (pgwin32_is_junction("pg_wal"))
+	if (pgwin32_is_junction("pg_wal", false, LOG))
 		xlog_is_symlink = true;
 #endif
 
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 89690be2ed..723504ab10 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -317,7 +317,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS)
 	 * found, a relative path to the data directory is returned.
 	 */
 #ifdef WIN32
-	if (!pgwin32_is_junction(sourcepath))
+	if (!pgwin32_is_junction(sourcepath, false, ERROR))
 		PG_RETURN_TEXT_P(cstring_to_text(sourcepath));
 #else
 	if (lstat(sourcepath, ) < 0)
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5f0f5ee62d..5720907d33 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -404,7 +404,7 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly)
 #ifndef WIN32
 		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
 #else
-		else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn))
+		else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn, false, 1))
 #endif
 		{
 			/*
diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index 6cb288f099..1eda2baf2f 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -434,7 +434,7 @@ recurse_dir(const char *datadir, const char *parentpath,
 #ifndef WIN32
 		else if (S_ISLNK(fst.st_mode))
 #else
-		else if (pgwin32_is_junction(fullpath))
+		else if (pgwin32_is_junction(fullpath, false, 1))
 #endif
 		{
 #if defined(HAVE_READLINK) || defined(WIN32)
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 7138068633..2db909a3aa 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -88,8 +88,11 @@ fsync_pgdata(const char *pg_data,
 		else if (S_ISLNK(st.st_mode))
 			xlog_is_symlink = true;
 	}
+#elif defined(FRONTEND)

Re: Failed transaction statistics to measure the logical replication progress

2022-03-23 Thread Amit Kapila
On Thu, Mar 3, 2022 at 8:58 AM osumi.takami...@fujitsu.com
 wrote:
>

This patch introduces two new subscription statistics columns
(apply_commit_count and apply_rollback_count) to the
pg_stat_subscription_stats view for counting cumulative transactions
commits/rollbacks for a particular subscription. Now, users can
already see the total number of xacts committed/rolled back in a
particular database via pg_stat_database, so this can be considered
duplicate information. OTOH, some users might be interested in the
stats for a subscription to know how many transactions are
successfully applied during replication because the information in
pg_stat_database also includes the operations that happened on the
node.

I am not sure if it is worth adding this additional information or how
useful it will be for users. Does anyone else want to weigh in on
this?

If nobody else sees value in this then I feel it is better to mark
this patch as Returned with feedback or Rejected. We can come back to
it later if we see more demand for this.

-- 
With Regards,
Amit Kapila.




Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-23 Thread Bharath Rupireddy
On Wed, Mar 23, 2022 at 10:16 AM Bharath Rupireddy
 wrote:
>
> On Tue, Mar 22, 2022 at 8:12 PM Andres Freund  wrote:
> > > Do you mean like this?
> > > ereport(LOG,
> > > /* translator: the placeholders show checkpoint options */
> > > (errmsg("%s starting:%s%s%s%s%s%s%s%s",
> > > restartpoint ? _("restartpoint") : _("checkpoint"),
> > > (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
> > > (flags & CHECKPOINT_END_OF_RECOVERY) ? "
> > > end-of-recovery" : "",
> > > (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
> > > (flags & CHECKPOINT_FORCE) ? " force" : "",
> > > (flags & CHECKPOINT_WAIT) ? " wait" : "",
> > > (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
> > > (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
> > > (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "")));
> >
> > Yes.
>
> Done that way, see
> v7-0001-Deduplicate-checkpoint-restartpoint-starting-comp.patch.
>
> > > I think the reason in this case might be that some flag names with hyphens
> > > and spaces before words may not have the right/matching words in all
> > > languages. What happens if we choose to translate/not translate the entire
> > > message?
> >
> > If individual words aren't translated the "original" word would be used.
>
> Interestingly, the translated message for "checkpoint/restart
> complete" is empty. Maybe because it has untranslatable strings?
>
> #: access/transam/xlog.c:8752
> #, c-format
> msgid "restartpoint complete: wrote %d buffers (%.1f%%); %d WAL
> file(s) added, %d removed, %d recycled; write=%ld.%03d s,
> sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, longest=%ld.%03d s,
> average=%ld.%03d s; distance=%d kB, estimate=%d kB"
> msgstr ""
>
> > > > Both seem still very long. I still am doubtful this level of detail is
> > > > appropriate. Seems more like a thing for a tracepoint or such. How 
> > > > about just
> > > > printing the time for the logical decoding operations in aggregate, 
> > > > without
> > > > breaking down into files, adding LSNs etc?
> > >
> > > The distinction that the patch makes right now is for snapshot and
> > > rewrite mapping files and it makes sense to have them separately.
> >
> > -1. The line also needs to be readable...
>
> IMHO, that's subjective. Even now, the existing
> "checkpoint/restartpoint complete" message has a good amount of info
> which makes it unreadable for some.
>
> The number of logical decoding files(snapshot and mapping) the
> checkpoint processed is a good metric to have in server logs along
> with the time it took for removing/syncing them. Thoughts?

Realized that the CF bot is applying patches in the alphabetical order
(I've sent out both v7 patches as v7-0001). Attaching v8 patch-set
with v8-0001 and v8-0002 names. Apart from this, no change in v8.

Regards,
Bharath Rupireddy.


v8-0001-Deduplicate-checkpoint-restartpoint-starting-comp.patch
Description: Binary data


v8-0002-Add-checkpoint-stats-of-snapshot-and-mapping-file.patch
Description: Binary data


Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 18:47:58 -0700, David G. Johnston wrote:
> It also seems that each tracked object type needs to have its own
> last_reset field (we could choose to name it stats_reset too, leaving
> pg_stat_database.last_reset as the only anomaly) added as an implied
> behavior needed for such individualized resetting.  I would go with
> *.last_reset though and leave the absence of pg_stat_archiver.last_reset as
> the anomaly (or just add it redundantly for consistency).

It's not free to track more information. We always have the stats for the
whole system in memory at least once (stats collector currently, shared hash
table with shared memory stats patch), often more than that (stats accessing
backends).


> I don't see removing existing functionality as a good course to getting a
> consistent implementation; we should just push forward with figuring out
> what is missing and fill in those gaps.  At worst if that isn't something
> we want to fix right now our new setup should at least leave the status quo
> behaviors in place.

Well, it depends on whether there's an actual use case for those super fine
grained reset functions. Neither docs nor the thread introducing them
presented that.  We don't have SQL level stats
values either.

Greetings,

Andres Freund




RE: logical replication empty transactions

2022-03-23 Thread houzj.f...@fujitsu.com
On Tuesday, March 22, 2022 7:50 PM Amit Kapila  wrote:
> On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > > On Monday, March 21, 2022 6:01 PM Amit Kapila
> > > 
> > > wrote:
> >
> > Oh, sorry, I posted the wrong patch, here is the correct one.
> >
> 
> The test change looks good to me. I think additionally we can verify that the
> record is not reflected in the subscriber table. Apart from that, I had made
> minor changes mostly in the comments in the attached patch. If those look
> okay to you, please include those in the next version.

Thanks, the changes look good to me, I merged the diff patch.

Attached is the new version patch, which includes the following changes:

- Fix a typo
- Change the requestReply flag of the newly added WalSndKeepalive to false,
  because the subscriber can judge whether it's necessary to post a reply based
  on the received LSN.
- Add a testcase to make sure there is no data in subscriber side when the
  transaction is skipped.
- Change the name of flag skipped_empty_xact to skipped_xact which seems more
  understandable.
- Merge Amit's suggested changes.


Best regards,
Hou zj


v29-0001-Skip-empty-transactions-for-logical-replication.patch
Description:  v29-0001-Skip-empty-transactions-for-logical-replication.patch


Re: Column Filtering in Logical Replication

2022-03-23 Thread Amit Kapila
On Thu, Mar 24, 2022 at 4:11 AM Tomas Vondra
 wrote:
>
> On 3/21/22 12:28, Amit Kapila wrote:
> > On Fri, Mar 18, 2022 at 8:13 PM Tomas Vondra
> >  wrote:
> >>
> >> Ah, thanks for reminding me - it's hard to keep track of all the issues
> >> in threads as long as this one.
> >>
> >> BTW do you have any opinion on the SET COLUMNS syntax? Peter Smith
> >> proposed to get rid of it in [1] but I'm not sure that's a good idea.
> >> Because if we ditch it, then removing the column list would look like this:
> >>
> >> ALTER PUBLICATION pub ALTER TABLE tab;
> >>
> >> And if we happen to add other per-table options, this would become
> >> pretty ambiguous.
> >>
> >> Actually, do we even want to allow resetting column lists like this? We
> >> don't allow this for row filters, so if you want to change a row filter
> >> you have to re-add the table, right?
> >>
> >
> > We can use syntax like: "alter publication pub1 set table t1 where
> > (c2 > 10);" to reset the existing row filter. It seems similar thing works
> > for column list as well ("alter publication pub1 set table t1 (c2)
> > where (c2 > 10)"). If I am not missing anything, I don't think we need
> > additional Alter Table syntax.
> >
> >> So maybe we should just ditch ALTER
> >> TABLE entirely.
> >>
> >
> > Yeah, I agree especially if my above understanding is correct.
> >
>
> I think there's a gotcha that
>
>ALTER PUBLICATION pub SET TABLE t ...
>
> also removes all other relations from the publication, and it removes
> and re-adds the table anyway. So I'm not sure what's the advantage?
>

I think it could be used when the user has fewer tables and she wants
to change the list of published tables or their row/column filters. I
am not sure of the value of this to users but this was a pre-existing
syntax.

> Anyway, I don't see why we would need such ALTER TABLE only for column
> filters and not for row filters - either we need to allow this for both
> options or none of them.
>

+1. I think for now we can leave this new ALTER TABLE syntax and do it
for both column and row filters together.


-- 
With Regards,
Amit Kapila.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Yugo NAGATA
On Wed, 23 Mar 2022 14:26:54 -0400
Tom Lane  wrote:

> Tatsuo Ishii  writes:
> > The patch Pushed. Thank you!
> 
> My hoary animal prairiedog doesn't like this [1]:
> 
> #   Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) 
> got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not serialize 
> access due to concurrent update\\b.*\\g1)/'
> #   at t/001_pgbench_with_server.pl line 1229.
> #   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 
> 2 nxacts: 1 dbName: postgres
> ...
> # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  
> could not serialize access due to concurrent update
> ...
> # '
> # doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) 
> of script 0; ERROR:  could not serialize access due to concurrent 
> update\\b.*\\g1)'
> # Looks like you failed 1 test of 425.
> 
> I'm not sure what the "\\b.*\\g1" part of this regex is meant to
> accomplish, but it seems to be assuming more than it should
> about the output format of TAP messages.

I edited the test code from the original patch by mistake, but I did
not notice because the test somehow passes on my machine without any
errors.

I attached a patch to fix the test as was in the original patch, where
backreferences are used to check retry of the same query.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d173ceae7a..3eb5905e5a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,7 +1222,8 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
 # Check that we have a serialization error and the same random value of the
 # delta variable in the next try
 my $err_pattern =
-"client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+	"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+  . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
   . "ERROR:  could not serialize access due to concurrent update\\b.*"
   . "\\g1";
 


Re: Support isEmptyStringInfo

2022-03-23 Thread Kyotaro Horiguchi
At Wed, 23 Mar 2022 10:13:43 -0400, Tom Lane  wrote in 
> Robert Haas  writes:
> > I think that the code is perfectly readable as it is and that this
> > change makes it less so.
> 
> Yeah, after a quick look through this patch I'm unimpressed too.
> The new code is strictly longer, and it requires the introduction
> of distracting "!" and "&" operators in many places.

The struct members are not private at all.  In that sense StringInfo
is not a C++/Java-style class but more like a plain C struct, at least
to me.  I think encapsulating only ".len == 0" doesn't help.  In many
places we already pull out buf.data to use it separately from buf, and
we have a dozen instances of "buf.len (<|>|<=|>=) " and
even "buf.data[buf.len - 1] == '\n'".

As for readability, isEmptyStringInfo(str) is slightly harder on my
eyes than str->len == 0.
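
The trade-off being debated can be sketched outside the tree. The struct
and accessor below only mirror the shape of PostgreSQL's StringInfoData;
they are an illustrative stand-in, not the real stringinfo.h definitions:

```c
#include <assert.h>
#include <string.h>

/*
 * Minimal stand-in for PostgreSQL's StringInfoData, for illustration
 * only.  The members are deliberately public, which is the point of the
 * discussion: callers already test and index them directly.
 */
typedef struct
{
    char   *data;   /* buffer contents */
    int     len;    /* current length of data */
} StringInfoData;

/* The proposed accessor merely wraps a single member comparison */
static inline int
isEmptyStringInfo(const StringInfoData *str)
{
    return str->len == 0;
}
```

Direct member tests like `buf.len > 0` or `buf.data[buf.len - 1] == '\n'`
remain necessary either way, which is the argument that wrapping only the
emptiness check adds little.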

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: freeing bms explicitly

2022-03-23 Thread Amit Kapila
On Wed, Mar 23, 2022 at 9:30 AM Zhihong Yu  wrote:
>
> On Tue, Mar 22, 2022 at 8:45 PM Amit Kapila  wrote:
>>
>> On Tue, Mar 22, 2022 at 3:39 AM Zhihong Yu  wrote:
>>
>> Your patch looks good to me. I have found one more similar instance in
>> the same file and changed that as well accordingly. Let me know what
>> you think of the attached?
>>
>
> Hi, Amit:
> The patch looks good to me.
>

Thanks. I'll push this tomorrow unless Tom or someone else wants to
look at it or would like to commit.

-- 
With Regards,
Amit Kapila.




Re: [PoC] Reducing planning time when tables have many partitions

2022-03-23 Thread David Rowley
On Fri, 18 Mar 2022 at 23:32, Yuya Watari  wrote:
> I found a problem that planning takes too much time when the tables
> have many child partitions. According to my observation, the planning
> time increases in the order of O(n^2). Here, n is the number of child
> partitions. I attached the patch to solve this problem. Please be
> noted that this patch is a PoC.

> 3. How to Solve?

I think a better way to solve this would be just to have a single hash
table over all EquivalenceClasses that allows fast lookups of
EquivalenceMember->em_expr.  I think there's no reason that a given
Expr should appear in more than one non-merged EquivalenceClass. The
EquivalenceClass a given Expr belongs to would need to be updated
during the merge process.

For functions such as get_eclass_for_sort_expr() and
process_equivalence(), that would become a fairly fast hashtable
lookup instead of having nested loops to find if an EquivalenceMember
already exists for the given Expr. We might not want to build the hash
table for all queries. Maybe we could just do it if we get to
something like ~16 EquivalenceMember in total.

As of now, we don't have any means to hash Exprs, so all that
infrastructure would need to be built first.  Peter Eisentraut is
working on a patch [1] which is a step towards having this.

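The lookup structure David describes can be modeled outside the planner.
This is only an illustrative sketch, not PostgreSQL code: `ExprKey` stands
in for a hashable Expr, and all the `ec_*` names are invented. It assumes,
as stated above, that a given expression appears in at most one non-merged
EquivalenceClass, so a single hash table can map each expression to its
containing class and the merge step just relabels entries:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Illustrative sketch: a chained hash table mapping an expression key to
 * the single EquivalenceClass containing it, replacing the O(n^2)
 * nested-loop search over all classes and members.
 */
#define EC_TABLE_SIZE 256           /* power of two, for cheap masking */

typedef unsigned int ExprKey;       /* stand-in for a hashable Expr */

typedef struct EcEntry
{
    ExprKey     key;
    int         class_id;           /* class the expression belongs to */
    struct EcEntry *next;           /* collision chain */
} EcEntry;

static EcEntry *ec_table[EC_TABLE_SIZE];

static unsigned int
ec_hash(ExprKey key)
{
    return (key * 2654435761u) & (EC_TABLE_SIZE - 1);
}

/* O(1) expected lookup: returns class id, or -1 if the Expr is unknown */
int
ec_lookup(ExprKey key)
{
    EcEntry    *e;

    for (e = ec_table[ec_hash(key)]; e != NULL; e = e->next)
        if (e->key == key)
            return e->class_id;
    return -1;
}

/* Record that an expression is a member of the given class */
void
ec_add_member(ExprKey key, int class_id)
{
    unsigned int bucket = ec_hash(key);
    EcEntry    *e = malloc(sizeof(EcEntry));

    e->key = key;
    e->class_id = class_id;
    e->next = ec_table[bucket];
    ec_table[bucket] = e;
}

/* When two classes merge, members of the old class move to the new one */
void
ec_merge_classes(int old_class, int new_class)
{
    for (int i = 0; i < EC_TABLE_SIZE; i++)
        for (EcEntry *e = ec_table[i]; e != NULL; e = e->next)
            if (e->class_id == old_class)
                e->class_id = new_class;
}
```

With such a table, get_eclass_for_sort_expr() and process_equivalence()
become hash probes rather than scans of every member of every class.
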
Here's a simple setup to show the pain of this problem:

create table lp (a int, b int) partition by list(a);
select 'create table lp'||x::text|| ' partition of lp for values
in('||x::text||');' from generate_Series(0,4095)x;
\gexec
explain analyze select * from lp where a=b order by a;

 Planning Time: 510.248 ms
 Execution Time: 264.659 ms

David

[1] https://commitfest.postgresql.org/37/3182/




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread David G. Johnston
On Wed, Mar 23, 2022 at 5:55 PM Andres Freund  wrote:

>
> Starting with the below commit, pg_stat_reset_single_function_counters,
> pg_stat_reset_single_table_counters don't just reset the stats for the
> individual function, but also set pg_stat_database.stats_reset.
>
> commit 4c468b37a281941afd3bf61c782b20def8c17047
> Author: Magnus Hagander 
> Date:   2011-02-10 15:09:35 +0100
>
> Track last time for statistics reset on databases and bgwriter
>
> Tracks one counter for each database, which is reset whenever
> the statistics for any individual object inside the database is
> reset, and one counter for the background writer.
>
> Tomas Vondra, reviewed by Greg Smith
>  /*
> [...]
> Maybe I just don't understand what these reset functions are intended for?
> Their introduction [3] didn't explain much either. To me the behaviour of
> resetting pg_stat_database.stats_reset but nothing else in pg_stat_database
> makes them kind of dangerous.
>

*tl;dr*
There seems to be three scopes here:

Cluster Stats - (add last_reset fields for consistency)
Database and Shared Object Stats (add last_reset to handle recording the
resetting of just the records on this table, leaving stats_reset to be a
flag meaning self-or-children)
Object Stats (add last_reset, don't need stats_reset since there are no
children)

If we are OK with just changing pg_stat_database.stats_reset meanings to be
less informative than what it is today (I strongly dislike such a silent
behavioral change) we could simply standardize on the intended meaning of
stats_reset on all three scopes, adding stats_reset as needed to track
object-level resetting.

*Additional Exposition*

The description for the column declares that the field is reset when the
statistics on the pg_stat_database are reset.  That is also the expected
behavior; knowing when any statistics in the whole database are reset is
indeed not useful.

"Time at which these statistics were last reset"

The "these" clearly refers to the statistics columns in pg_stat_database.

In fact, pg_stat_archiver.stats_reset already exists (as does
pg_stat_bgwriter.stats_reset) with (I presume) this interpretation.  This is a
problem because pg_stat_database.stats_reset does not have the same
meaning.  So we have to either live with inconsistency or break something.

In the vein of living with inconsistency I would suggest changing the
documentation of "pg_stat.database.stats_reset" to match the present
behavior.  Then add a new column (last_reset ?) to represent the existing
description of "stats_reset".

I suppose we could document "stats_reset" as the itself-or-any-children
reset timestamp; it's just that the archiver and bgwriter don't have
children in this sense while databases do.  When the
pg_stat_database.last_reset field changes the pg_stat_database.stats_reset
would have to match anyway.

I don't have any issue with an indicator field saying "something regarding
stats has changed" at the database level.  It is much easier to monitor
that and then inspect what may have changed rather than monitoring a
last_reset column on every single catalog that has statistics that can be
reset.

It also seems that each tracked object type needs to have its own
last_reset field (we could choose to name it stats_reset too, leaving
pg_stat_database.last_reset as the only anomaly) added as an implied
behavior needed for such individualized resetting.  I would go with
*.last_reset though and leave the absence of pg_stat_archiver.last_reset as
the anomaly (or just add it redundantly for consistency).

I don't see removing existing functionality as a good course to getting a
consistent implementation; we should just push forward with figuring out
what is missing and fill in those gaps.  At worst if that isn't something
we want to fix right now our new setup should at least leave the status quo
behaviors in place.

I haven't looked into what kind of explicit resetting options are available
but the above seems to cover tracking resetting regardless of how it is
implemented.  I've only spot checked some of the tables to identify the
pattern.

David J.


Re: Fix overflow in DecodeInterval

2022-03-23 Thread Tom Lane
Joseph Koshakow  writes:
> Sorry about that, I didn't have my IDE set up quite right and
> noticed a little too late that I had some auto-formatting turned
> on. Thanks for doing the rebase, did it end up fixing
> the whitespace issues? If not I'll go through the patch and try
> and fix them all.

No, I just fixed the merge failure.

Our standard way to clean up whitespace issues and make sure code
meets our layout conventions is to run pgindent over it [1].
For this particular patch, that might be too much, because it will
reindent the sections that you added braces around, making the patch
harder to review.  So maybe the best bet is to leave well enough
alone and expect the committer to re-pgindent before pushing it.
However, if you spot any diff hunks where there's just a whitespace
change, getting rid of those would be appreciated.

regards, tom lane

[1] 
https://wiki.postgresql.org/wiki/Running_pgindent_on_non-core_code_or_development_code




Re: [PATCH] Add native windows on arm64 support

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 16:30:58 +1300, Thomas Munro wrote:
> On Tue, Mar 22, 2022 at 11:30 PM Julien Rouhaud  wrote:
> > On Tue, Mar 22, 2022 at 09:37:46AM +, Niyas Sait wrote:
> > > Yes, we could look into providing a build machine. Do you have any
> > > reference to what the CI system looks like now for PostgresSQL and how to
> > > add new workers etc.?
> >
> > It's all documented at
> > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto.
> 
> It seems likely there will be more and more Windows/ARM users, so yeah
> having a machine to test that combination would be great.  I wonder if
> ASLR isn't breaking for you by good luck only...

I think we've generally seen the ASLR issue become less prominent on
windows. Whether that's because of the silent retries we added, or because
just about everyone moved to 64bit windows / PG, I don't know. I'd guess both,
with 64bit being the larger influence.

Wonder if it's worth adding some debug logging to the retry code and stop
disabling ASLR on 64bit windows... It's imo pretty crazy that we loop up to
100 times in internal_forkexec() around CreateProcess() &&
pgwin32_ReserveSharedMemoryRegion() without, as far as I can see, a single
debug message.

I don't think we can infer too much about the failure rate on windows from the
!windows EXEC_BACKEND rates. The two internal_forkexec() implementations
behaves just too differently.

Greetings,

Andres Freund




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-23 Thread Andres Freund
On 2022-03-24 11:54:15 +1300, Thomas Munro wrote:
> Erm, is that really OK?  C says "Each enumerated type shall be
> compatible with char, a signed integer type, or an
> unsigned integer type. The choice of type is implementation-defined,
> but shall be capable of representing the values of all the members of
> the enumeration."  It could even legally vary from enum to enum,
> though in practice most compilers probably just use ints all the time
> unless you use weird pragma pack incantation.  Therefore I think you
> need an intermediate variable with the size and signedness matching the
> format string, if you're going to scanf directly into it, which
> David's V6 did.

/me yearns for the perfectly reasonable C++11 feature of defining the base
type for enums (enum name : basetype { }). One of those features C should have
adopted long ago. Not that we could use it yet, given we insist that C
standards have reached at least European drinking age before relying on them.




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 17:55:16 -0700, Andres Freund wrote:
> Maybe I just don't understand what these reset functions are intended for?
> Their introduction [3] didn't explain much either. To me the behaviour of
> resetting pg_stat_database.stats_reset but nothing else in pg_stat_database
> makes them kind of dangerous.

Forgot to add: At the very least we should document that weird behaviour,
because it's certainly not obvious.  But imo we should either remove the
behaviour or drop the functions.

  
   

pg_stat_reset_single_table_counters ( oid ) -> void

    Resets statistics for a single table or index in the current database
    or shared across all databases in the cluster to zero.

    This function is restricted to superusers by default, but other users
    can be granted EXECUTE to run the function.

pg_stat_reset_single_function_counters ( oid ) -> void

    Resets statistics for a single function in the current database to
    zero.

    This function is restricted to superusers by default, but other users
    can be granted EXECUTE to run the function.


Greetings,

Andres Freund




pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-23 Thread Andres Freund
Hi,

Starting with the below commit, pg_stat_reset_single_function_counters,
pg_stat_reset_single_table_counters don't just reset the stats for the
individual function, but also set pg_stat_database.stats_reset.

commit 4c468b37a281941afd3bf61c782b20def8c17047
Author: Magnus Hagander 
Date:   2011-02-10 15:09:35 +0100

Track last time for statistics reset on databases and bgwriter

Tracks one counter for each database, which is reset whenever
the statistics for any individual object inside the database is
reset, and one counter for the background writer.

Tomas Vondra, reviewed by Greg Smith
 /*


@@ -4107,6 +4118,8 @@ pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
 if (!dbentry)
 return;
 
+/* Set the reset timestamp for the whole database */
+dbentry->stat_reset_timestamp = GetCurrentTimestamp();
 
 /* Remove object if it exists, ignore it if not */
 if (msg->m_resettype == RESET_TABLE)


The relevant thread is [1], with the most-on-point message at [2].
pg_stat_reset_single_*_counters were introduced in [3]


This behaviour can be trivially (and is) implemented for the shared memory
stats patch. But every time I read over that part of the code it feels just
profoundly wrong to me.  Way worse than *not* resetting
pg_stat_database.stats_reset.

Anybody that uses the time since the stats reset as part of a calculation of
transactions / sec, reads / sec or such will get completely bogus results
after a call to pg_stat_reset_single_table_counters().
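
A toy illustration of that failure mode (not PostgreSQL code; `DbStats`,
`tps_since_reset`, and `single_table_reset` are invented names): a monitor
that computes transactions/sec as the commit counter divided by the time
since stats_reset is misled when the timestamp moves but the counter does
not:

```c
#include <assert.h>

/*
 * Toy model of a pg_stat_database row as a rate-monitoring tool sees it.
 * Timestamps are plain doubles ("seconds") to keep the sketch small.
 */
typedef struct DbStats
{
    long    xact_commit;    /* cumulative commits, never zeroed here */
    double  stats_reset;    /* reported time of last reset */
} DbStats;

/* The naive-but-common calculation: commits per second since the reset */
double
tps_since_reset(const DbStats *db, double now)
{
    return db->xact_commit / (now - db->stats_reset);
}

/*
 * What a single-object reset effectively does to the database-level view:
 * the timestamp moves forward, but xact_commit keeps all its history.
 */
void
single_table_reset(DbStats *db, double now)
{
    db->stats_reset = now;
}
```

The denominator shrinks while the numerator retains its full history, so
the computed rate is inflated, which is the "completely bogus results"
above.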


Maybe I just don't understand what these reset functions are intended for?
Their introduction [3] didn't explain much either. To me the behaviour of
resetting pg_stat_database.stats_reset but nothing else in pg_stat_database
makes them kind of dangerous.

Greetings,

Andres Freund

[1] 
https://postgr.es/m/7177d0cd40b82409024e7c495e9d6992.squirrel%40sq.gransy.com
[2] https://www.postgresql.org/message-id/4D0E5A54.3060302%40fuzzy.cz
[3] 
https://www.postgresql.org/message-id/9837222c1001240837r5c103519lc6a74c37be5f1831%40mail.gmail.com




Re: Add pg_freespacemap extension sql test

2022-03-23 Thread Michael Paquier
On Wed, Mar 23, 2022 at 10:45:19AM -0300, Fabrízio de Royes Mello wrote:
> On Wed, Mar 23, 2022 at 3:05 AM Michael Paquier  wrote:
>>  Another thing that itched me is that we
>> could also test more with indexes, particularly with btree, BRIN and
>> hash (the latter should not have a FSM with 10 pages as per the first
>> group batch, and each one has a stable an initial state).
> 
> What about GIN/GIST indexes?

Yes, we could extend that more.  For now, I am curious to see what the
buildfarm has to say with the current contents of the patch, and I can
keep an eye on the buildfarm today, so I have applied it.
--
Michael




Re: Fix overflow in DecodeInterval

2022-03-23 Thread Joseph Koshakow
On Mon, Mar 21, 2022 at 8:31 PM Tom Lane  wrote:
> This isn't applying per the cfbot; looks like it got sideswiped
> by 9e9858389.  Here's a quick rebase.  I've not reviewed it, but
> I did notice (because git was in my face about this) that it's
> got whitespace issues.  Please try to avoid unnecessary whitespace
> changes ... pgindent will clean those up, but it makes reviewing
> harder.

Sorry about that, I didn't have my IDE set up quite right and
noticed a little too late that I had some auto-formatting turned
on. Thanks for doing the rebase, did it end up fixing
the whitespace issues? If not I'll go through the patch and try
and fix them all.

- Joe Koshakow




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-23 Thread Thomas Munro
On Wed, Feb 16, 2022 at 12:11 PM Nathan Bossart
 wrote:
> On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote:
> > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote:
> >> I generally think it'd be a good exercise to go through an use
> >> get_dirent_type() in nearly all ReadDir() style loops - it's a nice 
> >> efficiency
> >> win in general, and IIRC a particularly big one on windows.
> >>
> >> It'd probably be good to add a reference to get_dirent_type() to
> >> ReadDir[Extended]()'s docs.
> >
> > Agreed.  I might give this a try.
>
> Alright, here is a new patch set where I've tried to replace as many
> stat()/lstat() calls as possible with get_dirent_type().  0002 and 0003 are
> the same as v9.  I noticed a few remaining stat()/lstat() calls that don't
> appear to be doing proper error checking, but I haven't had a chance to try
> fixing those yet.

0001:  These get_dirent_type() conversions are nice.  LGTM.

0002:

 /* we're only handling directories here, skip if it's not ours */
-if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
+if (lstat(path, &statbuf) != 0)
+ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m", path)));
+else if (!S_ISDIR(statbuf.st_mode))
 return;

Why is this a good place to silently ignore non-directories?
StartupReorderBuffer() is already in charge of skipping random
detritus found in the directory, so would it be better to do "if
(get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then
drop the lstat() stanza from ReorderBufferCleanupSerializedTXNs()
completely?  Then perhaps its ReadDirExtended() should be using ERROR
instead of INFO, so that missing/non-dir/b0rked directories raise an
error.  I don't understand why it's reporting readdir() errors at INFO
but unlink() errors at ERROR, and as far as I can see the other paths
that reach this code shouldn't be sending in paths to non-directories
here unless something is seriously busted and that's ERROR-worthy.

0003:  I haven't studied this cure vs disease thing... no opinion today.




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 23:06:14 +, Jacob Champion wrote:
> On Wed, 2022-03-23 at 19:00 -0400, Tom Lane wrote:
> > Hm.  I was more envisioning getting the "sharable" info out of Port
> > entirely, although I'm not quite sure where it should go instead.
> 
> If it helps, I can move the substruct out and up to a new global struct
> (MyProcShared?). At this point I think it's mostly search-and-replace.

Perhaps alongside CurrentUserId etc in miscinit.c?  It would be nicer if all
those were together in a struct, but oh well.

Another option would be to make it a GUC. With a bit of care it could be
automatically synced by the existing parallelism infrastructure...

Greetings,

Andres Freund




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-23 Thread Jacob Champion
On Wed, 2022-03-23 at 14:20 +0900, Kyotaro Horiguchi wrote:
> I tried to write out the doc part.  What do you think about it?

I like it, thanks! I've applied that in v10, with a tweak to two
iPAddress spellings and a short expansion of the condition in the Note,
and I've added you as a co-author to 0002.

--Jacob
From 84a107da33b7d901cb318f8c2fd140644fbc5100 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 24 Nov 2021 14:46:11 -0800
Subject: [PATCH v10 1/3] Move inet_net_pton() to src/port

This will be helpful for IP address verification in libpq.
---
 src/backend/utils/adt/Makefile  |  1 -
 src/include/port.h  |  3 +++
 src/include/utils/builtins.h|  4 
 src/port/Makefile   |  1 +
 src/{backend/utils/adt => port}/inet_net_pton.c | 16 +++-
 src/tools/msvc/Mkvcbuild.pm |  2 +-
 6 files changed, 20 insertions(+), 7 deletions(-)
 rename src/{backend/utils/adt => port}/inet_net_pton.c (96%)

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 41b486bcef..d173d52157 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -43,7 +43,6 @@ OBJS = \
 	geo_selfuncs.o \
 	geo_spgist.o \
 	inet_cidr_ntop.o \
-	inet_net_pton.o \
 	int.o \
 	int8.o \
 	json.o \
diff --git a/src/include/port.h b/src/include/port.h
index 3d103a2b31..2852e5b58b 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -515,6 +515,9 @@ extern int	pg_codepage_to_encoding(UINT cp);
 extern char *pg_inet_net_ntop(int af, const void *src, int bits,
 			  char *dst, size_t size);
 
+/* port/inet_net_pton.c */
+extern int	pg_inet_net_pton(int af, const char *src, void *dst, size_t size);
+
 /* port/pg_strong_random.c */
 extern void pg_strong_random_init(void);
 extern bool pg_strong_random(void *buf, size_t len);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 666e545496..67b9326dc8 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -93,10 +93,6 @@ extern int	xidLogicalComparator(const void *arg1, const void *arg2);
 extern char *pg_inet_cidr_ntop(int af, const void *src, int bits,
 			   char *dst, size_t size);
 
-/* inet_net_pton.c */
-extern int	pg_inet_net_pton(int af, const char *src,
-			 void *dst, size_t size);
-
 /* network.c */
 extern double convert_network_to_scalar(Datum value, Oid typid, bool *failure);
 extern Datum network_scan_first(Datum in);
diff --git a/src/port/Makefile b/src/port/Makefile
index bfe1feb0d4..c3ae7b3d5c 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -43,6 +43,7 @@ OBJS = \
 	bsearch_arg.o \
 	chklocale.o \
 	inet_net_ntop.o \
+	inet_net_pton.o \
 	noblock.o \
 	path.o \
 	pg_bitutils.o \
diff --git a/src/backend/utils/adt/inet_net_pton.c b/src/port/inet_net_pton.c
similarity index 96%
rename from src/backend/utils/adt/inet_net_pton.c
rename to src/port/inet_net_pton.c
index d3221a1313..bae50ba67e 100644
--- a/src/backend/utils/adt/inet_net_pton.c
+++ b/src/port/inet_net_pton.c
@@ -14,14 +14,18 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
  * OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  *
- *	  src/backend/utils/adt/inet_net_pton.c
+ *	  src/port/inet_net_pton.c
  */
 
 #if defined(LIBC_SCCS) && !defined(lint)
 static const char rcsid[] = "Id: inet_net_pton.c,v 1.4.2.3 2004/03/17 00:40:11 marka Exp $";
 #endif
 
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #include 
 #include 
@@ -29,9 +33,19 @@ static const char rcsid[] = "Id: inet_net_pton.c,v 1.4.2.3 2004/03/17 00:40:11 m
 #include 
 #include 
 
+#ifndef FRONTEND
 #include "utils/builtins.h" /* pgrminclude ignore */	/* needed on some
 		 * platforms */
 #include "utils/inet.h"
+#else
+/*
+ * In a frontend build, we can't include inet.h, but we still need to have
+ * sensible definitions of these two constants.  Note that pg_inet_net_ntop()
+ * assumes that PGSQL_AF_INET is equal to AF_INET.
+ */
+#define PGSQL_AF_INET	(AF_INET + 0)
+#define PGSQL_AF_INET6	(AF_INET + 1)
+#endif
 
 
 static int	inet_net_pton_ipv4(const char *src, u_char *dst);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..667b173ec3 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -100,7 +100,7 @@ sub mkvcbuild
 
 	our @pgportfiles = qw(
 	  chklocale.c explicit_bzero.c fls.c getpeereid.c getrusage.c inet_aton.c
-	  getaddrinfo.c gettimeofday.c inet_net_ntop.c kill.c open.c
+	  getaddrinfo.c gettimeofday.c inet_net_ntop.c inet_net_pton.c kill.c open.c
 	  snprintf.c strlcat.c strlcpy.c dirmod.c noblock.c path.c
 	  dirent.c dlopen.c getopt.c getopt_long.c link.c
 	  pread.c preadv.c pwrite.c pwritev.c pg_bitutils.c
-- 
2.25.1

From 6ad0b390f11a025045be278a789d72547775fe77 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Thu, 18 Nov 2021 

Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Andres Freund
On 2022-03-23 18:07:01 -0500, Justin Pryzby wrote:
> Did you try this on windows at all ?

Really should get zstd installed in the windows cf environment...


> It's probably no surprise that zstd implements threading differently there.

Worth noting that we have a few of our own threads running on windows already
- so we're guaranteed to build against the threaded standard libraries etc
already.




Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 18:31:12 -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 5:14 PM Andres Freund  wrote:
> > The most likely source of problem would errors thrown while zstd threads are
> > alive. Should make sure that that can't happen.
> >
> > What is the lifetime of the threads zstd spawns? Are they tied to a single
> > compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> > is our code ensuring that we don't leak such contexts?
> 
> I haven't found any real documentation explaining how libzstd manages
> its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
> don't know. I guess I could try to figure it out from the source code.

I found this the following section in the manual [1]:

ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
                          * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
                          * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
                          * while compression is performed in parallel, within worker thread(s).
                          * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
                          *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
                          * More workers improve speed, but also increase memory usage.
                          * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
                          * compression is performed inside Caller's thread, and all invocations are blocking */

"ZSTD_compressStream*() consumes input ... immediately gives back control"
pretty much confirms that.


Do we care about zstd's memory usage here? I think it's OK to mostly ignore
work_mem/maintenance_work_mem here, but I could also see limiting concurrency
so that estimated memory usage would fit into work_mem/maintenance_work_mem.



> It's probably also worth mentioning here that even if, contrary to
> expectations, the compression threads hang around to the end of time
> and chill, in practice nobody is likely to run BASE_BACKUP and then
> keep the connection open for a long time afterward. So it probably
> wouldn't really affect resource utilization in real-world scenarios
> even if the threads never exited, as long as they didn't, you know,
> busy-loop in the background. And I assume the actual library behavior
> can't be nearly that bad. This is a pretty mainstream piece of
> software.

I'm not really worried about resource utilization, more about the existence of
threads moving us into undefined behaviour territory or such. I don't think
that's possible, but it's IIRC UB to fork() while threads are present and do
pretty much *anything* other than immediately exec*().


> > > but that's not to say that there couldn't be problems.  I worry a bit that
> > > the mere presence of threads could in some way mess things up, but I don't
> > > know what the mechanism for that would be, and I don't want to postpone
> > > shipping useful features based on nebulous fears.
> >
> > One thing that'd be good to tests for is cancelling in-progress server-side
> > compression.  And perhaps a few assertions that ensure that we don't escape
> > with some threads still running. That'd have to be platform dependent, but I
> > don't see a problem with that in this case.
> 
> More specific suggestions, please?

I was thinking of doing something like calling pthread_is_threaded_np() before
and after the zstd section and erroring out if they differ. But I forgot that
that's a mac-ism.


> > > For both parallel and non-parallel zstd compression, I see differences
> > > between the compressed size depending on where the compression is
> > > done. I don't know whether this is an expected behavior of the zstd
> > > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > > but that doesn't mean we're not, for example, selecting different
> > > compression levels where we shouldn't be. I'll try to figure out
> > > what's going on here.
> > >
> > > zstd, client-side: 1.7GB, 17 seconds
> > > zstd, server-side: 1.3GB, 25 seconds
> > > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
> >
> > What causes this fairly massive client-side/server-side size difference?
> 
> You seem not to have read what I wrote about this exact point in the
> text which you quoted.

Somehow not...

Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
one invocation? I recall that there's some differences between basebackup
client / serverside around buffer sizes - but that's before all the recent-ish
changes...

Greetings,

Andres Freund

[1] 

Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Justin Pryzby
On Wed, Mar 23, 2022 at 04:34:04PM -0400, Robert Haas wrote:
> be, spawning threads inside the PostgreSQL backend. Short of cats and
> dogs living together, it's hard to think of anything more terrifying,
> because the PostgreSQL backend is very much not thread-safe. However,
> a lot of the things we usually worry about when people make noises
> about using threads in the backend don't apply here, because the
> threads are hidden away behind libzstd interfaces and can't execute
> any PostgreSQL code. Therefore, I think it might be safe to just ...
> turn this on. One reason I think that is that this whole approach was
> recommended to me by Andres ... but that's not to say that there
> couldn't be problems.  I worry a bit that the mere presence of threads
> could in some way mess things up, but I don't know what the mechanism
> for that would be, and I don't want to postpone shipping useful
> features based on nebulous fears.

Note that the PGDG .RPMs and .DEBs are already linked with pthread, via
libxml => liblzma.

$ ldd /usr/pgsql-14/bin/postgres |grep xm
libxml2.so.2 => /lib64/libxml2.so.2 (0x7faab984e000)
$ objdump -p /lib64/libxml2.so.2 |grep NEED
  NEEDED   libdl.so.2
  NEEDED   libz.so.1
  NEEDED   liblzma.so.5
  NEEDED   libm.so.6
  NEEDED   libc.so.6
  VERNEED  0x00019218
  VERNEEDNUM   0x0005
$ objdump -p /lib64/liblzma.so.5 |grep NEED
  NEEDED   libpthread.so.0



Did you try this on windows at all ?  It's probably no surprise that zstd
implements threading differently there.




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-23 Thread Jacob Champion
On Wed, 2022-03-23 at 19:00 -0400, Tom Lane wrote:
> Hm.  I was more envisioning getting the "sharable" info out of Port
> entirely, although I'm not quite sure where it should go instead.

If it helps, I can move the substruct out and up to a new global struct
(MyProcShared?). At this point I think it's mostly search-and-replace.

--Jacob


Re: ubsan

2022-03-23 Thread Tom Lane
Andres Freund  writes:
> Running with asan found an existing use-after-free bug in pg_waldump (*), a 
> bug in
dshash_seq_next() that probably can't be hit in HEAD and a bug in my
> shared memory stats patch. I count that as a success.

Nice!

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-23 Thread Tom Lane
Jacob Champion  writes:
> On Thu, 2022-03-17 at 18:33 -0400, Tom Lane wrote:
>> I think what we ought to do here is separate out the data that we think
>> parallel workers need access to.  It does not seem wise to say "workers
>> can access fields A,B,C of MyPort but not fields X,Y,Z".  I do not have
>> a concrete proposal for fixing it though.

> v6-0002 has my first attempt at this. I moved authn_id into its own
> substruct inside Port, which gets serialized with the parallel key
> machinery. (My name selection of "SharedPort" is pretty bland.)

Hm.  I was more envisioning getting the "sharable" info out of Port
entirely, although I'm not quite sure where it should go instead.

regards, tom lane




Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Robert Haas
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby  wrote:
> Also because the library may not be compiled with threading.  A few days ago, 
> I
> tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
> patch but then couldn't test it, even after trying various versions of the 
> zstd
> package and trying to compile it locally.  I'll try again soon...

Ah. Right, I can update the comment to mention that.

> I think you should also test the return value when setting the compress level.
> Not only because it's generally a good idea, but also because I suggested to
> support negative compression levels.  Which weren't allowed before v1.3.4, and
> then the range is only defined since 1.3.6 (ZSTD_minCLevel).  At some point,
> the range may have been -7..22 but now it's -131072..22.

Yeah, I was thinking that might be a good change. It would require
adjusting some other code though, because right now only compression
levels 1..22 are accepted anyhow.

> lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return 
> (int)-ZSTD_TARGETLENGTH_MAX; }
> lib/zstd.h:#define ZSTD_TARGETLENGTH_MAXZSTD_BLOCKSIZE_MAX
> lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX (1<<ZSTD_BLOCKSIZELOG_MAX)
> lib/zstd.h:#define ZSTD_BLOCKSIZELOG_MAX  17
> ; -1<<17
> -131072

So does that, like, compress the value by making it way bigger? :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ubsan

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 13:12:34 -0700, Andres Freund wrote:
> I'm planning to enable it on two of mine. Looks like gcc and clang find
> slightly different things, so I was intending to enable it on one of each.

Originally I'd planned to mix them into existing members, but I think it'd be
better to have dedicated ones. Applied for a few new buildfarm names for:
{gcc,clang}-{-fsanitize=undefined,-fsanitize=address}.


Running with asan found an existing use-after-free bug in pg_waldump (*), a bug 
in
dshash_seq_next() that probably can't be hit in HEAD and a bug in my
shared memory stats patch. I count that as a success.

It's particularly impressive that the cost of running with ASAN is *so* much
lower than valgrind. On my workstation a check-world with
-fsanitize=alignment,undefined,address takes 3min17s, vs 1min10s or so without
-fsanitize.  Not something to always use, but certainly better than valgrind.

Greetings,

Andres Freund

(*) search_directory() uses fname = xlde->d_name after closedir(). Found in
pg_verifybackup.c's tests. Probably worth adding a few simple tests to
pg_waldump itself.




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-23 Thread Thomas Munro
On Thu, Mar 24, 2022 at 9:53 AM Peter Eisentraut
 wrote:
> On 21.03.22 05:55, Thomas Munro wrote:
> > [04:30:50.630] pg_waldump.c:963:26: error: format ‘%u’ expects
> > argument of type ‘unsigned int *’, but argument 3 has type ‘ForkNumber
> > *’ [-Werror=format=]
> > [04:30:50.630] 963 | if (sscanf(optarg, "%u",
> > _by_relation_forknum) != 1 ||
> > [04:30:50.630] | ~^ ~~
> > [04:30:50.630] | | |
> > [04:30:50.630] | | ForkNumber *
> > [04:30:50.630] | unsigned int *
> >
> > And now that this gets to the CompilerWarnings CI task, it looks like
> > GCC doesn't like an enum as a scanf %u destination (I didn't see that
> > warning locally when I compiled the above fixup because clearly Clang
> > is cool with it...).  Probably needs a temporary unsigned int to
> > sscanf into first.
>
> That's because ForkNum is a signed type.  You will probably succeed if
> you use "%d" instead.

Erm, is that really OK?  C says "Each enumerated type shall be
compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,
but shall be capable of representing the values of all the members of
the enumeration."  It could even legally vary from enum to enum,
though in practice most compilers probably just use ints all the time
unless you use weird pragma pack incantation.  Therefore I think you
need an intermediate variable with the size and signedness matching the
format string, if you're going to scanf directly into it, which
David's V6 did.




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-23 Thread Jacob Champion
On Thu, 2022-03-17 at 18:33 -0400, Tom Lane wrote:
> Yeah.  It seems to me that putting the auth info into struct Port was
> a fairly random thing to do in the first place, and we are now dealing
> with the fallout of that.
> 
> I think what we ought to do here is separate out the data that we think
> parallel workers need access to.  It does not seem wise to say "workers
> can access fields A,B,C of MyPort but not fields X,Y,Z".  I do not have
> a concrete proposal for fixing it though.

v6-0002 has my first attempt at this. I moved authn_id into its own
substruct inside Port, which gets serialized with the parallel key
machinery. (My name selection of "SharedPort" is pretty bland.)

Over time, we could move more fields into the shared struct and fill
out the serialization logic as needed, and then maybe eventually
SharedPort can be broken off into its own thing with its own
allocation. But I don't know if we should do it all at once, yet.

WDYT?

--Jacob
From c1323d7694deedf1fa829772083977c18c7f77f6 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v6 1/3] Add API to retrieve authn_id from SQL

The authn_id field in MyProcPort is currently only accessible to the
backend itself.  Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.
---
 doc/src/sgml/func.sgml| 26 +++
 src/backend/utils/adt/name.c  | 12 ++-
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/authentication/t/001_password.pl | 11 ++
 src/test/ssl/t/001_ssltests.pl|  7 ++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 89a5e17884..45df4ff158 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22280,6 +22280,32 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);

   
 
+  
+   
+
+ pg_session_authn_id
+
+pg_session_authn_id ()
+text
+   
+   
+Returns the authenticated identity for the current connection, or
+NULL if the user has not been authenticated.
+   
+   
+The authenticated identity is an immutable identifier for the user
+presented during the connection handshake; the exact format depends on
+the authentication method in use. (For example, when using the
+scram-sha-256 auth method, the authenticated identity
+is simply the username. When using the cert auth
+method, the authenticated identity is the Distinguished Name of the
+client certificate.) Even for auth methods which use the username as
+the authenticated identity, this function differs from
+session_user in that its return value cannot be
+changed after login.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index e8bba3670c..662a7943ed 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "libpq/libpq-be.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER
+ * SQL-functions CURRENT_USER, SESSION_USER, PG_SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false;
 }
 
+Datum
+pg_session_authn_id(PG_FUNCTION_ARGS)
+{
+	if (!MyProcPort || !MyProcPort->authn_id)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
+}
+
 
 /*
  * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d8e8715ed1..a1bf898476 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1508,6 +1508,9 @@
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
+{ oid => '9774', descr => 'session authenticated identity',
+  proname => 'pg_session_authn_id', provolatile => 's', proparallel => 'r',
+  prorettype => 'text', proargtypes => '', prosrc => 'pg_session_authn_id' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..f0bdeda52d 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -82,6 +82,10 @@ test_role($node, 'scram_role', 'trust', 0,
 test_role($node, 'md5_role', 'trust', 0,
 	log_unlike => 

Re: Column Filtering in Logical Replication

2022-03-23 Thread Tomas Vondra



On 3/21/22 12:28, Amit Kapila wrote:
> On Fri, Mar 18, 2022 at 8:13 PM Tomas Vondra
>  wrote:
>>
>> Ah, thanks for reminding me - it's hard to keep track of all the issues
>> in threads as long as this one.
>>
>> BTW do you have any opinion on the SET COLUMNS syntax? Peter Smith
>> proposed to get rid of it in [1] but I'm not sure that's a good idea.
>> Because if we ditch it, then removing the column list would look like this:
>>
>> ALTER PUBLICATION pub ALTER TABLE tab;
>>
>> And if we happen to add other per-table options, this would become
>> pretty ambiguous.
>>
>> Actually, do we even want to allow resetting column lists like this? We
>> don't allow this for row filters, so if you want to change a row filter
>> you have to re-add the table, right?
>>
> 
> We can use syntax like: "alter publication pub1 set table t1 where (c2 > 10);"
> to reset the existing row filter. It seems similar thing works
> for column list as well ("alter publication pub1 set table t1 (c2)
> where (c2 > 10)"). If I am not missing anything, I don't think we need
> additional Alter Table syntax.
> 
>> So maybe we should just ditch ALTER
>> TABLE entirely.
>>
> 
> Yeah, I agree especially if my above understanding is correct.
> 

I think there's a gotcha that

   ALTER PUBLICATION pub SET TABLE t ...

also removes all other relations from the publication, and it removes
and re-adds the table anyway. So I'm not sure what's the advantage?

Anyway, I don't see why we would need such ALTER TABLE only for column
filters and not for row filters - either we need to allow this for both
options or none of them.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Column Filtering in Logical Replication

2022-03-23 Thread Tomas Vondra
On 3/21/22 12:55, Amit Kapila wrote:
> On Sat, Mar 19, 2022 at 3:56 AM Tomas Vondra
>  wrote:
>>
>> ...
>>
>> However, while looking at how pgoutput, I realized one thing - for row
>> filters we track them "per operation", depending on which operations are
>> defined for a given publication. Shouldn't we do the same thing for
>> column lists, really?
>>
>> I mean, if there are two publications with different column lists, one
>> for inserts and the other one for updates, isn't it wrong to merge these
>> two column lists?
>>
> 
> The reason we can't combine row filters for inserts with
> updates/deletes is that if inserts have some column that is not
> present in RI then during update filtering (for old tuple) it will
> give an error as the column won't be present in WAL log.
> 
> OTOH, the same problem won't be there for the column list/filter patch
> because all the RI columns are there in the column list (for
> update/delete) and we don't need to apply a column filter for old
> tuples in either update or delete.
> 
> Basically, the filter rules are slightly different for row filters and
> column lists, so we need them (combine of filters) for one but not for
> the other. Now, for the sake of consistency with row filters, we can
> do it but as such there won't be any problem or maybe we can just add
> a comment for the same in code.
> 

OK, thanks for the explanation. I'll add a comment explaining this to
the function initializing the column filter.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Proposal: allow database-specific role memberships

2022-03-23 Thread Kenaniah Cerny
Hi all,

cfbot is once again green as of the v7 patch:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/37/3374

- Kenaniah


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-23 Thread Peter Eisentraut

On 23.03.22 10:51, Maxim Orlov wrote:

Thanks for updating the patches. I have a little comment on 0002.

+                                errmsg_internal("found xmax %llu" " (infomask 0x%04x) not frozen, not multi, not normal",
+                                                (unsigned long long) xid, tuple->t_infomask)));



Thanks for your review! Fixed.


About v25-0001-Use-unsigned-64-bit-numbering-of-SLRU-pages.patch:

-static bool CLOGPagePrecedes(int page1, int page2);
+static bool CLOGPagePrecedes(uint64 page1, uint64 page2);

You are changing the API from signed to unsigned.  Is this intentional? 
It is not explained anywhere.  Are we sure that nothing uses, for 
example, negative values as error markers?


 #define SlruFileName(ctl, path, seg) \
-   snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
+   snprintf(path, MAXPGPATH, "%s/%04X%08X", (ctl)->Dir, \
+(uint32) ((seg) >> 32), (uint32) ((seg) & (uint64) 0xFFFFFFFF))


What's going on here?  Some explanation?  Why not use something like 
%llX or whatever you need?


+   uint64  segno = pageno / SLRU_PAGES_PER_SEGMENT;
+   uint64  rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
[etc.]

Not clear whether segno etc. need to be changed to 64 bits, or whether 
an increase of SLRU_PAGES_PER_SEGMENT should also be considered.


-   if ((len == 4 || len == 5 || len == 6) &&
+   if ((len == 12 || len == 13 || len == 14) &&

This was horrible before, but maybe we can take this opportunity now to 
add a comment?


-   SlruFileName(ctl, path, ftag->segno);
+   SlruFileName(ctl, path, (uint64) ftag->segno);

Maybe ftag->segno should be changed to 64 bits as well?  Not clear.

There is a code comment at the definition of SLRU_PAGES_PER_SEGMENT that 
has some detailed explanations of how the SLRU numbering, SLRU file 
names, and transaction IDs tie together.  This doesn't seem to apply 
anymore after this change.


The reference page of pg_resetwal contains various pieces of information 
of how to map SLRU files back to transaction IDs.  Please check if that 
is still all up to date.



About v25-0002-Use-64-bit-format-to-output-XIDs.patch:

I don't see the point of applying this now.  It doesn't make PG15 any 
better.  It's just a patch part of which we might need later. 
Especially the issue of how we are handwaving past the epoch-enabled 
output sites disturbs me.  At least those should be omitted from this 
patch, since this patch makes these more wrong, not more right for the 
future.


An alternative we might want to consider is that we use PRId64 as 
explained here: 
. 
 We'd have to check whether we still support any non-GNU gettext 
implementations and to what extent they support this.  But I think it's 
something to think about if we are going to embark on a journey of much 
more int64 printf output.





Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Robert Haas
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund  wrote:
> The most likely source of problem would errors thrown while zstd threads are
> alive. Should make sure that that can't happen.
>
> What is the lifetime of the threads zstd spawns? Are they tied to a single
> compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> is our code ensuring that we don't leak such contexts?

I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
Anyway, what we have now is a PG_TRY()/PG_CATCH() block around the
code that uses the bbsink which will cause bbsink_zstd_cleanup() to
get called in the event of an error. That will do ZSTD_freeCCtx().

It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.

> If they're short-lived, are we compressing large enough batches to not waste a
> lot of time starting/stopping threads?

Well, we're using a single ZSTD_CCtx for an entire base backup. Again,
I haven't found documentation explaining with libzstd is actually
doing, but it's hard to see how we could make the batch any bigger
than that. The context gets reset for each new tablespace, which may
or may not do anything to the compression threads.

> > but that's not to say that there couldn't be problems.  I worry a bit that
> > the mere presence of threads could in some way mess things up, but I don't
> > know what the mechanism for that would be, and I don't want to postpone
> > shipping useful features based on nebulous fears.
>
> One thing that'd be good to tests for is cancelling in-progress server-side
> compression.  And perhaps a few assertions that ensure that we don't escape
> with some threads still running. That'd have to be platform dependent, but I
> don't see a problem with that in this case.

More specific suggestions, please?

> > For both parallel and non-parallel zstd compression, I see differences
> > between the compressed size depending on where the compression is
> > done. I don't know whether this is an expected behavior of the zstd
> > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > but that doesn't mean we're not, for example, selecting different
> > compression levels where we shouldn't be. I'll try to figure out
> > what's going on here.
> >
> > zstd, client-side: 1.7GB, 17 seconds
> > zstd, server-side: 1.3GB, 25 seconds
> > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
>
> What causes this fairly massive client-side/server-side size difference?

You seem not to have read what I wrote about this exact point in the
text which you quoted.

> Will this cause test failures on systems with older zstd?

I put a bunch of logic in the test case to try to avoid that, so
hopefully not, but if it does, we can adjust the logic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Peter Geoghegan
On Wed, Mar 23, 2022 at 2:03 PM Thomas Munro  wrote:
> Yeah, I found it confusing that DISABLE_PAGE_SKIPPING doesn't disable
> all page skipping, so 3414099c turned out to be not enough.

The proposed change to DISABLE_PAGE_SKIPPING is partly driven by that,
and partly driven by a similar concern about aggressive VACUUM.

It seems worth emphasizing the idea that an aggressive VACUUM is now
just the same as any other VACUUM except for one detail: we're
guaranteed to advance relfrozenxid to a value >= FreezeLimit at the
end. The non-aggressive case has the choice to do things that make
that impossible. But there are only two places where this can happen now:

1. Non-aggressive VACUUMs might decide to skip some all-visible pages in
the new lazy_scan_skip() helper routine for skipping with the VM (see
v11-0002-*).

2. A non-aggressive VACUUM can *always* decide to ratchet back its
target relfrozenxid in lazy_scan_noprune, to avoid waiting for a
cleanup lock -- a final value from before FreezeLimit is usually still
pretty good.

The first scenario is the only one where it becomes impossible for
non-aggressive VACUUM to be able to advance relfrozenxid (with
v11-0001-* in place) by any amount. Even that's a choice, made by
weighing costs against benefits.

There is no behavioral change in v11-0002-* (we're still using the
old SKIP_PAGES_THRESHOLD strategy), but the lazy_scan_skip()
helper routine could fairly easily be taught a lot more about the
downside of skipping all-visible pages (namely how that makes it
impossible to advance relfrozenxid).

Maybe it's worth skipping all-visible pages (there are lots of them
and age(relfrozenxid) is still low), and maybe it isn't worth it. We
should get to decide, without implementation details making
relfrozenxid advancement unsafe.

It would be great if you could take a look v11-0002-*, Robert. Does it
make sense to you?

Thanks
--
Peter Geoghegan




Re: MDAM techniques and Index Skip Scan patch

2022-03-23 Thread Dmitry Dolgov
> On Wed, Mar 23, 2022 at 05:32:46PM -0400, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> > On Tue, Mar 22, 2022 at 04:55:49PM -0400, Tom Lane wrote:
> >> In short: I would throw out just about all the planner infrastructure
> >> that's been proposed so far.  It looks bulky, expensive, and
> >> drastically undercommented, and I don't think it's buying us anything
> >> of commensurate value.
>
> > Broadly speaking planner related changes proposed in the patch so far
> > are: UniqueKey, taken from the neighbour thread about select distinct;
> > list of uniquekeys to actually pass information about the specified
> > loose scan prefix into nbtree; some verification logic to prevent
> > applying skipping when it's not supported. I can imagine taking out
> > UniqueKeys and passing loose scan prefix in some other form (the other
> > parts seem to be essential) -- is that what you mean?
>
> My point is that for pure loose scans --- that is, just optimizing a scan,
> not doing AM-based duplicate-row-elimination --- you do not need to pass
> any new data to btree at all.  It can infer what to do on the basis of the
> set of index quals it's handed.
>
> The bigger picture here is that I think the reason this patch series has
> failed to progress is that it's too scattershot.  You need to pick a
> minimum committable feature and get that done, and then you can move on
> to the next part.  I think the minimum committable feature is loose scans,
> which will require a fair amount of work in access/nbtree/ but very little
> new planner code, and will be highly useful in their own right even if we
> never do anything more.
>
> In general I feel that the UniqueKey code is a solution looking for a
> problem, and that treating it as the core of the patchset is a mistake.
> We should be driving this work off of what nbtree needs to make progress,
> and not building more infrastructure elsewhere than we have to.  Maybe
> we'll end up with something that looks like UniqueKeys, but I'm far from
> convinced of that.

I see. I'll need some time to think about how that might look (I will
probably return with more questions).

The CF item could be set to RwF; what would you say, Jesper?




Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Justin Pryzby
+	 * We check for failure here because (1) older versions of the library
+	 * do not support ZSTD_c_nbWorkers and (2) the library might want to
+	 * reject unreasonable values (though in practice it does not seem to do
+	 * so).
+	 */
+	ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+								 mysink->workers);
+	if (ZSTD_isError(ret))
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("could not set compression worker count to %d: %s",
+					   mysink->workers, ZSTD_getErrorName(ret)));

Also because the library may not be compiled with threading.  A few days ago, I
tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
patch but then couldn't test it, even after trying various versions of the zstd
package and trying to compile it locally.  I'll try again soon...

I think you should also test the return value when setting the compress level.
Not only because it's generally a good idea, but also because I suggested to
support negative compression levels.  Which weren't allowed before v1.3.4, and
then the range is only defined since 1.3.6 (ZSTD_minCLevel).  At some point,
the range may have been -7..22 but now it's -131072..22.

lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
lib/zstd.h:#define ZSTD_TARGETLENGTH_MAX    ZSTD_BLOCKSIZE_MAX
lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX       (1<<17)

From 80f45cfbe13d6fc0f16e49b7ea76f1e50afb632c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH] pg_basebackup: support Zstd negative compression levels

"higher than maximum" is bogus

TODO: each compression method should enforce its own levels
---
 src/backend/replication/basebackup_zstd.c | 7 +--
 src/bin/pg_basebackup/bbstreamer_zstd.c   | 5 -
 src/common/backup_compression.c   | 6 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 4835aa70fca..74681ee3fe8 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -79,7 +79,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
 	else
 	{
 		compresslevel = compress->level;
-		Assert(compresslevel >= 1 && compresslevel <= 22);
+		Assert(compresslevel >= -7 && compresslevel <= 22 && compresslevel != 0);
 	}
 
 	sink = palloc0(sizeof(bbsink_zstd));
@@ -108,8 +108,11 @@ bbsink_zstd_begin_backup(bbsink *sink)
 	if (!mysink->cctx)
 		elog(ERROR, "could not create zstd compression context");
 
-	ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+	ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
 		   mysink->compresslevel);
+	if (ZSTD_isError(ret))
+		elog(ERROR, "could not create zstd compression context: %s",
+ZSTD_getErrorName(ret));
 
 	/*
 	 * We check for failure here because (1) older versions of the library
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e17dfb6bd54..640729003a4 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -85,8 +85,11 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
 		pg_log_error("could not create zstd compression context");
 
 	/* Initialize stream compression preferences */
-	ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+	ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
 		   compress->level);
+	if (ZSTD_isError(ret))
+		pg_log_error("could not set compression level to %d: %s",
+compress->level, ZSTD_getErrorName(ret));
 
 	/*
 	 * We check for failure here because (1) older versions of the library
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 969e08cca20..c0eff30024c 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -260,13 +260,17 @@ validate_bc_specification(bc_specification *spec)
 		else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
 			max_level = 12;
 		else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+		{
 			max_level = 22;
+			/* The minimum level depends on the version.. */
+			min_level = -7;
+		}
 		else
 			return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
 			get_bc_algorithm_name(spec->algorithm));
 
 		if (spec->level < min_level || spec->level > max_level)
-			return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+			return psprintf(_("compression algorithm \"%s\" expects a nonzero compression level between %d and %d"),
 			get_bc_algorithm_name(spec->algorithm),
 			min_level, max_level);
 	}
-- 

Re: MDAM techniques and Index Skip Scan patch

2022-03-23 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> On Tue, Mar 22, 2022 at 04:55:49PM -0400, Tom Lane wrote:
>> In short: I would throw out just about all the planner infrastructure
>> that's been proposed so far.  It looks bulky, expensive, and
>> drastically undercommented, and I don't think it's buying us anything
>> of commensurate value.

> Broadly speaking planner related changes proposed in the patch so far
> are: UniqueKey, taken from the neighbour thread about select distinct;
> list of uniquekeys to actually pass information about the specified
> loose scan prefix into nbtree; some verification logic to prevent
> applying skipping when it's not supported. I can imagine taking out
> UniqueKeys and passing loose scan prefix in some other form (the other
> parts seem to be essential) -- is that what you mean?

My point is that for pure loose scans --- that is, just optimizing a scan,
not doing AM-based duplicate-row-elimination --- you do not need to pass
any new data to btree at all.  It can infer what to do on the basis of the
set of index quals it's handed.

The bigger picture here is that I think the reason this patch series has
failed to progress is that it's too scattershot.  You need to pick a
minimum committable feature and get that done, and then you can move on
to the next part.  I think the minimum committable feature is loose scans,
which will require a fair amount of work in access/nbtree/ but very little
new planner code, and will be highly useful in their own right even if we
never do anything more.

In general I feel that the UniqueKey code is a solution looking for a
problem, and that treating it as the core of the patchset is a mistake.
We should be driving this work off of what nbtree needs to make progress,
and not building more infrastructure elsewhere than we have to.  Maybe
we'll end up with something that looks like UniqueKeys, but I'm far from
convinced of that.

regards, tom lane




Re: [PoC] Let libpq reject unexpected authentication requests

2022-03-23 Thread Jacob Champion
On Mon, 2022-03-07 at 11:44 +0100, Laurenz Albe wrote:
> I am all for the idea, but you implemented the reverse of proposal 2.

(This email was caught in my spam filter; sorry for the delay.)

> Wouldn't it be better to list the *rejected* authentication methods?
> Then we could have "password" on there by default.

Specifying the allowed list rather than the denied list tends to have
better security properties.

In the case I'm pursuing (the attack vector from the CVE), the end user
expects certificates to be used. Any other authentication method --
plaintext, hashed, SCRAM, Kerberos -- is unacceptable; it shouldn't be
possible for the server to extract any information about the client
environment other than the cert. And I don't want to have to specify
the whole list of things that _aren't_ allowed, and keep that list
updated as we add new fancy auth methods, if I just want certs to be
used. So that's my argument for making the methods opt-in rather than
opt-out.

But that doesn't help your case; you want to choose a good default, and
I agree that's important. Since there are arguments already for
accepting an OR in the list, and -- if we couldn't find a good
orthogonal method for certs, like Tom suggested -- an AND, maybe it
wouldn't be so bad to accept a NOT as well?

require_auth=cert# certs only
require_auth=cert+scram-sha-256  # SCRAM wrapped by certs
require_auth=cert,scram-sha-256  # SCRAM or certs (or both)
require_auth=!password   # anything but plaintext
require_auth=!password,!md5  # no plaintext or MD5

But it doesn't ever make sense to mix them:

require_auth=cert,!password  # error: !password is useless
require_auth=!password,password  # error: nonsense

--Jacob


Re: Proposal: allow database-specific role memberships

2022-03-23 Thread Kenaniah Cerny
Thanks Andres,

An updated patch is attached.

- Kenaniah

On Mon, Mar 21, 2022 at 5:40 PM Andres Freund  wrote:

> Hi,
>
> On 2022-01-22 22:56:44 +0800, Julien Rouhaud wrote:
> > On Sat, Jan 22, 2022 at 05:28:05AM -0800, Kenaniah Cerny wrote:
> > > Thank you so much for the backtrace!
> > >
> > > This latest patch should address this by moving the
> > > dumpRoleMembership call to before the pointer is closed.
> >
> > Thanks!  The cfbot turned green since:
> >
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3374
>
> red again:
> https://cirrus-ci.com/task/5516269981007872?logs=test_world#L1480
>
> Marked as waiting-on-author.
>
> - Andres
>


database-role-memberships-v7.patch
Description: Binary data


Re: logical replication restrictions

2022-03-23 Thread Euler Taveira
On Mon, Mar 21, 2022, at 10:09 PM, Euler Taveira wrote:
> On Mon, Mar 21, 2022, at 10:04 PM, Andres Freund wrote:
>> On 2022-03-20 21:40:40 -0300, Euler Taveira wrote:
>> > On Mon, Feb 28, 2022, at 9:18 PM, Euler Taveira wrote:
>> > > Long time, no patch. Here it is. I will provide documentation in the next
>> > > version. I would appreciate some feedback.
>> > This patch is broken since commit 
>> > 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33. I
>> > rebased it.
>> 
>> This fails tests, specifically it seems psql crashes:
>> https://cirrus-ci.com/task/6592281292570624?logs=cores#L46
> Yeah. I forgot to test this patch with cassert before sending it. :( I didn't
> send a new patch because there is another issue (with int128) that I'm
> currently reworking. I'll send another patch soon.
Here is another version after rebasing it. In this version I fixed the psql
issue and rewrote interval_to_ms function.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 6718e96d3682af9094c02b0e308a8c814148a197 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Sat, 6 Nov 2021 11:31:10 -0300
Subject: [PATCH v3] Time-delayed logical replication subscriber

Similar to physical replication, a time-delayed copy of the data for
logical replication is useful for some scenarios (specially to fix
errors that might cause data loss).

If the subscriber sets min_apply_delay parameter, the logical
replication worker will delay the transaction commit for min_apply_delay
milliseconds.

The delay is calculated between the WAL time stamp and the current time
on the subscriber.

The delay occurs only on WAL records for transaction begins. Regular and
prepared transactions are covered. Streamed transactions are not
delayed. It should be implemented in future releases.

Author: Euler Taveira
Discussion: https://postgr.es/m/CAB-JLwYOYwL=xtyaxkih5ctm_vm8kjkh7aaitckvmch4rzr...@mail.gmail.com
---
 doc/src/sgml/ref/alter_subscription.sgml   |   5 +-
 doc/src/sgml/ref/create_subscription.sgml  |  33 +
 src/backend/catalog/pg_subscription.c  |   1 +
 src/backend/catalog/system_views.sql   |   4 +-
 src/backend/commands/subscriptioncmds.c|  46 ++-
 src/backend/replication/logical/worker.c   |  61 +
 src/backend/utils/adt/timestamp.c  |  32 +
 src/bin/pg_dump/pg_dump.c  |  13 +-
 src/bin/pg_dump/pg_dump.h  |   1 +
 src/bin/psql/describe.c|  13 +-
 src/include/catalog/pg_subscription.h  |   2 +
 src/include/datatype/timestamp.h   |   2 +
 src/include/utils/timestamp.h  |   2 +
 src/test/regress/expected/subscription.out | 149 -
 src/test/regress/sql/subscription.sql  |  20 +++
 src/test/subscription/t/030_apply_delay.pl |  76 +++
 16 files changed, 389 insertions(+), 71 deletions(-)
 create mode 100644 src/test/subscription/t/030_apply_delay.pl

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index ac2db249cb..83e4e352ca 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -205,8 +205,9 @@ ALTER SUBSCRIPTION name RENAME TO <
   information.  The parameters that can be altered
   are slot_name,
   synchronous_commit,
-  binary, streaming, and
-  disable_on_error.
+  binary, streaming,
+  disable_on_error, and
+  min_apply_delay.
  
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index b701752fc9..2bc8deb132 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -302,6 +302,39 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+min_apply_delay (integer)
+
+ 
+  By default, subscriber applies changes as soon as possible. Similar
+  to the physical replication feature
+  (), it may be useful to
+  have a time-delayed copy of data for logical replication. This
+  parameter allows you to delay the application of changes by a
+  specified amount of time. If this value is specified without units,
+  it is taken as milliseconds. The default is zero, adding no delay.
+ 
+ 
+  The delay occurs only after the initial table synchronization. It is
+  possible that the replication delay between publisher and subscriber
+  exceeds the value of this parameter, in which case no delay is added.
+  Note that the delay is calculated between the WAL time stamp as
+  written on publisher and the current time on the subscriber. Delays
+  in logical decoding and in transferring the transaction may reduce the
+  actual wait time. If the system clocks on publisher and subscriber
+  are not synchronized, this may lead to apply changes earlier than
+  expected. This is not a major issue because a 

Re: multithreaded zstd backup compression for client and server

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 16:34:04 -0400, Robert Haas wrote:
> Therefore, I think it might be safe to just ...  turn this on. One reason I
> think that is that this whole approach was recommended to me by Andres ...

I didn't do a super careful analysis of the issues... But I do think it's
pretty much the one case where it "should" be safe.

The most likely source of problem would errors thrown while zstd threads are
alive. Should make sure that that can't happen.


What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?

If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?


> but that's not to say that there couldn't be problems.  I worry a bit that
> the mere presence of threads could in some way mess things up, but I don't
> know what the mechanism for that would be, and I don't want to postpone
> shipping useful features based on nebulous fears.

One thing that'd be good to tests for is cancelling in-progress server-side
compression.  And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.



> For both parallel and non-parallel zstd compression, I see differences
> between the compressed size depending on where the compression is
> done. I don't know whether this is an expected behavior of the zstd
> library or a bug. Both files uncompress OK and pass pg_verifybackup,
> but that doesn't mean we're not, for example, selecting different
> compression levels where we shouldn't be. I'll try to figure out
> what's going on here.
>
> zstd, client-side: 1.7GB, 17 seconds
> zstd, server-side: 1.3GB, 25 seconds
> parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

What causes this fairly massive client-side/server-side size difference?



> +	/*
> +	 * We check for failure here because (1) older versions of the library
> +	 * do not support ZSTD_c_nbWorkers and (2) the library might want to
> +	 * reject unreasonable values (though in practice it does not seem to do
> +	 * so).
> +	 */
> +	ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
> +								 compress->workers);
> +	if (ZSTD_isError(ret))
> +	{
> +		pg_log_error("could not set compression worker count to %d: %s",
> +					 compress->workers, ZSTD_getErrorName(ret));
> +		exit(1);
> +	}

Will this cause test failures on systems with older zstd?


Greetings,

Andres Freund




Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-23 Thread Pavel Borisov
>
> On 21.03.22 15:37, Aleksander Alekseev wrote:
> >> It would not simplify things for them at all, just mess it up.
> >> The master copies of the .po files are kept in a different repo.
> >> Also, I believe that extraction of new message strings is automated
> >> already.
> >
> > Got it, thanks. Here is the corrected patch. It includes all the
> > changes by me and Japin, and doesn't touch PO files.
>
> I think in some cases we can make this even simpler (and cast-free) by
> changing the underlying variable to be long long instead of int64.
> Especially in cases where the whole point of the variable is to be some
> counter that ends up being printed, there isn't a need to use int64 in
> the first place.  See attached patch for examples.


Yes, this will work when we can define the variable itself as *long long*.
But for some applications [1], [2], I suppose we'll need exactly uint64 to
represent TransactionId. uint64 is guaranteed to fit into *unsigned long
long*, but on most architectures it is just *unsigned long*. Defining what
we need to be uint64 as *unsigned long long* on those architectures would
mean it becomes uint128, which we may not like.

In my opinion, in many places, it's better to have casts when it's for
printing fixed-width int/uint variables than the alternative.

[1]
https://www.postgresql.org/message-id/flat/CACG%3DezYV3FM5i9ws2QLyF%2Brz5WHTqheL59VRsHGsgAwfx8gh4g%40mail.gmail.com#d7068b9d25a2f8a1064d2ea4815df23d
[2]
https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Thomas Munro
On Thu, Mar 24, 2022 at 9:59 AM Peter Geoghegan  wrote:
> On Wed, Mar 23, 2022 at 1:53 PM Robert Haas  wrote:
> > And therefore I favor defining it to mean that we don't
> > skip any work at all.
>
> But even today DISABLE_PAGE_SKIPPING won't do pruning when we cannot
> acquire a cleanup lock on a page, unless it happens to have XIDs from
> before FreezeLimit (which is probably 50 million XIDs behind
> OldestXmin, the vacuum_freeze_min_age default). I don't see much
> difference.

Yeah, I found it confusing that DISABLE_PAGE_SKIPPING doesn't disable
all page skipping, so 3414099c turned out to be not enough.




Re: Adding CI to our tree

2022-03-23 Thread Justin Pryzby
On Thu, Mar 24, 2022 at 09:52:39AM +1300, Thomas Munro wrote:
> On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> > -Og
> 
> Adding this to CXXFLAGS caused a torrent of warnings from g++ about
> LLVM headers, which I also see on my local system for LLVM 11 and LLVM
> 14:

Yes, I mentioned seeing some other warnings here.
20220302205058.gj15...@telsasoft.com

I think warnings were cleaned up with -O0/1/2 but not with -Og.




Re: Schema variables - new implementation for Postgres 15

2022-03-23 Thread Pavel Stehule
Hi


> A bit more work seems to be needed for deparsing session variables:
>
> # create variable myvar text;
> CREATE VARIABLE
>
> # create view myview as select myvar;
> CREATE VIEW
>
> # \d+ myview
>   View "public.myview"
>  Column | Type | Collation | Nullable | Default | Storage  | Description
> +--+---+--+-+--+-
>  myvar  | text |   |  | | extended |
> View definition:
>  SELECT myvar AS myvar;
>
> There shouldn't be an explicit alias I think.
>

I checked this issue, and I'm afraid it is not fixable. The target list
entry related to a session variable has no magic value like ?column?
that can be used to check whether tle->resname is implicit or explicit.

And at this time I cannot use FigureColname because it doesn't work with
transformed nodes. Moreover, the Param node can be nested in a
SubscriptingRef or FieldSelect. It doesn't work perfectly now. See the
following example:

create type xt as (a int, b int);
create view b as select (10, ((random()*100)::int)::xt).b;
\d+ b
SELECT (ROW(10, (random() * 100::double precision)::integer)::xt).b AS b;

Regards

Pavel


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Peter Geoghegan
On Wed, Mar 23, 2022 at 1:53 PM Robert Haas  wrote:
> I see what you mean about it depending on how you define "skipping".
> But I think that DISABLE_PAGE_SKIPPING is intended as a sort of
> emergency safeguard when you really, really don't want to leave
> anything out.

I agree.

> And therefore I favor defining it to mean that we don't
> skip any work at all.

But even today DISABLE_PAGE_SKIPPING won't do pruning when we cannot
acquire a cleanup lock on a page, unless it happens to have XIDs from
before FreezeLimit (which is probably 50 million XIDs behind
OldestXmin, the vacuum_freeze_min_age default). I don't see much
difference.

Anyway, this isn't important. I'll just drop the third patch.

-- 
Peter Geoghegan




Re: ExecRTCheckPerms() and many prunable partitions

2022-03-23 Thread Zhihong Yu
On Wed, Mar 23, 2022 at 12:03 AM Amit Langote 
wrote:

> On Mon, Mar 14, 2022 at 4:36 PM Amit Langote 
> wrote:
> > Also needed fixes when rebasing.
>
> Needed another rebase.
>
> As the changes being made with the patch are non-trivial and the patch
> hasn't been reviewed very significantly since Alvaro's comments back
> in Sept 2021 which I've since addressed, I'm thinking of pushing this
> one into the version 16 dev cycle.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Hi,
For patch 1:

bq. makes permissions-checking needlessly expensive when many inheritance
children are added to the range range

'range' is repeated in the above sentence.

+ExecCheckOneRelPerms(RelPermissionInfo *perminfo)

Since RelPermissionInfo is for one relation, I think the 'One' in func name
can be dropped.

+   else/* this isn't a child result rel */
+   resultRelInfo->ri_RootToChildMap = NULL;
...
+   resultRelInfo->ri_RootToChildMapValid = true;

Should the assignment of the true value be moved into the if block (in the else
block, ri_RootToChildMap is assigned NULL)?

+   /* Looks like the RTE doesn't, so try to find it the hard way. */

doesn't -> doesn't know

Cheers


Re: SQL/JSON: functions

2022-03-23 Thread Justin Pryzby
On Wed, Mar 23, 2022 at 03:49:17PM -0400, Andrew Dunstan wrote:
> 
> On 3/23/22 08:24, Justin Pryzby wrote:
> > At least 0002-SQL-JSON-constructors-v64.patch has an issue with nodes,
> > per COPY_PARSE_PLAN_TREES.
> >
> > +ERROR:  unrecognized node type: 157
> 
> I just tried to reproduce this and was unable to.  Ubuntu 20.04, gcc 9.4.0.
> 
> the build was done with CPPFLAGS=-DCOPY_PARSE_PLAN_TREES. To be on the
> safe side I took ccache out of the equation.
> 
> Can you give me any more details?

Sorry, the issue was actually with 

#define RAW_EXPRESSION_COVERAGE_TEST

make check is enough to see it:

@@ -6,181 +6,78 @@
 (1 row)
 
 SELECT JSON_OBJECT(RETURNING json);
- json_object 
--
- {}
-(1 row)
-
+ERROR:  unrecognized node type: 157
[...]




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-23 Thread Peter Eisentraut

On 21.03.22 05:55, Thomas Munro wrote:

[04:30:50.630] pg_waldump.c:963:26: error: format ‘%u’ expects
argument of type ‘unsigned int *’, but argument 3 has type ‘ForkNumber
*’ [-Werror=format=]
[04:30:50.630] 963 | if (sscanf(optarg, "%u",
_by_relation_forknum) != 1 ||
[04:30:50.630] | ~^ ~~
[04:30:50.630] | | |
[04:30:50.630] | | ForkNumber *
[04:30:50.630] | unsigned int *

And now that this gets to the CompilerWarnings CI task, it looks like
GCC doesn't like an enum as a scanf %u destination (I didn't see that
warning locally when I compiled the above fixup because clearly Clang
is cool with it...).  Probably needs a temporary unsigned int to
sscanf into first.


That's because ForkNumber is a signed type.  You will probably succeed if 
you use "%d" instead.





Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Robert Haas
On Wed, Mar 23, 2022 at 4:49 PM Peter Geoghegan  wrote:
> On Wed, Mar 23, 2022 at 1:41 PM Robert Haas  wrote:
> > It seems to me that if DISABLE_PAGE_SKIPPING doesn't completely
> > disable skipping pages, we have a problem.
>
> It depends on how you define skipping. DISABLE_PAGE_SKIPPING was
> created at a time when a broader definition of skipping made a lot
> more sense.
>
> > The option isn't named CARE_ABOUT_VISIBILITY_MAP. It's named
> > DISABLE_PAGE_SKIPPING.
>
> VACUUM(DISABLE_PAGE_SKIPPING, VERBOSE) will still consistently show
> that 100% of all of the pages from rel_pages are scanned. A page that
> is "skipped" by lazy_scan_noprune isn't pruned, and won't have any of
> its tuples frozen. But every other aspect of processing the page
> happens in just the same way as it would in the cleanup
> lock/lazy_scan_prune path.

I see what you mean about it depending on how you define "skipping".
But I think that DISABLE_PAGE_SKIPPING is intended as a sort of
emergency safeguard when you really, really don't want to leave
anything out. And therefore I favor defining it to mean that we don't
skip any work at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Adding CI to our tree

2022-03-23 Thread Thomas Munro
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> -Og

Adding this to CXXFLAGS caused a torrent of warnings from g++ about
LLVM headers, which I also see on my local system for LLVM 11 and LLVM
14:

[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h: In member
function ‘llvm::CallInst*
llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*,
llvm::ArrayRef<llvm::Value*>, const llvm::Twine&, llvm::MDNode*)’:
[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h:229:16:
warning: ‘<anonymous>.llvm::Twine::LHS.llvm::Twine::Child::twine’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
[19:47:11.047] 229 | !LHS.twine->isBinary())
[19:47:11.047] | ^

https://cirrus-ci.com/task/5597526098182144?logs=build#L6

Not sure who to complain to about that...




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Peter Geoghegan
On Wed, Mar 23, 2022 at 1:41 PM Robert Haas  wrote:
> It seems to me that if DISABLE_PAGE_SKIPPING doesn't completely
> disable skipping pages, we have a problem.

It depends on how you define skipping. DISABLE_PAGE_SKIPPING was
created at a time when a broader definition of skipping made a lot
more sense.

> The option isn't named CARE_ABOUT_VISIBILITY_MAP. It's named
> DISABLE_PAGE_SKIPPING.

VACUUM(DISABLE_PAGE_SKIPPING, VERBOSE) will still consistently show
that 100% of all of the pages from rel_pages are scanned. A page that
is "skipped" by lazy_scan_noprune isn't pruned, and won't have any of
its tuples frozen. But every other aspect of processing the page
happens in just the same way as it would in the cleanup
lock/lazy_scan_prune path.

We'll even still VACUUM the page if it happens to have some existing
LP_DEAD items left behind by opportunistic pruning. We don't need a
cleanup lock in either lazy_scan_noprune (a share lock is all we need), nor
do we even need one in lazy_vacuum_heap_page (a regular exclusive lock
is all we need).

-- 
Peter Geoghegan




Re: [PATCH] Remove workarounds to format [u]int64's

2022-03-23 Thread Peter Eisentraut

On 21.03.22 15:37, Aleksander Alekseev wrote:

It would not simplify things for them at all, just mess it up.
The master copies of the .po files are kept in a different repo.
Also, I believe that extraction of new message strings is automated
already.


Got it, thanks. Here is the corrected patch. It includes all the
changes by me and Japin, and doesn't touch PO files.


I think in some cases we can make this even simpler (and cast-free) by 
changing the underlying variable to be long long instead of int64. 
Especially in cases where the whole point of the variable is to be some 
counter that ends up being printed, there isn't a need to use int64 in 
the first place.  See attached patch for examples.

From 9c5109c32b8702d14e42324cba027ee987df6cd0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 23 Mar 2022 21:44:48 +0100
Subject: [PATCH] Change some variables to long long from int64

This makes printing them simpler.
---
 src/backend/commands/copy.c  |  2 +-
 src/backend/commands/copyfrom.c  | 20 
 src/backend/commands/copyto.c|  2 +-
 src/backend/tcop/utility.c   |  2 +-
 src/bin/pg_checksums/pg_checksums.c  | 30 
 src/include/commands/copy.h  |  6 ++---
 src/include/commands/copyfrom_internal.h |  2 +-
 7 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7da7105d44..8bcfc305ee 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -63,7 +63,7 @@
 void
 DoCopy(ParseState *pstate, const CopyStmt *stmt,
   int stmt_location, int stmt_len,
-  uint64 *processed)
+  unsigned long long *processed)
 {
boolis_from = stmt->is_from;
boolpipe = (stmt->filename == NULL);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index db6eb6fae7..5696c7e857 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -80,7 +80,7 @@ typedef struct CopyMultiInsertBuffer
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
 	BulkInsertState bistate;	/* BulkInsertState for this rel */
 	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
+	unsigned long long linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 											 * stream */
 } CopyMultiInsertBuffer;
 
@@ -122,12 +122,12 @@ CopyFromErrorCallback(void *arg)
 		if (cstate->cur_attname)
 			errcontext("COPY %s, line %llu, column %s",
 					   cstate->cur_relname,
-					   (unsigned long long) cstate->cur_lineno,
+					   cstate->cur_lineno,
 					   cstate->cur_attname);
 		else
 			errcontext("COPY %s, line %llu",
 					   cstate->cur_relname,
-					   (unsigned long long) cstate->cur_lineno);
+					   cstate->cur_lineno);
 	}
 	else
 	{
@@ -139,7 +139,7 @@ CopyFromErrorCallback(void *arg)
 			attval = limit_printout_length(cstate->cur_attval);
 			errcontext("COPY %s, line %llu, column %s: \"%s\"",
 					   cstate->cur_relname,
-					   (unsigned long long) cstate->cur_lineno,
+					   cstate->cur_lineno,
 					   cstate->cur_attname,
 					   attval);
 			pfree(attval);
@@ -149,7 +149,7 @@ CopyFromErrorCallback(void *arg)
 			/* error is relevant to a particular column, value is NULL */
 			errcontext("COPY %s, line %llu, column %s: null input",
 					   cstate->cur_relname,
-					   (unsigned long long) cstate->cur_lineno,
+					   cstate->cur_lineno,
 					   cstate->cur_attname);
 		}
 		else
@@ -166,14 +166,14 @@ CopyFromErrorCallback(void *arg)
 			lineval = limit_printout_length(cstate->line_buf.data);
 			errcontext("COPY %s, line %llu: \"%s\"",
 					   cstate->cur_relname,
-					   (unsigned long long) cstate->cur_lineno, lineval);
+					   

Re: MDAM techniques and Index Skip Scan patch

2022-03-23 Thread Dmitry Dolgov
> On Tue, Mar 22, 2022 at 04:55:49PM -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > Like many difficult patches, the skip scan patch is not so much
> > troubled by problems with the implementation as it is troubled by
> > *ambiguity* about the design. Particularly concerning how skip scan
> > meshes with existing designs, as well as future designs --
> > particularly designs for other MDAM techniques. I've started this
> > thread to have a big picture conversation about how to think about
> > these things.
>
> Peter asked me off-list to spend some time thinking about the overall
> direction we ought to be pursuing here.  I have done that, and here
> are a few modest suggestions.

Thanks. To make sure I understand your proposal better, I have a couple
of questions:

> In short: I would throw out just about all the planner infrastructure
> that's been proposed so far.  It looks bulky, expensive, and
> drastically undercommented, and I don't think it's buying us anything
> of commensurate value.

Broadly speaking planner related changes proposed in the patch so far
are: UniqueKey, taken from the neighbour thread about select distinct;
list of uniquekeys to actually pass information about the specified
loose scan prefix into nbtree; some verification logic to prevent
applying skipping when it's not supported. I can imagine taking out
UniqueKeys and passing loose scan prefix in some other form (the other
parts seem to be essential) -- is that what you mean?




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Robert Haas
On Wed, Mar 23, 2022 at 3:59 PM Peter Geoghegan  wrote:
> In other words, since DISABLE_PAGE_SKIPPING doesn't *consistently*
> force lazy_scan_noprune to refuse to process a page on HEAD (it all
> depends on FreezeLimit/vacuum_freeze_min_age), it is logical for
> DISABLE_PAGE_SKIPPING to totally get out of the business of caring
> about that -- better to limit it to caring only about the visibility
> map (by no longer making it force aggressiveness).

It seems to me that if DISABLE_PAGE_SKIPPING doesn't completely
disable skipping pages, we have a problem.

The option isn't named CARE_ABOUT_VISIBILITY_MAP. It's named
DISABLE_PAGE_SKIPPING.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




multithreaded zstd backup compression for client and server

2022-03-23 Thread Robert Haas
[ Changing subject line in the hopes of attracting more eyeballs. ]

On Mon, Mar 14, 2022 at 12:11 PM Dipesh Pandit  wrote:
> I tried to implement support for parallel ZSTD compression.

Here's a new patch for this. It's more of a rewrite than an update,
honestly; commit ffd53659c46a54a6978bcb8c4424c1e157a2c0f1 necessitated
totally different options handling, but I also redid the test cases,
the documentation, and the error message.

For those who may not have been following along, here's an executive
summary: libzstd offers an option for parallel compression. It's
intended to be transparent: you just say you want it, and the library
takes care of it for you. Since we have the ability to do backup
compression on either the client or the server side, we can expose
this option in both locations. That would be cool, because it would
allow for really fast backup compression with a good compression
ratio. It would also mean that we would be, or really libzstd would
be, spawning threads inside the PostgreSQL backend. Short of cats and
dogs living together, it's hard to think of anything more terrifying,
because the PostgreSQL backend is very much not thread-safe. However,
a lot of the things we usually worry about when people make noises
about using threads in the backend don't apply here, because the
threads are hidden away behind libzstd interfaces and can't execute
any PostgreSQL code. Therefore, I think it might be safe to just ...
turn this on. One reason I think that is that this whole approach was
recommended to me by Andres ... but that's not to say that there
couldn't be problems.  I worry a bit that the mere presence of threads
could in some way mess things up, but I don't know what the mechanism
for that would be, and I don't want to postpone shipping useful
features based on nebulous fears.
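
The transparent mode libzstd offers is hard to show without the library
itself, but the underlying idea -- split the stream into chunks, compress
the chunks on worker threads, emit the results in input order -- can be
sketched with stdlib pieces. This is an illustration only (zlib plus an
explicit thread pool); it is not how libzstd's ZSTD_c_nbWorkers mode is
wired up internally, which also pipelines and shares state across chunks:

```python
# Illustration of parallel chunked compression: compress chunks on worker
# threads while preserving output ordering. Uses zlib, not libzstd.
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the stream stays ordered
        # even though chunks finish compressing at different times.
        return list(pool.map(zlib.compress, chunks))

def decompress_parallel(compressed_chunks):
    return b"".join(zlib.decompress(c) for c in compressed_chunks)

payload = b"postgres base backup block " * 100_000
frames = compress_parallel(payload)
assert decompress_parallel(frames) == payload
```

zlib releases the GIL while compressing large buffers, so the threads do
overlap; the same property is what lets libzstd's worker threads run
usefully inside a single backend process.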

In my ideal world, I'd like to push this into v15. I've done a lot of
work to improve the backup code in this release, and this is actually
a very small change yet one that potentially enables the project to
get a lot more value out of the work that has already been committed.
That said, I also don't want to break the world, so if you have an
idea what this would break, please tell me.

For those curious as to how this affects performance and backup size,
I loaded up the UK land registry database. That creates a 3769MB
database. Then I backed it up using client-side compression and
server-side compression using the various different algorithms that
are supported in the master branch, plus parallel zstd.

no compression: 3.7GB, 9 seconds
gzip: 1.5GB, 140 seconds with server-side, 141 seconds with client-side
lz4: 2.0GB, 13 seconds with server-side, 12 seconds with client-side

For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.

zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

Notice that compressing the backup with parallel zstd is actually
faster than taking an uncompressed backup, even though this test is
all being run on the same machine. That's kind of crazy to me: the
parallel compression is so fast that we save more time on I/O than we
spend compressing. This assumes of course that you have plenty of CPU
resources and limited I/O resources, which won't be true for everyone,
but it's not an unusual situation.

I think the documentation changes in this patch might not be quite up
to scratch. I think there's a brewing problem here: as we add more
compression options, whether or not that happens in this release, and
regardless of what specific options we add, the way things are
structured right now, we're going to end up either duplicating a bunch
of stuff between the pg_basebackup documentation and the BASE_BACKUP
documentation, or else one of those places is going to end up lacking
information that someone reading it might like to have. I'm not
exactly sure what to do about this, though.

This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Allow-parallel-zstd-compression-when-taking-a-base-b.patch
Description: Binary data


Re: Parameter for planner estimate of recursive queries

2022-03-23 Thread Tom Lane
Simon Riggs  writes:
>> [New patch version pending]

Do you have any objection if I rename the GUC to
recursive_worktable_factor?  That seems a bit clearer as to what
it does, and it leaves more room for other knobs in the same area
if we decide we need any.
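
For anyone following along, a rough Python sketch of the arithmetic this
knob scales (the real code lives in costsize.c; clamp_row_est here mirrors
the C helper, the 10.0 default matches the hardcoded multiplier on HEAD,
and the parameter name assumes the proposed rename):

```python
# Sketch: the planner sizes the recursive work table as a multiple of the
# non-recursive term's row estimate, clamped to a sane row count.
MAXIMUM_ROWCOUNT = 1e100  # same cap the C clamp_row_est applies

def clamp_row_est(nrows: float) -> float:
    if nrows <= 1.0:
        return 1.0            # never estimate fewer than one row
    if nrows >= MAXIMUM_ROWCOUNT:
        return MAXIMUM_ROWCOUNT
    return round(nrows)       # rint() in C; round-half-even in both

def worktable_size_estimate(nonrecursive_rows: float,
                            recursive_worktable_factor: float = 10.0) -> float:
    return clamp_row_est(recursive_worktable_factor * nonrecursive_rows)

assert worktable_size_estimate(25.0) == 250
assert worktable_size_estimate(0.0, 1.0) == 1.0
```

The clamp is why a fractional or zero GUC value still yields a usable
estimate, per the nitpicks elsewhere in this thread.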

regards, tom lane




Re: SQL/JSON: functions

2022-03-23 Thread Andrew Dunstan


On 3/23/22 15:49, Andrew Dunstan wrote:
> On 3/23/22 08:24, Justin Pryzby wrote:
>> At least 0002-SQL-JSON-constructors-v64.patch has an issue with nodes,
>> per COPY_PARSE_PLAN_TREES.
>>
>> +ERROR:  unrecognized node type: 157
>
>
> I just tried to reproduce this and was unable to.  Ubuntu 20.04, gcc 9.4.0.
>
>
> the build was done with CPPFLAGS=-DCOPY_PARSE_PLAN_TREES. To be on the
> safe side I took ccache out of the equation.
>
>
> Can you give me any more details?
>
>

I have reproduced similar errors from patch 4 in the series.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: ubsan

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 15:58:09 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think we should backpatch both, based on the reasoning in
> > 46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?
> 
> Yeah, I suppose.  Is anyone going to step up and run a buildfarm
> member with ubsan enabled?  (I'm already checking -fsanitize=alignment
> on longfin, but it seems advisable to keep that separate from
> -fsanitize=undefined.)

I'm planning to enable it on two of mine. Looks like gcc and clang find
slightly different things, so I was intending to enable it on one of each.

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-23 Thread Peter Geoghegan
On Sun, Mar 13, 2022 at 9:05 PM Peter Geoghegan  wrote:
> Attached is v10. While this does still include the freezing patch,
> it's not in scope for Postgres 15. As I've said, I still think that it
> makes sense to maintain the patch series with the freezing stuff,
> since it's structurally related.

Attached is v11. Changes:

* No longer includes the patch that adds page-level freezing. It was
making it harder to assess code coverage for the patches that I'm
targeting Postgres 15 with. And so including it with each new revision
no longer seems useful. I'll pick it up for Postgres 16.

* Extensive isolation tests added to v11-0001-*, exercising a lot of
hard-to-hit code paths that are reached when VACUUM is unable to
immediately acquire a cleanup lock on some heap page. In particular,
we now have test coverage for the code in heapam.c that handles
tracking the oldest extant XID and MXID in the presence of MultiXacts
(on a no-cleanup-lock heap page).

* v11-0002-* (which is the patch that avoids missing out on advancing
relfrozenxid in non-aggressive VACUUMs due to a race condition on
HEAD) now moves even more of the logic for deciding how VACUUM will
skip using the visibility map into its own helper routine. Now
lazy_scan_heap just does what the state returned by the helper routine
tells it about the current skippable range -- it doesn't make any
decisions itself anymore. This is far simpler than what we do
currently, on HEAD.

There are no behavioral changes here, but this approach could be
pushed further to improve performance. We could easily determine
*every* page that we're going to scan (not skip) up-front in even the
largest tables, very early, before we've even scanned one page. This
could enable things like I/O prefetching, or capping the size of the
dead_items array based on our final scanned_pages (not on rel_pages).

* A new patch (v11-0003-*) alters the behavior of VACUUM's
DISABLE_PAGE_SKIPPING option. DISABLE_PAGE_SKIPPING no longer forces
aggressive VACUUM -- now it only forces the use of the visibility map,
since that behavior is totally independent of aggressiveness.

I don't feel too strongly about the DISABLE_PAGE_SKIPPING change. It
just seems logical to decouple no-vm-skipping from aggressiveness --
it might actually be helpful in testing the work from the patch series
in the future. Any page counted in scanned_pages has essentially been
processed by VACUUM with this work in place -- that was the idea
behind the lazy_scan_noprune stuff from commit 44fa8488. Bear in mind
that the relfrozenxid tracking stuff from v11-0001-* makes it almost
certain that a DISABLE_PAGE_SKIPPING-without-aggressiveness VACUUM
will still manage to advance relfrozenxid -- usually by the same
amount as an equivalent aggressive VACUUM would anyway. (Failing to
acquire a cleanup lock on some heap page might result in the final
relfrozenxid being appreciably older, but probably not, and we'd
still almost certainly manage to advance relfrozenxid by *some* small
amount.)

Of course, anybody that wants both an aggressive VACUUM and a VACUUM
that never skips even all-frozen pages in the visibility map will
still be able to get that behavior quite easily. For example,
VACUUM(DISABLE_PAGE_SKIPPING, FREEZE) will do that. Several of our
existing tests must already use both of these options together,
because the tests require an effective vacuum_freeze_min_age of 0 (and
vacuum_multixact_freeze_min_age of 0) -- DISABLE_PAGE_SKIPPING alone
won't do that on HEAD, which seems to confuse the issue (see commit
b700f96c for an example of that).

In other words, since DISABLE_PAGE_SKIPPING doesn't *consistently*
force lazy_scan_noprune to refuse to process a page on HEAD (it all
depends on FreezeLimit/vacuum_freeze_min_age), it is logical for
DISABLE_PAGE_SKIPPING to totally get out of the business of caring
about that -- better to limit it to caring only about the visibility
map (by no longer making it force aggressiveness).

-- 
Peter Geoghegan


v11-0003-Don-t-force-aggressive-mode-for-DISABLE_PAGE_SKI.patch
Description: Binary data


v11-0002-Generalize-how-VACUUM-skips-all-frozen-pages.patch
Description: Binary data


v11-0001-Loosen-coupling-between-relfrozenxid-and-freezin.patch
Description: Binary data


Re: Proposal: Support custom authentication methods using hooks

2022-03-23 Thread Peter Eisentraut

On 15.03.22 20:27, samay sharma wrote:

This patch-set adds the following:

* Allow multiple custom auth providers to be registered (Addressing 
feedback from Aleksander and Andrew)
* Modify the test extension to use SCRAM to exchange secrets (Based on 
Andres's suggestion)
* Add support for custom auth options to configure provider's behavior 
(by exposing a new hook) (Required by OAUTHBEARER)

* Allow custom auth methods to use usermaps. (Required by OAUTHBEARER)


Some feedback on this specific patch set:

Custom authentication methods should be able to register their own name 
other than "custom".  You ought to refactor things so that existing 
methods such as ldap and pam go through your extension interface.  So 
the whole thing should be more like a lookup table or list with some 
built-in entries that modules can dynamically add on to.


Then you also don't need a test module, since the existing 
authentication methods would already test the interfaces.





Re: ubsan

2022-03-23 Thread Tom Lane
Andres Freund  writes:
> I think we should backpatch both, based on the reasoning in
> 46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?

Yeah, I suppose.  Is anyone going to step up and run a buildfarm
member with ubsan enabled?  (I'm already checking -fsanitize=alignment
on longfin, but it seems advisable to keep that separate from
-fsanitize=undefined.)

regards, tom lane




Re: Documenting when to retry on serialization failure

2022-03-23 Thread Tom Lane
Simon Riggs  writes:
> I've tried to sum up the various points from everybody into this doc
> patch. Thanks all for replies.

This seemed rather badly in need of copy-editing.  How do you
like the attached text?

regards, tom lane

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index da07f3f6c6..cd659dd994 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -588,7 +588,7 @@ ERROR:  could not serialize access due to concurrent update
 applications using this level must
 be prepared to retry transactions due to serialization failures.
 In fact, this isolation level works exactly the same as Repeatable
-Read except that it monitors for conditions which could make
+Read except that it also monitors for conditions which could make
 execution of a concurrent set of serializable transactions behave
 in a manner inconsistent with all possible serial (one at a time)
 executions of those transactions.  This monitoring does not
@@ -1720,6 +1720,60 @@ SELECT pg_advisory_lock(q.id) FROM

   
 
+  
+   Serialization Failure Handling
+
+   
+serialization failure
+   
+   
+retryable error
+   
+
+   
+Both Repeatable Read and Serializable isolation levels can produce
+errors that are designed to prevent serialization anomalies.  As
+previously stated, applications using these levels must be prepared to
+retry transactions that fail due to serialization errors.  Such an
+error's message text will vary according to the precise circumstances,
+but it will always have the SQLSTATE code 40001
+(serialization_failure).
+   
+
+   
+It may also be advisable to retry deadlock failures.
+These have the SQLSTATE code 40P01
+(deadlock_detected).
+   
+
+   
+In some circumstances, a failure that is arguably a serialization
+problem may manifest as a unique-key failure, with SQLSTATE
+code 23505 (unique_violation),
+or as an exclusion constraint failure, with SQLSTATE
+code 23P01 (exclusion_violation).
+Therefore, retrying these cases may also be advisable, although one must
+be careful that such an error could be persistent.
+   
+
+   
+It is important to retry the complete transaction, including all logic
+that decides which SQL to issue and/or which values to use.
+Therefore, PostgreSQL does not offer an
+automatic retry facility, since it cannot do so with any guarantee of
+correctness.
+   
+
+   
+Transaction retry does not guarantee that the retried transaction will
+complete; multiple retries may be needed.  In cases with very high
+contention, it is possible that completion of a transaction may take
+many attempts.  In cases involving a conflicting prepared transaction,
+it may not be possible to make progress until the prepared transaction
+commits or rolls back.
+   
+  
+
   
Caveats
 

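The retry discipline the patch describes can be sketched as follows. The
database objects here are stand-ins (a hypothetical DatabaseError carrying
a sqlstate attribute); real code would use a driver such as psycopg or
libpq and read the error's SQLSTATE the same way:

```python
# Retry the *entire* transaction function on retryable SQLSTATEs, since
# the SQL issued may depend on values read in the failed attempt.
RETRYABLE_SQLSTATES = {"40001", "40P01"}  # serialization_failure, deadlock_detected

class DatabaseError(Exception):
    """Stand-in for a driver error exposing the SQLSTATE code."""
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_with_retries(txn_fn, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()   # must re-run all logic that builds the SQL
        except DatabaseError as e:
            if e.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise         # non-retryable, or retries exhausted

# Demo: fails twice with a serialization error, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DatabaseError("40001")
    return "committed"

assert run_with_retries(flaky_txn) == "committed"
assert attempts["n"] == 3
```

Note the loop retries a bounded number of times and then surfaces the
error, matching the caveat that retry does not guarantee completion.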

Re: SQL/JSON: functions

2022-03-23 Thread Andrew Dunstan


On 3/23/22 08:24, Justin Pryzby wrote:
> At least 0002-SQL-JSON-constructors-v64.patch has an issue with nodes,
> per COPY_PARSE_PLAN_TREES.
>
> +ERROR:  unrecognized node type: 157



I just tried to reproduce this and was unable to.  Ubuntu 20.04, gcc 9.4.0.


the build was done with CPPFLAGS=-DCOPY_PARSE_PLAN_TREES. To be on the
safe side I took ccache out of the equation.


Can you give me any more details?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Parameter for planner estimate of recursive queries

2022-03-23 Thread Simon Riggs
On Wed, 23 Mar 2022 at 18:20, Simon Riggs  wrote:

> [New patch version pending]

-- 
Simon Riggs                http://www.EnterpriseDB.com/


recursive_worktable_estimate.v3.patch
Description: Binary data


Re: ubsan

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 11:21:37 -0700, Andres Freund wrote:
> On 2022-03-23 13:54:50 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > I tried to run postgres with ubsan to debug something.
> > 
> > For 0001, could we just replace configure's dlopen check with the
> > dlsym check?  Or are you afraid of reverse-case failures?
> 
> Yea, I was worried about that. But now that I think more about it, it's hard
> to believe something could provide / intercept dlsym but not dlopen. I guess
> we can try and see?
>
> > 0003: OK.  Interesting though that we haven't seen these before.

I think we should backpatch both, based on the reasoning in
46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?

Greetings,

Andres Freund




Re: warn if GUC set to an invalid shared library

2022-03-23 Thread Tom Lane
Maciek Sakrejda  writes:
> In v4, the message looks fine to me for shared_preload_libraries
> (except there is a doubled "is"). However, I also get the message for
> a simple SET with local_preload_libraries:

> postgres=# set local_preload_libraries=xyz;
> WARNING:  could not access file "xyz"
> HINT:  The server will fail to start with the existing configuration.
> If it is is shut down, it will be necessary to manually edit the
> postgresql.auto.conf file to allow it to start.
> SET

I agree with Maciek's concerns about these HINTs being emitted
inappropriately, but I also have a stylistic gripe: they're only
halfway hints.  Given that we fix things so they only print when they
should, the complaint about the server not starting is not a hint,
it's a fact, which per style guidelines means it should be errdetail.
So I think this ought to look more like

WARNING:  could not access file "xyz"
DETAIL:  The server will fail to start with this setting.
HINT:  If the server is shut down, it will be necessary to manually edit the
postgresql.auto.conf file to allow it to start again.

I adjusted the wording a bit too --- YMMV, but I think my text is clearer.

regards, tom lane




Re: speed up a logical replica setup

2022-03-23 Thread Peter Eisentraut

On 18.03.22 23:34, Andrew Dunstan wrote:

On 3/15/22 09:51, Peter Eisentraut wrote:

On 21.02.22 13:09, Euler Taveira wrote:

A new tool called pg_subscriber does this conversion and is tightly
integrated
with Postgres.


Are we comfortable with the name pg_subscriber?  It seems too general.
Are we planning other subscriber-related operations in the future?  If
so, we should at least make this one use a --create option or
something like that.



Not really sold on the name (and I didn't much like the name
pglogical_create_subscriber either, although it's a cool facility and
I'm happy to see us adopting something like it).

ISTM we should have a name that conveys that we are *converting* a
replica or equivalent to a subscriber.


The pglogical tool includes the pg_basebackup run, so it actually 
"creates" the subscriber from scratch.  Whether this tool is also doing 
that is still being discussed.





Re: prevent immature WAL streaming

2022-03-23 Thread Alvaro Herrera
On 2021-Sep-03, Alvaro Herrera wrote:

> The last commit is something I noticed in pg_rewind ...

I had missed this one; it's pushed now.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I can see support will not be a problem.  10 out of 10."(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Tom Lane
Tatsuo Ishii  writes:
> The patch has been pushed. Thank you!

My hoary animal prairiedog doesn't like this [1]:

#   Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) 
got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not serialize 
access due to concurrent update\\b.*\\g1)/'
#   at t/001_pgbench_with_server.pl line 1229.
#   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 
nxacts: 1 dbName: postgres
...
# pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  could 
not serialize access due to concurrent update
...
# '
# doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) 
of script 0; ERROR:  could not serialize access due to concurrent 
update\\b.*\\g1)'
# Looks like you failed 1 test of 425.

I'm not sure what the "\\b.*\\g1" part of this regex is meant to
accomplish, but it seems to be assuming more than it should
about the output format of TAP messages.
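For illustration, here is what that backreference mechanics looks like,
translated from Perl into Python's re module (\g1 in Perl is \1 here, and
re.S plays the role of (?s)). This is a sketch of the matching behavior,
not the TAP test itself:

```python
# After capturing the client number in group 1, the trailing ".*\1"
# demands that the *same* digit appear again later in the output; with
# re.S the ".*" may cross newlines.
import re

pattern = re.compile(
    r"client (0|1) got an error in command 3 \(SQL\) of script 0; "
    r"ERROR:  could not serialize access due to concurrent update\b.*\1",
    re.S,
)

# Output where the client number repeats later: the backreference matches.
ok = ("client 0 got an error in command 3 (SQL) of script 0; "
      "ERROR:  could not serialize access due to concurrent update\n"
      "client 0 aborted in command 4\n")
assert pattern.search(ok)

# Output where it never repeats after the ERROR line: no match, which is
# the kind of failure seen when the log format differs from expectations.
bad = ("client 0 got an error in command 3 (SQL) of script 0; "
       "ERROR:  could not serialize access due to concurrent update\n")
assert pattern.search(bad) is None
```

So the regex only succeeds if the later output mentions the same client
number again, which is exactly the assumption about message format being
questioned here.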

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2022-03-23%2013%3A21%3A44




Re: ubsan

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 13:54:50 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I tried to run postgres with ubsan to debug something.
> 
> For 0001, could we just replace configure's dlopen check with the
> dlsym check?  Or are you afraid of reverse-case failures?

Yea, I was worried about that. But now that I think more about it, it's hard
to believe something could provide / intercept dlsym but not dlopen. I guess
we can try and see?


> 0002: ugh, but my only real complaint is that __ubsan_default_options
> needs more than zero comment.

Yea, definitely. I am still hoping that somebody could see a better approach
than that ugly hack.

Haven't yet checked, but probably should also verify asan either doesn't have
the same problem or provide the same hack for ASAN_OPTIONS.


> Also, it's not "our" getenv is it?

Not really. "libc's getenv()"?


> 0003: OK.  Interesting though that we haven't seen these before.

I assume it's a question of library version and configure flags.

Looks like the fwrite nonnull case isn't actually due to the nonnull
attribute, but just fwrite() getting intercepted by the sanitizer
library. Looks like that was added starting in gcc 9 [1]

And the guc.c case presumably requires --enable-nls and a version of gettext
using the nonnull attribute?


Wonder if there's a few functions we should add nonnull to ourselves. Probably
would help "everyday compiler warnings", static analyzers, and ubsan.

Greetings,

Andres Freund

[1]
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1151) #if SANITIZER_INTERCEPT_FWRITE
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1152) INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) {
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1153)   // libc file streams can call user-supplied functions, see fopencookie.
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1154)   void *ctx;
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1155)   COMMON_INTERCEPTOR_ENTER(ctx, fwrite, p, size, nmemb, file);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1156)   SIZE_T res = REAL(fwrite)(p, size, nmemb, file);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1157)   if (res > 0) COMMON_INTERCEPTOR_READ_RANGE(ctx, p, res * size);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1158)   return res;
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200  1159) }

$ git describe --tags 5d3805fca3e9
basepoints/gcc-8-3961-g5d3805fca3e




Re: Reducing power consumption on idle servers

2022-03-23 Thread Simon Riggs
On Tue, 22 Mar 2022 at 00:54, Andres Freund  wrote:
>
> On 2022-02-21 21:04:19 +, Simon Riggs wrote:
> > On Mon, 21 Feb 2022 at 16:49, Chapman Flack  wrote:
> >
> > > Shouldn't the comment be "with work_done=false" ?
> >
> > Good catch, thanks.
> >
> > I've also added docs to say that "promote_trigger_file" is now
> > deprecated. There were no tests for that functionality, so just as
> > well it is being removed.
>
> Doesn't pass tests, and hasn't for weeks from what I can see: 
> https://cirrus-ci.com/task/5925633648754688?logs=test_world#L1153
>
> Marked as waiting-on-author.

Hmm, just the not-in-sample test again. Fixed by marking GUC_NOT_IN_SAMPLE.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_to_reduce_power_consumption.v4.patch
Description: Binary data


Re: Parameter for planner estimate of recursive queries

2022-03-23 Thread Simon Riggs
On Wed, 23 Mar 2022 at 17:36, Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
> >  wrote:
> >> On the one hand, this smells like a planner hint.  But on the other
> >> hand, it doesn't look like we will come up with proper graph-aware
> >> selectivity estimation system any time soon, so just having all graph
> >> OLTP queries suck until then because the planner hint is hardcoded
> >> doesn't seem like a better solution.  So I think this setting can be ok.
>
> > I agree. It's a bit lame, but seems pretty harmless, and I can't see
> > us realistically doing a lot better with any reasonable amount of
> > work.
>
> Yeah, agreed on all counts.  The thing that makes it lame is that
> there's no reason to expect that the same multiplier is good for
> every recursive query done in an installation, or even in a session.
>
> One could imagine dealing with that by adding custom syntax to WITH,
> as we have already done once:
>
> WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...
>
> But I *really* hesitate to go there, mainly because once we do
> something like that we can't ever undo it.  I think Simon's
> proposal is a reasonable low-effort compromise.
>
> Some nitpicks:
>
> * The new calculation needs clamp_row_est(), since the float
> GUC could be fractional or even zero.

True, will do.

> * Do we want to prevent the GUC value from being zero?  It's not
> very sensible, plus I think we might want to reserve that value
> to mean "use the built-in calculation", in case we ever do put
> in some smarter logic here.  But I'm not sure what a reasonable
> non-zero lower bound would be.

Agreed, makes sense.

> * The proposed docs claim that a smaller setting works by biasing
> the planner towards fast-start plans, but I don't think I believe
> that explanation.  I'd venture that we want text more along the
> lines of "This may help the planner choose the most appropriate
> method for joining the work table to the query's other tables".

OK, will improve.

[New patch version pending]

-- 
Simon Riggs        http://www.EnterpriseDB.com/




Re: ubsan

2022-03-23 Thread Tom Lane
Andres Freund  writes:
> I tried to run postgres with ubsan to debug something.

For 0001, could we just replace configure's dlopen check with the
dlsym check?  Or are you afraid of reverse-case failures?

0002: ugh, but my only real complaint is that __ubsan_default_options
needs more than zero comment.  Also, it's not "our" getenv, is it?

0003: OK.  Interesting though that we haven't seen these before.

0004: no opinion

regards, tom lane




Re: shared-memory based stats collector - v67

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 17:27:50 +0900, Kyotaro Horiguchi wrote:
> At Mon, 21 Mar 2022 14:30:17 -0700, Andres Freund  wrote in
> > > Right now we reset stats for replicas, even if we start from a shutdown
> > > checkpoint. That seems pretty unnecessary with this patch:
> > > - 0021-pgstat-wip-only-reset-pgstat-data-after-crash-re.patch
> > 
> > Might raise this in another thread for higher visibility.
> 
> + /*
> +  * When starting with crash recovery, reset pgstat data - it might not 
> be
> +  * valid. Otherwise restore pgstat data. It's safe to do this here,
> +  * because postmaster will not yet have started any other processes
> +  *
> +  * TODO: With a bit of extra work we could just start with a pgstat file
> +  * associated with the checkpoint redo location we're starting from.
> +  */
> + if (ControlFile->state == DB_SHUTDOWNED ||
> + ControlFile->state == DB_SHUTDOWNED_IN_RECOVERY)
> + pgstat_restore_stats();
> + else
> + pgstat_discard_stats();
> +
> 
> Before there, InitWalRecovery changes the state to
> DB_IN_ARCHIVE_RECOVERY if it was either DB_SHUTDOWNED or
> DB_IN_PRODUCTION. So the stat seems like always discarded on standby.

Hm. I thought it worked at some point. I guess there's a reason this commit is
a separate commit marked WIP ;)


> In the first place, I'm not sure it is valid that a standby from a
> cold backup takes over the stats from the primary.

I don't really see a reason not to use the stats in that case - we have a
correct stats file after all. But it doesn't seem too important. What I
actually find worth addressing is the case of standbys starting in
DB_SHUTDOWNED_IN_RECOVERY. Right now we always throw stats away after a
*graceful* restart of a standby, which doesn't seem great.

Greetings,

Andres Freund




Re: Parameter for planner estimate of recursive queries

2022-03-23 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
>  wrote:
>> On the one hand, this smells like a planner hint.  But on the other
>> hand, it doesn't look like we will come up with proper graph-aware
>> selectivity estimation system any time soon, so just having all graph
>> OLTP queries suck until then because the planner hint is hardcoded
>> doesn't seem like a better solution.  So I think this setting can be ok.

> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Yeah, agreed on all counts.  The thing that makes it lame is that
there's no reason to expect that the same multiplier is good for
every recursive query done in an installation, or even in a session.

One could imagine dealing with that by adding custom syntax to WITH,
as we have already done once:

WITH RECURSIVE cte1 AS SCALE 1.0 (SELECT ...

But I *really* hesitate to go there, mainly because once we do
something like that we can't ever undo it.  I think Simon's
proposal is a reasonable low-effort compromise.

Some nitpicks:

* The new calculation needs clamp_row_est(), since the float
GUC could be fractional or even zero.

* Do we want to prevent the GUC value from being zero?  It's not
very sensible, plus I think we might want to reserve that value
to mean "use the built-in calculation", in case we ever do put
in some smarter logic here.  But I'm not sure what a reasonable
non-zero lower bound would be.

* The proposed docs claim that a smaller setting works by biasing
the planner towards fast-start plans, but I don't think I believe
that explanation.  I'd venture that we want text more along the
lines of "This may help the planner choose the most appropriate
method for joining the work table to the query's other tables".
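As a sketch of how the first two nitpicks combine (the GUC name, the built-in multiplier of 10, and the simplified clamp are assumptions here, not the actual patch):

```c
/* Sketch (not the actual patch) of applying a user-settable multiplier
 * to the recursive CTE work-table row estimate.  "recursive_factor" is
 * an assumed GUC name, and this clamp_row_est() only approximates the
 * one in PostgreSQL's costsize.c: it forces the estimate to at least 1
 * and rounds it, so a fractional or zero GUC value cannot produce a
 * bogus row count. */
static double recursive_factor = 10.0;  /* the proposed GUC */

static double
clamp_row_est(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    /* crude round-to-nearest; the real code uses rint() */
    return (double) (long long) (nrows + 0.5);
}

static double
estimate_worktable_rows(double cte_plan_rows)
{
    return clamp_row_est(recursive_factor * cte_plan_rows);
}
```

With a zero or fractional setting, the clamp keeps the estimate sane instead of letting it collapse to zero rows.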

regards, tom lane




ubsan

2022-03-23 Thread Andres Freund
Hi,

I tried to run postgres with ubsan to debug something. I ran into two main
issues:


1) Despite Tom's recent efforts in [1], I see two ubsan failures in
   HEAD.

   These are easy enough to fix as per the attached, although the fix for the
   GetConfigOptionByNum() isn't great - we should probably pass a nulls array
   to GetConfigOptionByNum() as well, but that'd have a bunch of followup
   changes. So I'm inclined to go with the minimal for now.


2) When debugging issues I got very confused by the fact that *sometimes*
   UBSAN_OPTIONS takes effect and sometimes not. I was trying to have ubsan
   generate backtraces as well as a coredump with
   
UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2"

   After a lot of debugging I figured out that the options took effect in
   postmaster, but not in any children. Which in turn is because
   set_ps_display() breaks /proc/$pid/environ - it's empty in all postmaster
   children for me.

   The sanitizer library use /proc/$pid/environ (from [2] to [3]), because
   they don't want to use getenv() because it wants to work without libc
   (whether that's the right way, I have my doubts).  When just using
   undefined and alignment sanitizers, the sanitizer library is only
   initialized when the first error occurs, by which time we've often already
   called set_ps_display().

   Note that ps_status.c breaks /proc/$pid/environ even if the
   update_process_title GUC is set to false, because init_ps_display() ignores
   that. Yes, that confused me for a while last night.


   The reason that /proc/$pid/environ is broken is fairly obvious: We
   overwrite it in set_ps_display() in the PS_USE_CLOBBER_ARGV case. The
   kernel just looks at the range set up originally, which we'll overwrite
   with zeroes.

   We could try telling the kernel about the new location of environ using
   prctl() PR_SET_MM_ENV_START/END but the restrictions around that sound
   problematic.


   I've included a hacky workaround: Define __ubsan_default_options, a weak
   symbol libsanitizer uses to get defaults from the application, and return
   getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
   don't end up relying on a not-yet-working getenv().
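A minimal self-contained sketch of that workaround might look like this (the reached_main flag and its wiring are assumptions, not the actual patch):

```c
#include <stdlib.h>

/* Sketch of the workaround described above: define the weak symbol
 * __ubsan_default_options(), which libsanitizer consults for default
 * flags, and hand it UBSAN_OPTIONS via getenv() -- but only once
 * main() has been reached, so we never rely on a not-yet-working
 * getenv().  "reached_main" would be set early in main(); how it is
 * set is an assumption here. */
int reached_main = 0;

const char *
__ubsan_default_options(void)
{
    const char *opts;

    if (!reached_main)
        return "";
    opts = getenv("UBSAN_OPTIONS");
    return opts ? opts : "";
}
```

Because the sanitizer reads the environment itself via /proc/$pid/environ, which set_ps_display() clobbers, routing the options through the application's own getenv() is what makes them visible in postmaster children.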


   This also led me to finally figure out why /proc/$pid/cmdline of postgres
   children includes a lot of NULL bytes. I'd noticed this because tools like
   pidstat -l started to print long long status strings at some point, before
   it got fixed on the pidstat side.
   The way we overwrite doesn't trigger this special case in the kernel:
   https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L296
   therefore the original size of the command-line arguments is printed, with
   a lot of boring zeroes.


A test run of this in ci, with the guc.c intentionally reintroduced, shows the
failure both via core dump
https://api.cirrus-ci.com/v1/task/6543164315009024/logs/cores.log
and
in postmaster's log: 
https://api.cirrus-ci.com/v1/artifact/task/6543164315009024/log/src/test/regress/log/postmaster.log

although that part is a bit annoying to read, because the error is
interspersed with other log messages:

guc.c:9801:15: runtime error: null pointer passed as argument 2, which is 
declared to never be null
==19253==Using libbacktrace symbolizer.
2022-03-23 17:20:35.412 UTC [19258][client backend] 
[pg_regress/alter_operator][14/429:0] ERROR:  must be owner of operator ===
...
2022-03-23 17:20:35.601 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1569:0] STATEMENT:  ALTER STATISTICS alt_stat2 
OWNER TO regress_alter_generic_user2;
#0 0x5626562b2ab4 in GetConfigOptionByNum 
/tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
#1 0x5626562b3fd5 in show_all_settings 
/tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1576:0] ERROR:  must be owner of statistics 
object alt_stat3
...
2022-03-23 17:20:35.601 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1569:0] STATEMENT:  ALTER STATISTICS alt_stat2 
OWNER TO regress_alter_generic_user2;
#0 0x5626562b2ab4 in GetConfigOptionByNum 
/tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
#1 0x5626562b3fd5 in show_all_settings 
/tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1576:0] ERROR:  must be owner of statistics 
object alt_stat3
2022-03-23 17:20:35.604 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1576:0] STATEMENT:  ALTER STATISTICS alt_stat3 
RENAME TO alt_stat4;
#2 0x562655c0ea86 in ExecMakeTableFunctionResult 
/tmp/cirrus-ci-build/src/backend/executor/execSRF.c:234
#3 0x562655c3f8be in FunctionNext 
/tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:95
2022-03-23 17:20:35.605 UTC [19254][client backend] 
[pg_regress/alter_generic][10/1578:0] ERROR:  must be owner of statistics 

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-03-23 Thread alvhe...@alvh.no-ip.org
On 2022-Mar-07, Imseih (AWS), Sami wrote:

> I have gone ahead and backpatched this all the way to 10 as well.

Thanks!  I pushed this now.  I edited the test though: I don't
understand why you went to the trouble of setting stuff in order to call
'pg_ctl promote' (in different ways for older branches), when
$node_standby->promote does the same and is simpler to call.  So I
changed the tests to do that.  (I did verify that without the code fix,
the PANIC indeed is thrown.)

Thank again,

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-23 Thread Dilip Kumar
On Wed, Mar 23, 2022 at 10:37 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-03-23 22:29:40 +0530, Dilip Kumar wrote:
> > I could not see any reason for it to fail, and I could not reproduce
> > it either.  Is it possible to access the server log for this cfbot
> > failure?
>
> Yes, near the top, below the cpu / memory graphs, there's a file
> navigator. Should have all files ending with *.log or starting with
> regress_log_*.

Okay, I think I have found the reason for this failure: as the logs
below show, the second statement fails because foobar5 already exists,
and that is because one of the earlier test cases conditionally
generates the same name.  So the fix is to use a different name.

2022-03-23 13:53:54.554 UTC [32647][client backend]
[020_createdb.pl][3/12:0] LOG:  statement: CREATE DATABASE foobar5
TEMPLATE template0 LOCALE_PROVIDER icu ICU_LOCALE 'en';
..
2022-03-23 13:53:55.374 UTC [32717][client backend]
[020_createdb.pl][3/46:0] LOG:  statement: CREATE DATABASE foobar5
STRATEGY file_copy TEMPLATE foobar2;
2022-03-23 13:53:55.390 UTC [32717][client backend]
[020_createdb.pl][3/46:0] ERROR:  database "foobar5" already exists

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-23 Thread Andres Freund
Hi,

On 2022-03-23 22:29:40 +0530, Dilip Kumar wrote:
> I could not see any reason for it to fail, and I could not reproduce
> it either.  Is it possible to access the server log for this cfbot
> failure?

Yes, near the top, below the cpu / memory graphs, there's a file
navigator. Should have all files ending with *.log or starting with
regress_log_*.

Greetings,

Andres Freund




Re: pgsql: Unbreak the build.

2022-03-23 Thread Robert Haas
On Wed, Mar 23, 2022 at 12:03 PM Robert Haas  wrote:
> I'm looking into this now, but that's not the same Assert(false). The
> one Andres is talking about is at the end of get_bc_algorithm_name().
> This one is in tar_open_for_write(). AFAIK, the first of these is
> actually unreachable or at least I see no evidence that we are
> reaching it. The second is clearly reachable because we're failing the
> assertion. I thought that might be because I didn't test
> --without-zlib locally, and indeed in testing that just now, I found
> another unused variable warning which I need to fix. But, that doesn't
> account for this failure, because when I correct the problem with the
> unused variable, all the tests pass.
>
> I think what likely happened here is that in reorganizing some of the
> logic in basebackup.c, I caused COMPRESSION_GZIP to get passed to
> tar_open_for_write() even when HAVE_LIBZ is not defined. But I don't
> yet understand why it only happens on Windows. I am suspicious that
> the problem is in basebackup.c's main() function, but I haven't
> pinpointed it yet.

Oh, it's not that: it's that Windows is using threads, and therefore
LogStreamerMain() is getting called with the wrong arguments. I guess
I just need to move the additional parameters that I added to
LogStreamerMain() into members in the logstreamer_param struct.
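For illustration, the constraint at play is that a thread entry point receives a single pointer, so everything must travel inside the param struct; names below are illustrative, not pg_basebackup's actual code:

```c
/* Sketch of the fix described above: on Windows the "child process" is
 * really a thread, and a thread entry point gets exactly one pointer,
 * so any extra parameters must live inside the argument struct rather
 * than being passed as separate function arguments.  Field names here
 * are illustrative, not pg_basebackup's actual code. */
typedef struct logstreamer_param
{
    const char *xlogdir;
    int         compresslevel;   /* formerly an extra function argument */
} logstreamer_param;

/* Matches the single-pointer signature a thread entry point must have. */
static int
LogStreamerMain(logstreamer_param *param)
{
    /* read everything from the struct; no other arguments exist */
    return (param->compresslevel >= 0) ? 0 : 1;
}
```

On platforms that fork, extra arguments happen to survive, which is why the bug only surfaced on Windows.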

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-23 Thread Dilip Kumar
On Wed, Mar 23, 2022 at 9:25 PM Dilip Kumar  wrote:
>
> On Wed, Mar 23, 2022 at 9:13 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2022-03-23 18:49:11 +0530, Dilip Kumar wrote:
> > > I think directly using smgrcreate() is a better idea instead of first
> > > registering and then unregistering it.  I have made that change in
> > > the attached patch.  After this change we could also merge the creation
> > > of the MAIN_FORKNUM into the loop below where we are creating the other
> > > forks[1], with one extra condition, but I think the current code is more
> > > in sync with the other code where we are doing similar things, so I
> > > have not merged it into the loop.  Please let me know if you think
> > > otherwise.
> >
> > FWIW, this fails tests: https://cirrus-ci.com/build/4929662173315072
> > https://cirrus-ci.com/task/6651773434724352?logs=test_bin#L121
> > https://cirrus-ci.com/task/6088823481303040?logs=test_world#L2377
>
> Strange to see that these changes are causing a failure in the
> file_copy strategy[1], because we made changes only related to the
> wal_log strategy.  However, I will look into this.  Thanks.
> [1]
> Failed test 'createdb -T foobar2 foobar5 -S file_copy exit code 0'

I could not see any reason for it to fail, and I could not reproduce
it either.  Is it possible to access the server log for this cfbot
failure?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Skip partition tuple routing with constant partition key

2022-03-23 Thread Zhihong Yu
On Wed, Mar 23, 2022 at 5:52 AM Amit Langote  wrote:

> Hi Greg,
>
> On Wed, Mar 16, 2022 at 6:54 AM Greg Stark  wrote:
> > There are a whole lot of different patches in this thread.
> >
> > However this last one https://commitfest.postgresql.org/37/3270/
> > created by Amit seems like a fairly straightforward optimization that
> > can be evaluated on its own separately from the others and seems quite
> > mature. I'm actually inclined to set it to "Ready for Committer".
>
> Thanks for taking a look at it.
>
> > Incidentally a quick read-through of the patch myself and the only
> > question I have is how the parameters of the adaptive algorithm were
> > chosen. They seem ludicrously conservative to me
>
> Do you think CACHE_BOUND_OFFSET_THRESHOLD_TUPS (1000) is too high?  I
> suspect maybe you do.
>
> Basically, the way this works is that once set, cached_bound_offset is
> not reset until encountering a tuple for which cached_bound_offset
> doesn't give the correct partition, so the threshold doesn't matter
> when the caching is active.  However, once reset, it is not again set
> till the threshold number of tuples have been processed and that too
> only if the binary searches done during that interval appear to have
> returned the same bound offset in succession a number of times.  Maybe
> waiting 1000 tuples to re-assess that is a bit too conservative,
> yes.  I guess even as small a number as 10 is fine here?
>
> I've attached an updated version of the patch, though I haven't
> changed the threshold constant.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>
> On Wed, Mar 16, 2022 at 6:54 AM Greg Stark  wrote:
> >
> > There are a whole lot of different patches in this thread.
> >
> > However this last one https://commitfest.postgresql.org/37/3270/
> > created by Amit seems like a fairly straightforward optimization that
> > can be evaluated on its own separately from the others and seems quite
> > mature. I'm actually inclined to set it to "Ready for Committer".
> >
> > Incidentally a quick read-through of the patch myself and the only
> > question I have is how the parameters of the adaptive algorithm were
> > chosen. They seem ludicrously conservative to me and a bit of simple
> > arguments about how expensive an extra check is versus the time saved
> > in the boolean search should be easy enough to come up with to justify
> > whatever values make sense.
>
> Hi,

+ * Threshold of the number of tuples to need to have been processed before
+ * maybe_cache_partition_bound_offset() (re-)assesses whether caching must be

The first part of the comment should be:

Threshold of the number of tuples which need to have been processed

+   (double) pd->n_tups_inserted / pd->n_offset_changed > 1)

I think division can be avoided - the condition can be written as:

  pd->n_tups_inserted > pd->n_offset_changed

+   /* Check if the value is below the high bound */

high bound -> upper bound
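Putting both review suggestions together, a sketch of the check might look like this (struct and field names are guesses from the quoted fragments, not the actual patch):

```c
/* Sketch of the adaptive caching check under discussion, incorporating
 * both review suggestions: a much smaller threshold and the
 * division-free comparison (for positive counters,
 * n_tups_inserted > n_offset_changed is equivalent to their ratio
 * exceeding 1).  Names are assumptions, not the actual patch. */
#define CACHE_BOUND_OFFSET_THRESHOLD_TUPS 10

typedef struct PartitionDispatchData
{
    int n_tups_inserted;    /* tuples routed since the last reset */
    int n_offset_changed;   /* times binary search found a new offset */
} PartitionDispatchData;

static int
maybe_cache_partition_bound_offset(const PartitionDispatchData *pd)
{
    if (pd->n_tups_inserted < CACHE_BOUND_OFFSET_THRESHOLD_TUPS)
        return 0;
    /* division-free form of:
     * (double) n_tups_inserted / n_offset_changed > 1 */
    return pd->n_tups_inserted > pd->n_offset_changed;
}
```

The comparison form also sidesteps the degenerate n_offset_changed == 0 case that the division would trip over.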

Cheers


Re: Patch to avoid orphaned dependencies

2022-03-23 Thread Tom Lane
Nathan Bossart  writes:
> Bertrand, do you think this has any chance of making it into v15?  If not,
> are you alright with adjusting the commitfest entry to v16 and moving it to
> the next commitfest?

I looked this over briefly, and IMO it should have no chance of being
committed in anything like this form.

The lesser problem is that (as already noted) the reliance on a global
variable that changes catalog lookup semantics is just unbelievably
scary.  An example of the possible consequences here is that a syscache
entry could get made while that's set, containing a row that we should
not be able to see yet, and indeed might never get committed at all.
Perhaps that could be addressed by abandoning the patch's ambition to tell
you the details of an uncommitted object (which would have race conditions
anyway), so that *only* reads of pg_[sh]depend itself need be dirty.

The bigger problem is that it fails to close the race condition that
it's intending to solve.  This patch will catch a race like this:

Session doing DROP                    Session doing CREATE

DROP begins

  CREATE makes a dependent catalog entry

DROP scans for dependent objects

  CREATE commits

DROP removes catalog entry

DROP commits

However, it will not catch this slightly different timing:

Session doing DROP                    Session doing CREATE

DROP begins

DROP scans for dependent objects

  CREATE makes a dependent catalog entry

  CREATE commits

DROP removes catalog entry

DROP commits

So I don't see that we've moved the goalposts very far at all.

Realistically, if we want to prevent this type of problem, then all
creation DDL will have to take a lock on each referenced object that'd
conflict with a lock taken by DROP.  This might not be out of reach:
I think we do already take such locks while dropping objects.  The
reference-side lock could be taken by the recordDependency mechanism
itself, ensuring that we don't miss anything; and that would also
allow us to not bother taking such a lock on pinned objects, which'd
greatly cut the cost (though not to zero).
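A simplified, self-contained sketch of that reference-side locking idea (ObjectAddress and the helpers here are stand-ins for the real PostgreSQL API):

```c
#include <stdbool.h>

/* Sketch of the scheme suggested above: the dependency-recording code
 * itself takes a lock on each referenced object, in a mode that
 * conflicts with DROP, skipping pinned (built-in) objects since those
 * can never be dropped.  This is illustrative only; it is not the
 * real recordDependency/lmgr code. */
typedef struct ObjectAddress
{
    unsigned int classId;
    unsigned int objectId;
} ObjectAddress;

static bool
object_is_pinned(const ObjectAddress *obj)
{
    /* stand-in for the real pinned-object check; 12000 is just an
     * illustrative cutoff for "built-in" OIDs */
    return obj->objectId < 12000;
}

/* Returns true if a share-mode lock would actually be taken.  Real
 * code would call something like LockDatabaseObject(...,
 * AccessShareLock), which conflicts with the AccessExclusiveLock a
 * concurrent DROP holds. */
static bool
lock_referenced_object(const ObjectAddress *referenced)
{
    if (object_is_pinned(referenced))
        return false;
    return true;
}
```

Because DROP already locks the objects it removes, a conflicting share lock taken here would serialize both race timings shown above, not just the first.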

regards, tom lane




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-23 Thread Peter Eisentraut

On 23.03.22 17:33, Bharath Rupireddy wrote:

It looks like the following errmsg_plural in dependency.c is
unnecessary as numReportedClient > 1 always and numNotReportedClient
can never be < 0.  Therefore the plural version of the error message is
sufficient.  Attached is a patch to fix it.


Some languages have more than two forms, so we still need to keep this 
to handle those.
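For example, Russian uses three plural forms chosen by the standard gettext Plural-Forms formula, so the message catalog needs every form errmsg_plural() provides even when the English singular is unreachable; a sketch of that selection rule:

```c
/* Sketch of why errmsg_plural() must stay even when the English
 * singular can never occur: gettext chooses among nplurals translated
 * forms using a per-language formula.  This mirrors the standard
 * Russian rule from the gettext documentation:
 *   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
 *     n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2); */
static int
russian_plural_index(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;                      /* "one" form: 1, 21, 31, ... */
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;                      /* "few" form: 2-4, 22-24, ... */
    return 2;                          /* "many" form: 0, 5-20, 25, ... */
}
```

So even if the English text only ever renders its plural branch, translators still rely on the plural machinery to pick among their language's forms.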




