Re: Bug in DefineRange() with multiranges

2021-10-11 Thread Sergey Shinderuk

On 10.10.2021 20:12, Peter Eisentraut wrote:

On 04.10.21 19:09, Sergey Shinderuk wrote:

I wonder what is the proper fix.  Just drop pfree() altogether or add
pstrdup() instead?  I see that makeMultirangeTypeName() doesn't bother
freeing its buf.


I think removing the pfree()s is a correct fix.



Thanks, here is a patch.


--
Sergey Shinderuk
https://postgrespro.com/
From 6c548f07d2a254da46cd0b6f6e99b7ed24a6b811 Mon Sep 17 00:00:00 2001
From: Sergey Shinderuk 
Date: Tue, 12 Oct 2021 07:57:42 +0300
Subject: [PATCH] Fix premature pfree() of multirange_type_name in
 DefineRange()

If the multirange_type_name parameter is given in the query, this would
erroneously pfree() the string in the parse tree.  Oversight in 6df7a9698bb0.
---
 src/backend/commands/typecmds.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index b290629a450..9ab40341793 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1707,7 +1707,6 @@ DefineRange(ParseState *pstate, CreateRangeStmt *stmt)
/* Create cast from the range type to its multirange type */
CastCreate(typoid, multirangeOid, castFuncOid, 'e', 'f', 
DEPENDENCY_INTERNAL);
 
-   pfree(multirangeTypeName);
pfree(multirangeArrayName);
 
return address;
-- 
2.24.3 (Apple Git-128)



Re: storing an explicit nonce

2021-10-11 Thread Ants Aasma
On Mon, 11 Oct 2021 at 22:15, Bruce Momjian  wrote:

> > Yes, that's the direction that I was thinking also and specifically with
> > XTS as the encryption algorithm to allow us to exclude the LSN but keep
> > everything else, and to address the concern around the nonce/tweak/etc
> > being the same sometimes across multiple writes.  Another thing to
> > consider is if we want to encrypt zero'd page.  There was a point
> > brought up that if we do then we are encrypting a fair bit of very
> > predictable bytes and that's not great (though there's a fair bit about
> > our pages that someone could quite possibly predict anyway based on
> > table structures and such...).  I would think that if it's easy enough
> > to not encrypt zero'd pages that we should avoid doing so.  Don't recall
> > offhand which way zero'd pages were being handled already but thought it
> > made sense to mention that as part of this discussion.
>
> Yeah, I wanted to mention that.  I don't see any security difference
> between fully-zero pages, pages with headers and no tuples, and pages
> with headers and only a few tuples.  If any of those are insecure, they
> all are.  Therefore, I don't see any reason to treat them differently.
>

We had to special case zero pages and not encrypt them because as far as I
can tell, there is no atomic way to extend a file and initialize it to
Enc(zero) in the same step.
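
To make the constraint concrete, here is a minimal sketch of the read-side special case this implies (page_is_all_zeros and decrypt_block are illustrative names, not from any posted patch):

#include <stdbool.h>
#include <stddef.h>

/* hypothetical decryption routine, assumed to exist elsewhere */
extern void decrypt_block(char *page, size_t len);

static bool
page_is_all_zeros(const char *page, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (page[i] != 0)
			return false;
	}
	return true;
}

static void
decrypt_page_if_needed(char *page, size_t len)
{
	/*
	 * A freshly extended page may exist on disk as plain zeros, because
	 * extending the file and writing Enc(zero) cannot be done atomically,
	 * so the read path has to accept an all-zeros page without decrypting.
	 */
	if (page_is_all_zeros(page, len))
		return;

	decrypt_block(page, len);
}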

-- 

Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com


Re: Printing backtrace of postgres processes

2021-10-11 Thread bt21tanigaway

Hi,


The previous patch was failing because of the recent test changes made
by commit 201a76183e2, which unified new and get_new_node; the attached
patch has been updated to handle those changes accordingly.

Thanks for your update!
I have three comments.


1. Do we need “set_backtrace(NULL, 0);” in “HandleMainLoopInterrupts()”?
I observed that it works correctly without it. It is also present in
“HandleAutoVacLauncherInterrupts”, but there I think it is necessary to
prevent delays, as in [1].


2. The patch seems to omit
“ereport(LOG,(errmsg("logging backtrace of PID %d", MyProcPid)));” in
“HandleAutoVacLauncherInterrupts” and “HandleMainLoopInterrupts()”.

I think it should be handled the same way as in “ProcessInterrupts()”.

3. How about creating a new function?
Since the same code appears in three functions (“ProcessInterrupts()”,
“HandleAutoVacLauncherInterrupts”, and “HandleMainLoopInterrupts()”), I
think it would be good to factor it out into a new function.
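
A rough sketch of what such a shared helper could look like (the pending-flag name is an assumption for illustration; the LOG message and set_backtrace() call are the ones quoted above):

/*
 * Hypothetical shared helper, callable from ProcessInterrupts(),
 * HandleAutoVacLauncherInterrupts() and HandleMainLoopInterrupts().
 * "LogBacktracePending" stands in for whatever volatile sig_atomic_t
 * flag the signal handler actually sets.
 */
static void
ProcessLogBacktraceInterrupt(void)
{
	if (!LogBacktracePending)
		return;

	LogBacktracePending = false;

	ereport(LOG,
			(errmsg("logging backtrace of PID %d", MyProcPid)));
	set_backtrace(NULL, 0);
}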


[1] https://commitfest.postgresql.org/35/3342/

Regards,
Koyu Tanigawa




Re: Error "initial slot snapshot too large" in create replication slot

2021-10-11 Thread Kyotaro Horiguchi
At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> So I came up with the attached version.

Sorry, it was losing a piece of change.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 424f405b4c9d41471eae1edd48cdbb6a6d47217e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH v2] Allow overflowed snapshot when creating logical
 replication slot

A snapshot can hold top-level XIDs only up to the number of server
processes, but SnapBuildInitialSnapshot tries to store all top-level and
sub XIDs there and fails under certain circumstances.  Instead, create a
"takenDuringRecovery" snapshot, which stores all XIDs in the subxip
array.  In addition, allow creating an overflowed snapshot by adding to
the reorder buffer a function that tells whether an XID is a top-level
XID or not.
---
 .../replication/logical/reorderbuffer.c   | 20 +++
 src/backend/replication/logical/snapbuild.c   | 56 +++
 src/include/replication/reorderbuffer.h   |  1 +
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *		Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false,
+NULL, InvalidXLogRecPtr, false);
+
+	/* a known subtxn? */
+	if (txn && rbtxn_is_known_subxact(txn))
+		return true;
+
+	return false;
+}
+
+
 /*
  * ---
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a549a8..46c691df20 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
 	TransactionId xid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
+	TransactionId *newsubxip;
+	int			newsubxcnt;
+	bool		overflowed = false;
 
 	Assert(!FirstSnapshotSet);
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	/*
+	 * Allocate in transaction context. We use only subxip to store xids. See
+	 * GetSnapshotData for details.
+	 */
+	newsubxip = (TransactionId *)
+		palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
 
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
+retry:
+	newsubxcnt = 0;
+
 	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
 	{
 		void	   *test;
@@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 		if (test == NULL)
 		{
-			if (newxcnt >= GetMaxSnapshotXidCount())
-ereport(ERROR,
-		(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-		 errmsg("initial slot snapshot too large")));
+			bool add = true;
 
-			newxip[newxcnt++] = xid;
+			/* exclude subxids when subxip is overflowed */
+			if (overflowed &&
+ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+add = false;
+
+			if (add)
+			{
+if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+{
+	if (overflowed)
+		ereport(ERROR,
+(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("initial slot snapshot too large")));
+
+	/* retry excluding subxids */
+	overflowed = true;
+	goto retry;
+}
+
+newsubxip[newsubxcnt++] = xid;
+			}
 		}
 
 		TransactionIdAdvance(xid);
@@ -604,8 +628,16 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	/* adjust remaining snapshot fields as needed */
 	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
+	snap->xcnt = 0;
+	snap->subxcnt = newsubxcnt;
+	snap->subxip = newsubxip;
+	snap->suboverflowed = overflowed;
+
+	/*
+	 * Although this snapshot is not actually taken during recovery, all XIDs
+	 * are stored in subxip even if it is not overflowed.
+	 */
+	snap->takenDuringRecovery = true;
 
 	return snap;
 }
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void		ReorderBufferXidSetCatalogChanges(ReorderBuffer *, 

Re: Skipping logical replication transactions on subscriber side

2021-10-11 Thread Masahiko Sawada
On Fri, Oct 8, 2021 at 9:22 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, September 30, 2021 2:45 PM Masahiko Sawada 
>  wrote:
> > I've attached updated patches that incorporate all comments I got so far. 
> > Please
> > review them.
> Hello
>
>
> Minor two comments for v15-0001 patch.
>
> (1) a typo in pgstat_vacuum_subworker_stat()
>
> +   /*
> +* This subscription is live.  The next step is that we 
> search errors
> +* of the table sync workers who are already in sync state. 
> These
> +* errors should be removed.
> +*/
>
> This subscription is "alive" ?
>
>
> (2) Suggestion to add one comment next to '0' in ApplyWorkerMain()
>
> +   /* report the table sync error */
> +   
> pgstat_report_subworker_error(MyLogicalRepWorker->subid,
> + 
> MyLogicalRepWorker->relid,
> + 
> MyLogicalRepWorker->relid,
> + 
> 0,
> + 
> InvalidTransactionId,
> + 
> errdata->message);
>
> How about writing /* no corresponding message type for table synchronization 
> */ or something ?
>

Thank you for the comments! Those comments are incorporated into the
latest patches I just submitted[1].

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoDST8-ykrCLcWbWnTLj1u52-ZhiEP%2BbRU7kv5oBhfSy_Q%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Error "initial slot snapshot too large" in create replication slot

2021-10-11 Thread Kyotaro Horiguchi
At Mon, 11 Oct 2021 16:48:10 +0530, Dilip Kumar  wrote 
in 
> On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar  
> > wrote in
> > > While creating an "export snapshot" I don't see any protection why the
> > > number of xids in the snapshot can not cross the
> > > "GetMaxSnapshotXidCount()"?.
> > >
> > > Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> > > in "SnapBuildInitialSnapshot()", we add all the xids between
> > > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> > > commit were not recorded).  The problem is that we add both topxids as
> > > well as the subxids into the same array and expect that the "xid"
> > > count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> > > an issue but I am not sure what is the fix for this, some options are
> >
> > It seems to me that it is a compromise between the restriction of the
> > legitimate snapshot and snapshots created by snapbuild.  If the xids
> > overflow, the resulting snapshot may lose a significant xid, i.e., a
> > top-level xid.
> >
> > > a) Don't limit the xid count in the exported snapshot and dynamically
> > > resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> > > GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> > > problem that how do we handle the overflowed subtransaction?
> >
> > I'm afraid that we shouldn't expand the size limits.  If I understand
> > it correctly, we only need the top-level xids in the exported snapshot
> 
> But since we are using this as an MVCC snapshot, if we don't have the
> subxid and if we also don't mark the "suboverflowed" flag then IMHO
> the sub-transaction visibility might be wrong, Am I missing something?

Sorry, I should have set suboverflowed in the generated snapshot.
However, we can store the subxid list as well when the snapshot (or
running_xact) is not overflowed.  These (should) work the same way.

On a physical standby, the snapshot is created by filling only subxip
with all top-level and sub XIDs (procarray.c:2400).  It would be better
if we did the same thing here.
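
In sketch form, that "standby style" layout is what the attached patch ends up producing at the end of SnapBuildInitialSnapshot() (xids/nxids are illustrative variable names):

	/* all collected XIDs, top-level and sub, go into subxip */
	snap->xcnt = 0;
	snap->subxcnt = nxids;
	snap->subxip = xids;
	snap->suboverflowed = overflowed;
	/* tells visibility checks to consult only subxip */
	snap->takenDuringRecovery = true;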

> > and reorder buffer knows whether a xid is a top-level or not after
> > establishing a consistent snapshot.
> >
> > The attached diff tries to make SnapBuildInitialSnapshot exclude
> > subtransaction from generated snapshots.  It seems working fine for
> > you repro. (Of course, I'm not confident that it is the correct thing,
> > though..)
> >
> > What do you think about this?
> 
> If your statement that we only need top-xids in the exported snapshot
> is true, then this fix looks fine to me.  If not, then we might need to
> add the sub-xids in the snapshot->subxip array and if it crosses the
> limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed"
> flag.

So I came up with the attached version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 374a10aa6819224ca6af548100aa34e6c772a2c3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH] Allow overflowed snapshot when creating logical replication
 slot

A snapshot can hold top-level XIDs only up to the number of server
processes, but SnapBuildInitialSnapshot tries to store all top-level and
sub XIDs there and fails under certain circumstances.  Instead, create a
"takenDuringRecovery" snapshot, which stores all XIDs in the subxip
array.  In addition, allow creating an overflowed snapshot by adding to
the reorder buffer a function that tells whether an XID is a top-level
XID or not.
---
 .../replication/logical/reorderbuffer.c   | 20 
 src/backend/replication/logical/snapbuild.c   | 50 ++-
 src/include/replication/reorderbuffer.h   |  1 +
 3 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *		Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false,
+NULL, InvalidXLogRecPtr, false);
+
+	/* a known subtxn? */
+	if (txn && rbtxn_is_known_subxact(txn))
+		return true;
+
+	return false;
+}
+
+
 /*
  * ---
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a549a8..d422315717 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
 	

Re: Skipping logical replication transactions on subscriber side

2021-10-11 Thread Masahiko Sawada
On Fri, Oct 8, 2021 at 8:17 PM Greg Nancarrow  wrote:
>
> On Thu, Sep 30, 2021 at 3:45 PM Masahiko Sawada  wrote:
> >
> > I've attached updated patches that incorporate all comments I got so
> > far. Please review them.
> >
>
> Some comments about the v15-0001 patch:

Thank you for the comments!

>
> (1) patch adds a whitespace error
>
> Applying: Add a subscription errors statistics view
> "pg_stat_subscription_errors".
> .git/rebase-apply/patch:1656: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.

Fixed.

>
> (2) Patch comment says "This commit adds a new system view
> pg_stat_logical_replication_errors ..."
> BUT this is the wrong name, it should be "pg_stat_subscription_errors".
>
>

Fixed.

> doc/src/sgml/monitoring.sgml
>
> (3)
> "Message of the error" doesn't sound right. I suggest just saying "The
> error message".

Fixed.

>
> (4) view column "last_failed_time"
> I think it would be better to name this "last_error_time".

Okay, fixed.

>
>
> src/backend/postmaster/pgstat.c
>
> (5) pgstat_vacuum_subworker_stats()
>
> Spelling mistake in the following comment:
>
> /* Create a map for mapping subscriptoin OID and database OID */
>
> subscriptoin -> subscription

Fixed.

>
> (6)
> In the following functions:
>
> pgstat_read_statsfiles
> pgstat_read_db_statsfile_timestamp
>
> The following comment should say "... struct describing subscription
> worker statistics."
> (i.e. need to remove the "a")
>
> + * 'S' A PgStat_StatSubWorkerEntry struct describing a
> + * subscription worker statistics.
>

Fixed.

>
> (7) pgstat_get_subworker_entry
>
> Suggest comment change:
>
> BEFORE:
> + * Return the entry of subscription worker entry with the subscription
> AFTER:
> + * Return subscription worker entry with the given subscription

Fixed.

>
> (8) pgstat_recv_subworker_error
>
> + /*
> + * Update only the counter and timestamp if we received the same error
> + * again
> + */
> + if (wentry->relid == msg->m_relid &&
> + wentry->command == msg->m_command &&
> + wentry->xid == msg->m_xid &&
> + strncmp(wentry->message, msg->m_message, strlen(wentry->message)) == 0)
> + {
>
> Is there a reason that the above check uses strncmp() with
> strlen(wentry->message), instead of just strcmp()?
> msg->m_message is treated as the same error message if it is the same
> up to strlen(wentry->message)?
> Perhaps if that is intentional, then the comment should be updated.

It's better to use strcmp() in this case. Fixed.
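
A standalone illustration of why the strncmp() form is only a prefix test (this is an example, not code from the patch):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *stored = "duplicate key value";
	const char *incoming = "duplicate key value violates unique constraint";

	/* prefix comparison: wrongly reports the two messages as equal */
	printf("%d\n", strncmp(stored, incoming, strlen(stored)) == 0);	/* prints 1 */

	/* full comparison: correctly reports them as different */
	printf("%d\n", strcmp(stored, incoming) == 0);	/* prints 0 */

	return 0;
}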

>
> src/tools/pgindent/typedefs.list
>
> (9)
> The added "PgStat_SubWorkerError" should be removed from the
> typedefs.list (as there is no such new typedef).

Fixed.

I've attached updated patches.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v16-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch
Description: Binary data


v16-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch
Description: Binary data


v16-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch
Description: Binary data


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-10-11 Thread Michael Paquier
On Mon, Oct 11, 2021 at 09:04:47AM -0400, Andrew Dunstan wrote:
> Keeping test.sh is not necessary - I mis-remembered what the test module
> does.

So..  Are people fine with removing test.sh entirely at the end, requiring
TAP to be installed for the pg_upgrade tests?  I'd rather raise the
bar here, as it would keep the code in the tree simpler in the long
term.  Or am I misunderstanding something?
--
Michael


signature.asc
Description: PGP signature


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-10-11 Thread Michael Paquier
On Sun, Oct 03, 2021 at 08:22:57AM -0400, Andrew Dunstan wrote:
> Actually, I was wrong. The module just does "make check" for non-MSVC.
> For MSVC it calls vcregress.pl, which the patch doesn't touch (it
> should, I think).

Yes, it should.  And I'd like to do things so that we replace all the
internals of upgradecheck() with a call to tap_check().  The patch does
not yet work properly with MSVC, and there were some problems in
getting the invocation of pg_regress right, as far as I recall.  That's
why I have left this part out for now.  I don't see why we could not do
the MSVC part as an independent step though; getting rid of test.sh is
appealing enough in itself.
--
Michael


signature.asc
Description: PGP signature


Re: postgres_fdw: misplaced? comments in connection.c

2021-10-11 Thread Etsuro Fujita
On Mon, Oct 11, 2021 at 5:05 PM Etsuro Fujita  wrote:
> The comments for pgfdw_get_cleanup_result() say this:
>
>  * It's not a huge problem if we throw an ERROR here, but if we get into error
>  * recursion trouble, we'll end up slamming the connection shut, which will
>  * necessitate failing the entire toplevel transaction even if subtransactions
>  * were used.  Try to use WARNING where we can.
>
> But we don’t use WARNING anywhere in that function.  The right place
> for this is pgfdw_exec_cleanup_query()?

I noticed that pgfdw_cancel_query(), which is called during (sub)abort
cleanup if necessary, also uses WARNING, instead of ERROR, to avoid
the error-recursion-trouble issue.  So I think it would be good to
move this to pgfdw_cancel_query() as well as
pgfdw_exec_cleanup_query().  Attached is a patch for that.

Best regards,
Etsuro Fujita


move-misplaced-comments.patch
Description: Binary data


Re: Allow escape in application_name

2021-10-11 Thread Fujii Masao




On 2021/10/07 11:46, kuroda.hay...@fujitsu.com wrote:

So now we can choose from the following:

* ignore such differences and use isdigit() and strtol()
* give up using them and implement two static support functions

What do you think? Does anyone have any other knowledge about locales?


Replacing process_log_prefix_padding() with isdigit()+strtol() is
just refactoring and doesn't provide any new feature. So they
basically should work in the same way. If the behavior of isdigit()+strtol()
can differ from process_log_prefix_padding(), I'd prefer
the latter option you suggested, i.e., give up using isdigit()+strtol().

OTOH, of course, if their behaviors are the same,
I'm OK with using isdigit()+strtol().
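
A small standalone sketch of the difference in question (illustrative only): an explicit range check is locale-independent, while isdigit() depends on the current locale (and needs the unsigned-char cast to be well-defined).

#include <ctype.h>
#include <stdbool.h>

/* locale-independent: matches only ASCII '0'..'9' */
static inline bool
ascii_digit(char c)
{
	return c >= '0' && c <= '9';
}

/* locale-dependent: may classify additional characters as digits */
static inline bool
ctype_digit(char c)
{
	return isdigit((unsigned char) c) != 0;
}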

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-11 14:59:22 -0400, Tom Lane wrote:
>> I doubt we need any code changes beyond changing the indisvalid state.

> I was thinking we'd want to throw an error if an index that's being created is
> accessed during the index build, rather than just not include it in
> planning...

AFAICT we *will* throw an error, just not a very intelligible one.
But until someone's shown another way to reach that error besides
the planner's path, I'm not thinking we need to expend effort on
making the error nicer.

regards, tom lane




Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Andres Freund
Hi,

On 2021-10-11 14:59:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Perhaps we could set pg_index.indisvalid to false initially, and if opening
> > an index where pg_index.indisvalid is false, error out with a different error
> > message if TransactionIdIsCurrentTransactionId(xmin). And then use an inplace
> > update to set indisvalid to true, to avoid the bloat?
> 
> I still can't get excited about it ...

Understandable, me neither...


> I doubt we need any code changes beyond changing the indisvalid state.

I was thinking we'd want to throw an error if an index that's being created is
accessed during the index build, rather than just not include it in
planning...
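
A very rough sketch of that idea (purely hypothetical, not from any patch; variable names are illustrative), placed somewhere in the index-open path:

	/* indexTuple is the pg_index tuple for the index being opened */
	if (!indexForm->indisvalid &&
		TransactionIdIsCurrentTransactionId(
			HeapTupleHeaderGetXmin(indexTuple->t_data)))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("index \"%s\" is still being built",
						RelationGetRelationName(indexRelation))));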


Greetings,

Andres Freund




Re: row filtering for logical replication

2021-10-11 Thread Ajin Cherian
On Tue, Oct 12, 2021 at 1:37 AM Dilip Kumar  wrote:
>
> On Wed, Oct 6, 2021 at 2:33 PM Ajin Cherian  wrote:
> >
> > On Sat, Oct 2, 2021 at 5:44 PM Ajin Cherian  wrote:
> > >
> > > I have for now also rebased the patch and merged the first 5 patches
> > > into 1, and added my changes for the above into the second patch.
> >
> > I have split the patches back again, just to be consistent with the
> > original state of the patches. Sorry for the inconvenience.
>
> Thanks for the updated version of the patch, I was looking into the
> latest version and I have a few comments.
>
>
> +if ((att->attlen == -1 &&
> VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
> +(!old_slot->tts_isnull[i] &&
> +!(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]
> +{
> +tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> +newtup_changed = true;
> +}
>
> If the attribute is stored EXTERNAL_ONDISK on the new tuple and it is
> not null in the old tuple then it must be logged completely in the old
> tuple, so instead of checking
> !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]), it should be
> asserted,
>
>
> +heap_deform_tuple(newtuple, desc, new_slot->tts_values,
> new_slot->tts_isnull);
> +heap_deform_tuple(oldtuple, desc, old_slot->tts_values,
> old_slot->tts_isnull);
> +
> +if (newtup_changed)
> +tmpnewtuple = heap_form_tuple(desc, tmp_new_slot->tts_values,
> new_slot->tts_isnull);
> +
> +old_matched = pgoutput_row_filter(relation, NULL, oldtuple, entry);
> +new_matched = pgoutput_row_filter(relation, NULL,
> +  newtup_changed ? tmpnewtuple :
> newtuple, entry);
>
> I do not like the fact that, first we have deformed the tuples and we
> are again using the HeapTuple
> for expression evaluation machinery and later the expression
> evaluation we do the deform again.
>
> So why don't you use the deformed tuple as it is to store as a virtual tuple?
>
> Infact, if newtup_changed is true then you are forming back the tuple
> just to get it deformed again
> in the expression evaluation.
>
> I think I have already given this comment on the last version.

Right, I only used the deformed tuple later when it was written to the
stream. I will modify this as well.
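
In sketch form, the suggestion amounts to something like the following (illustrative helper, not the actual patch): deform once into the slot's arrays and mark the slot as a virtual tuple, so the row-filter expression evaluation can use it directly without re-forming a HeapTuple.

static void
store_deformed_as_virtual(TupleTableSlot *slot, HeapTuple tuple, TupleDesc desc)
{
	ExecClearTuple(slot);
	heap_deform_tuple(tuple, desc, slot->tts_values, slot->tts_isnull);
	ExecStoreVirtualTuple(slot);
}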

regards,
Ajin Cherian
Fujitsu Australia




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 7:09 PM Mark Dilger
 wrote:
> I was just wondering when it might be time to stop being lenient in psql and 
> instead reject malformed identifiers.

I suppose that I probably wouldn't have chosen this behavior in a
green field situation. But Hyrum's law tells us that there are bound
to be some number of users relying on it. I don't think that it's
worth inconveniencing those people without getting a clear benefit in
return.

Being lenient here just doesn't have much downside in practice, as
evidenced by the total lack of complaints about that lenience.

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 5:37 PM, Peter Geoghegan  wrote:
> 
> Cool. I pushed just the amcheck changes a moment ago. I attach the
> remaining changes from your v3, with a new draft commit message (no
> real changes). I didn't push the rest (what remains in the attached
> revision) just yet because I'm not quite sure about the approach used
> to exclude temp tables.

Thanks for that.

> Do we really need the redundancy between prepare_btree_command(),
> prepare_heap_command(), and compile_relation_list_one_db()? All three
> exclude temp relations, plus you have extra stuff in
> prepare_btree_command(). There is some theoretical value in delaying
> the index specific stuff until the query actually runs, at least in
> theory. But it also seems unlikely to make any appreciable difference
> to the overall level of coverage in practice.

I agree that it is unlikely to make much difference in practice.  Another 
session running reindex concurrently is, I think, the most likely to conflict, 
but it is just barely imaginable that a relation will be dropped, and its OID 
reused for something unrelated, by the time the check command gets run.  The 
new object might be temporary where the old object was not.  On a properly 
functioning database, that may be too remote a possibility to be worth worrying 
about, but on a corrupt database, most bets are off, and I can't really tell 
you if that's a likely scenario, because it is hard to think about all the 
different ways corruption might cause a database to behave.  On the other hand, 
the join against pg_class might fail due to unspecified corruption, so my 
attempt to play it safe may backfire.

I don't feel strongly about this.  If you'd like me to remove those checks, I 
can do so.  These are just my thoughts on the subject.

> Would it be simpler to do it all together, in
> compile_relation_list_one_db()? Were you concerned about things
> changing when parallel workers are run? Or something else?

Yeah, I was contemplating things changing by the time the parallel workers run 
the command.  I don't know how important that is.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: More business with $Test::Builder::Level in the TAP tests

2021-10-11 Thread Michael Paquier
On Mon, Oct 11, 2021 at 10:48:54AM -0400, Andrew Dunstan wrote:
> I would say:
> 
> This should be incremented by any subroutine which directly or
> indirectly calls test routines from Test::More, such as ok() or
> is().

Indeed, that looks better.  I have just used that and applied the
change down to 12 where we have begun playing with level
incrementations.
--
Michael


signature.asc
Description: PGP signature


Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 4:49 PM, Tom Lane  wrote:
> 
> You are attacking a straw man here.  To use a period in an identifier,
> you have to double-quote it; that's the same in SQL or \d.

That's a strange argument.  If somebody gives an invalid identifier, we 
shouldn't assume they know the proper use of quotations.  Somebody asking for 
a.b.c.d.e is clearly in the dark about something.  Maybe it's the need to quote 
the "a.b" part separately from the "c.d.e" part, or maybe it's something else.  
There are lots of reasonable guesses about what they meant, and for backward 
compatibility reasons we define using the suffix d.e and ignoring the prefix 
a.b.c as the correct answer.  That's a pretty arbitrary thing to do, but it has 
the advantage of being backwards compatible.

>> I expect I'll have to submit a patch restoring the old behavior, but I 
>> wonder if that's the best direction to go.
> 
> I do not understand why you're even questioning that.  The old
> behavior had stood for a decade or two without complaints.

I find the backward compatibility argument appealing, but since we have clients 
that understand the full database.schema.relation format without ignoring the 
database portion, our client behavior is getting inconsistent.  I'd like to 
leave the door open for someday supporting server.database.schema.relation 
format, too.  I was just wondering when it might be time to stop being lenient 
in psql and instead reject malformed identifiers.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: is possible an remote access to some macos?

2021-10-11 Thread Larry Rosenman

On 10/11/2021 8:49 pm, Noah Misch wrote:

On Mon, Oct 11, 2021 at 06:31:03AM +0200, Pavel Stehule wrote:
I would like to fix an issue https://github.com/okbob/pspg/issues/188 where
the write to clipboard from pspg doesn't work on macos. But it is hard to
fix it without any macos. I am not a user of macos and I would not buy it
just for this purpose.

Is it possible to get some remote ssh access?


You can request a GCC Compile Farm account
(https://cfarm.tetaneutral.net/users/new/).


AWS also has macos instances:
https://aws.amazon.com/pm/ec2-mac/

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: [PATCH] ProcessInterrupts_hook

2021-10-11 Thread Craig Ringer
On Sat, 2 Oct 2021 at 01:24, Jaime Casanova 
wrote:

> On Tue, Jun 29, 2021 at 01:32:26PM +0800, Craig Ringer wrote:
> > On Sat, 20 Mar 2021 at 03:46, Tom Lane  wrote:
> >
> > > Robert Haas  writes:
> > > > On Fri, Mar 19, 2021 at 3:25 PM Tom Lane  wrote:
> > > >> I'm not very comfortable about the idea of having the postmaster set
> > > >> child processes' latches ... that doesn't sound terribly safe from
> the
> > > >> standpoint of not allowing the postmaster to mess with shared memory
> > > >> state that could cause it to block or crash.  If we already do that
> > > >> elsewhere, then OK, but I don't think we do.
> > >
> > > > It should be unnecessary anyway. We changed it a while back to make
> > > > any SIGUSR1 set the latch 
> > >
> > > Hmm, so the postmaster could send SIGUSR1 without setting any
> particular
> > > pmsignal reason?  Yeah, I suppose that could work.  Or we could recast
> > > this as being a new pmsignal reason.
> > >
> >
> > I'd be fine with either way.
> >
> > I don't expect to be able to get to working on a concrete patch for this
> > any time soon, so I'll be leaving it here unless someone else needs to
> pick
> > it up for their extension work. The in-principle agreement is there for
> > future work anyway.
>
> Hi Craig,
>
> There is still a CF entry for this. Should we close it as withdrawn? or
> maybe RwF?
>

I'm not going to get time for it now, so I think marking it withdrawn is
reasonable.

I think it's well worth doing and Tom seems to think it's not a crazy idea,
but I'm no longer working on the software that needed it, and don't see a
lot of other people calling for it, so it can wait until someone else needs
it.


Re: is possible an remote access to some macos?

2021-10-11 Thread Noah Misch
On Mon, Oct 11, 2021 at 06:31:03AM +0200, Pavel Stehule wrote:
> I would like to fix an issue https://github.com/okbob/pspg/issues/188 where
> the write to clipboard from pspg  doesn`t work on macos. But it is hard to
> fix it without any macos. I am not a user of macos and I would not buy it
> just for this purpose.
> 
> Is it possible to get some remote ssh access?

You can request a GCC Compile Farm account
(https://cfarm.tetaneutral.net/users/new/).




Re: automatically generating node support functions

2021-10-11 Thread Corey Huinker
>
> build support and made the Perl code more portable, so that the cfbot
> doesn't have to be sad.
>

Was this also the reason for doing the output with print statements rather
than using one of the templating libraries? I'm mostly just curious, and
certainly don't want it to get in the way of working code.


Re: Fix pg_log_backend_memory_contexts() 's delay

2021-10-11 Thread Fujii Masao




On 2021/10/11 14:40, Fujii Masao wrote:



On 2021/10/11 14:28, torikoshia wrote:

Thanks for the patch!

It might be self-evident, but since there are comments on the other process
handling in HandleAutoVacLauncherInterrupts like below, how about adding a
comment for consistency?


+1

I applied such cosmetic changes to the patch. Patch attached.
Barring any objection, I will commit it and back-port to v14.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 2:41 PM Mark Dilger
 wrote:
> > Overall, your approach looks good to me. Will Robert take care of
> > committing this, or should I?
>
> I'd appreciate if you could fix up the <warning> in the docs and do the
> commit.

Cool. I pushed just the amcheck changes a moment ago. I attach the
remaining changes from your v3, with a new draft commit message (no
real changes). I didn't push the rest (what remains in the attached
revision) just yet because I'm not quite sure about the approach used
to exclude temp tables.

Do we really need the redundancy between prepare_btree_command(),
prepare_heap_command(), and compile_relation_list_one_db()? All three
exclude temp relations, plus you have extra stuff in
prepare_btree_command(). There is some theoretical value in delaying
the index specific stuff until the query actually runs, at least in
theory. But it also seems unlikely to make any appreciable difference
to the overall level of coverage in practice.

Would it be simpler to do it all together, in
compile_relation_list_one_db()? Were you concerned about things
changing when parallel workers are run? Or something else?

Many thanks
--
Peter Geoghegan


v4-0001-pg_amcheck-avoid-unhelpful-verification-attempts.patch
Description: Binary data


Re: Parallel vacuum workers prevent the oldest xmin from advancing

2021-10-11 Thread Masahiko Sawada
On Mon, Oct 11, 2021 at 3:01 PM Greg Nancarrow  wrote:
>
> On Wed, Oct 6, 2021 at 6:11 PM Masahiko Sawada  wrote:
> >
> > To fix it, I thought that we change the create index code and the
> > vacuum code so that the individual parallel worker sets its status
> > flags according to the leader’s one. But ISTM it’d be better to copy
> > the leader’s status flags to workers in ParallelWorkerMain(). I've
> > attached a patch for HEAD.
> >
>
> +1 The fix looks reasonable to me too.
> Is it possible for the patch to add test cases for the two identified
> problem scenarios? (PROC_IN_VACUUM, PROC_IN_SAFE_IC)

I'm not sure we can add stable tests for this. There is no way in the test
infra to make parallel workers suspend and resume, etc., and the
oldest xmin can vary depending on the situation. Probably we can add
an assertion to ensure that a parallel worker for vacuum or create index
has PROC_IN_VACUUM or PROC_IN_SAFE_IC, respectively.
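
For example, something along these lines (illustrative placement in the parallel worker startup paths):

	/* in a parallel vacuum worker */
	Assert(MyProc->statusFlags & PROC_IN_VACUUM);

	/* in a parallel worker for a safe concurrent index build */
	Assert(MyProc->statusFlags & PROC_IN_SAFE_IC);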

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Accommodate startup process in a separate ProcState array slot instead of in MaxBackends slots.

2021-10-11 Thread Fujii Masao




On 2021/10/12 4:07, Bharath Rupireddy wrote:

Hi,

While working on [1], I found that currently the ProcState array
doesn't have entries for auxiliary processes; it only has entries for
MaxBackends. But the startup process is eating up one slot from
MaxBackends. We need to increase the size of the ProcState array by 1,
at least for the startup process. The startup process uses a ProcState
slot via InitRecoveryTransactionEnvironment->SharedInvalBackendInit.
The procState array size is initialized to MaxBackends in
SInvalShmemSize.

The consequence of not fixing this issue is that the database may hit
the error "sorry, too many clients already" soon in
SharedInvalBackendInit.

Attaching a patch to fix this issue. Thoughts?
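
For context, a minimal sketch of the kind of sizing change being discussed (assuming the simplest approach of reserving one extra ProcState slot for the startup process; the actual patch may differ):

Size
SInvalShmemSize(void)
{
	Size		size;

	size = offsetof(SISeg, procState);
	/* one extra slot for the startup process, which is not in MaxBackends */
	size = add_size(size, mul_size(sizeof(ProcState), MaxBackends + 1));

	return size;
}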


Thanks for making the patch! LGTM.
Barring any objection, I will commit it.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Tom Lane
Mark Dilger  writes:
> But since we allow tables and schemas with dotted names in them, I'm 
> uncertain what  \d foo.bar.baz is really asking.  That could be 
> "foo.bar"."baz", or "foo"."bar"."baz", or "foo"."bar.baz", or even 
> "public"."foo.bar.baz".  The old behavior seems a bit dangerous.  There may 
> be tables with all those names, and the user may not have meant the one that 
> we gave them.

You are attacking a straw man here.  To use a period in an identifier,
you have to double-quote it; that's the same in SQL or \d.

regression=# create table "foo.bar" (f1 int);
CREATE TABLE
regression=# \d foo.bar
Did not find any relation named "foo.bar".
regression=# \d "foo.bar"
  Table "public.foo.bar"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 f1 | integer |   |  | 

According to a quick test, you did not manage to break that in v14.

> I expect I'll have to submit a patch restoring the old behavior, but I wonder 
> if that's the best direction to go.

I do not understand why you're even questioning that.  The old
behavior had stood for a decade or two without complaints.

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Isaac Morland
On Mon, 11 Oct 2021 at 19:35, Mark Dilger 
wrote:


> But since we allow tables and schemas with dotted names in them, I'm
> uncertain what  \d foo.bar.baz is really asking.
>

FWIW, it’s absolutely clear to me that "." is a special character which has
to be quoted in order to be in an identifier. In other words, a.b.c is
three identifiers separated by two period punctuation marks; what exactly
those periods mean is another question. If somebody uses periods in their
names, they have to quote those names just as if they used capital letters
etc.

But that's just my impression. I comment at all because I remember looking
at something to do with the grammar (I think I wanted to implement ALTER …
RENAME TO newschema.newname) and noticed that a database name could be
given in the syntax.


Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 3:37 PM, Tom Lane  wrote:
> 
>> REL_13_STABLE appears to accept any amount of nonsense you like:
> 
> Yeah, I'm pretty sure that the old rule was to just ignore whatever
> appeared in the database-name position.  While we could tighten that
> up to insist that it match the current DB's name, I'm not sure that
> I see the point.  There's no near-term prospect of doing anything
> useful with some other DB's name there, so being more restrictive
> seems like it'll probably break peoples' scripts to little purpose.

You appear correct about the old behavior.  It's unclear how intentional it 
was.  There was a schema buffer and a name buffer, and while parsing the name, 
if a dot was encountered, the contents just parsed were copied into the schema 
buffer.  If multiple dots were encountered, that had the consequence of blowing 
away the earlier ones.
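
A standalone illustration of that behaviour (sketch for explanation only; the real code also handles quoting and regex construction):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *pattern = "nonesuch.foo.a.b.c.d.bar.baz";
	char		schema[64] = "";
	char		name[64] = "";

	for (const char *p = pattern; *p; p++)
	{
		if (*p == '.')
		{
			strcpy(schema, name);	/* earlier contents are blown away */
			name[0] = '\0';
		}
		else
			strncat(name, p, 1);
	}

	printf("schema=%s name=%s\n", schema, name);	/* schema=bar name=baz */
	return 0;
}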

But since we allow tables and schemas with dotted names in them, I'm uncertain 
what  \d foo.bar.baz is really asking.  That could be "foo.bar"."baz", or 
"foo"."bar"."baz", or "foo"."bar.baz", or even "public"."foo.bar.baz".  The old 
behavior seems a bit dangerous.  There may be tables with all those names, and 
the user may not have meant the one that we gave them.

The v14 code is no better.  It just assumes that it is "foo"."bar.baz".  So (with 
debugging statements included):

foo=# create table "foo.bar.baz" (i integer);
CREATE TABLE
foo=# \d public.foo.bar.baz
Converting "public.foo.bar.baz"
GOT "^(public)$" . "^(foo.bar.baz)$"
Table "public.foo.bar.baz"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 i  | integer |   |  | 

I expect I'll have to submit a patch restoring the old behavior, but I wonder 
if that's the best direction to go.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Justin Pryzby
On Mon, Oct 11, 2021 at 02:47:59PM -0700, Mark Dilger wrote:
> > |$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h 
> > /tmp regression
> > |psql (15devel)
> > |Type "help" for help.
> > |regression=# \d regresion.public.bit_defaults
> > |Did not find any relation named "regresion.public.bit_defaults".
> > |regression=# \d public.bit_defaults
> > | Table "public.bit_defaults"
> > |...
> > 
> > This worked before v14 (even though the commit message says otherwise).
> > 
> > |$ /usr/lib/postgresql/13/bin/psql -h /tmp regression
> > |psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel)
> > |...
> > |regression=# \d regresion.public.bit_defaults
> > | Table "public.bit_defaults"
> > |...
> 
> I can only assume that you are intentionally misspelling "regression" as 
> "regresion" (with only one "s") as part of the test.  I have not checked if 
> that worked before v14, but if it ignored the misspelled database name before 
> v14, and it rejects it now, I'm not sure that counts as a bug. 
> 
> Am I misunderstanding your bug report?

It's not intentional but certainly confusing to put a typo there.
Sorry for that (and good eyes, BTW).

In v15/master:
regression=# \d regression.public.bit_defaults
Did not find any relation named "regression.public.bit_defaults".

After reverting that commit and recompiling psql:
regression=# \d regression.public.bit_defaults
 Table "public.bit_defaults"
...

In v13 psql:
regression=# \d regression.public.bit_defaults
 Table "public.bit_defaults"
...

It looks like before v13 any "datname" prefix was ignored.

But now it fails to show the table because it does:

WHERE c.relname OPERATOR(pg_catalog.~) '^(public.bit_defaults)$' COLLATE 
pg_catalog.default
  AND n.nspname OPERATOR(pg_catalog.~) '^(regression)$' COLLATE 
pg_catalog.default

-- 
Justin




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Tom Lane
Mark Dilger  writes:
>> On Oct 11, 2021, at 3:04 PM, Tom Lane  wrote:
>> Doesn't work with the correct DB name, either:
>> regression=# \d regression.public.bit_defaults
>> Did not find any relation named "regression.public.bit_defaults".

> REL_13_STABLE appears to accept any amount of nonsense you like:

Yeah, I'm pretty sure that the old rule was to just ignore whatever
appeared in the database-name position.  While we could tighten that
up to insist that it match the current DB's name, I'm not sure that
I see the point.  There's no near-term prospect of doing anything
useful with some other DB's name there, so being more restrictive
seems like it'll probably break peoples' scripts to little purpose.

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 3:26 PM, Justin Pryzby  wrote:
> 
> It looks like before v13 any "datname" prefix was ignored.

The evidence so far suggests that something is broken in v14, but it is less 
clear to me what the appropriate behavior is.  The v14 psql is rejecting even a 
correctly named database.schema.table, but v13 psql accepted 
lots.of.nonsense.schema.table, and neither of those seems at first glance to be 
correct.  But perhaps there are good reasons for ignoring the nonsense prefixes?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 3:04 PM, Tom Lane  wrote:
> 
> Doesn't work with the correct DB name, either:
> 
> regression=# \d public.bit_defaults
> Table "public.bit_defaults"
> Column |  Type  | Collation | Nullable |   Default   
> ++---+--+-
> b1 | bit(4) |   |  | '1001'::"bit"
> b2 | bit(4) |   |  | '0101'::"bit"
> b3 | bit varying(5) |   |  | '1001'::bit varying
> b4 | bit varying(5) |   |  | '0101'::"bit"
> 
> regression=# \d regression.public.bit_defaults
> Did not find any relation named "regression.public.bit_defaults".

REL_13_STABLE appears to accept any amount of nonsense you like:

foo=# \d nonesuch.foo.a.b.c.d.bar.baz
  Table "bar.baz"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 i  | integer |   |  | 


Is this something we're intentionally supporting?  There is no regression test 
covering this, else we'd have seen breakage in the build-farm.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Tom Lane
Mark Dilger  writes:
> I can only assume that you are intentionally misspelling "regression" as 
> "regresion" (with only one "s") as part of the test.  I have not checked if 
> that worked before v14, but if it ignored the misspelled database name before 
> v14, and it rejects it now, I'm not sure that counts as a bug. 

Doesn't work with the correct DB name, either:

regression=# \d public.bit_defaults
 Table "public.bit_defaults"
 Column |  Type  | Collation | Nullable |   Default   
++---+--+-
 b1 | bit(4) |   |  | '1001'::"bit"
 b2 | bit(4) |   |  | '0101'::"bit"
 b3 | bit varying(5) |   |  | '1001'::bit varying
 b4 | bit varying(5) |   |  | '0101'::"bit"

regression=# \d regression.public.bit_defaults
Did not find any relation named "regression.public.bit_defaults".

regards, tom lane




Re: pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 2:24 PM, Justin Pryzby  wrote:
> 
> This commit broke psql \d datname.nspname.relname
> 
> commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868
> Author: Robert Haas 
> Date:   Wed Feb 3 13:19:41 2021 -0500
> 
>Factor pattern-construction logic out of processSQLNamePattern.
> ...
>patternToSQLRegex is a little more general than what is required
>by processSQLNamePattern. That function is only interested in
>patterns that can have up to 2 parts, a schema and a relation;
>but patternToSQLRegex can limit the maximum number of parts to
>between 1 and 3, so that patterns can look like either
>"database.schema.relation", "schema.relation", or "relation"
>depending on how it's invoked and what the user specifies.
> 
>processSQLNamePattern only passes two buffers, so works exactly
>the same as before, always interpreting the pattern as either
>a "schema.relation" pattern or a "relation" pattern. But,
>future callers can use this function in other ways.
> 
> |$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h /tmp 
> regression
> |psql (15devel)
> |Type "help" for help.
> |regression=# \d regresion.public.bit_defaults
> |Did not find any relation named "regresion.public.bit_defaults".
> |regression=# \d public.bit_defaults
> | Table "public.bit_defaults"
> |...
> 
> This worked before v14 (even though the commit message says otherwise).
> 
> |$ /usr/lib/postgresql/13/bin/psql -h /tmp regression
> |psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel)
> |...
> |regression=# \d regresion.public.bit_defaults
> | Table "public.bit_defaults"
> |...

I can only assume that you are intentionally misspelling "regression" as 
"regresion" (with only one "s") as part of the test.  I have not checked if 
that worked before v14, but if it ignored the misspelled database name before 
v14, and it rejects it now, I'm not sure that counts as a bug. 

Am I misunderstanding your bug report?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 2:33 PM, Peter Geoghegan  wrote:
> 
> On Mon, Oct 11, 2021 at 1:20 PM Mark Dilger
>  wrote:
>> Ok, I went with this suggestion, and also your earlier suggestion to have a 
>> <warning> in the pg_amcheck docs about using --parent-check and/or 
>> --rootdescend against servers in recovery.
> 
> My concern with --parent-check (and with --rootdescend) had little to
> do with Hot Standby. I suggested using a warning because these options
> alone can pretty much cause bedlam on a production database.

Ok, that makes more sense.  Would you care to rephrase them?  I don't think we 
need another round of patches posted.

> At least
> if they're used carelessly. Again, bt_index_parent_check()'s relation
> level locks will block all DML, as well as VACUUM. That isn't the case
> with any of the other pg_amcheck options, including those that call
> bt_index_check(), and including the heapam verification functionality.
> 
> It's also true that --parent-check won't work in Hot Standby mode, of
> course. So it couldn't hurt to mention that in passing, at the same
> point. But that's a secondary point, at best. We don't need to use a
> warning box because of that.
> 
> Overall, your approach looks good to me. Will Robert take care of
> committing this, or should I?

I'd appreciate if you could fix up the <warning> in the docs and do the commit.

Thanks!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 1:20 PM Mark Dilger
 wrote:
> Ok, I went with this suggestion, and also your earlier suggestion to have a 
> <warning> in the pg_amcheck docs about using --parent-check and/or 
> --rootdescend against servers in recovery.

My concern with --parent-check (and with --rootdescend) had little to
do with Hot Standby. I suggested using a warning because these options
alone can pretty much cause bedlam on a production database. At least
if they're used carelessly. Again, bt_index_parent_check()'s relation
level locks will block all DML, as well as VACUUM. That isn't the case
with any of the other pg_amcheck options, including those that call
bt_index_check(), and including the heapam verification functionality.

It's also true that --parent-check won't work in Hot Standby mode, of
course. So it couldn't hurt to mention that in passing, at the same
point. But that's a secondary point, at best. We don't need to use a
warning box because of that.

Overall, your approach looks good to me. Will Robert take care of
committing this, or should I?

-- 
Peter Geoghegan




pg14 psql broke \d datname.nspname.relname

2021-10-11 Thread Justin Pryzby
This commit broke psql \d datname.nspname.relname

commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868
Author: Robert Haas 
Date:   Wed Feb 3 13:19:41 2021 -0500

Factor pattern-construction logic out of processSQLNamePattern.
...
patternToSQLRegex is a little more general than what is required
by processSQLNamePattern. That function is only interested in
patterns that can have up to 2 parts, a schema and a relation;
but patternToSQLRegex can limit the maximum number of parts to
between 1 and 3, so that patterns can look like either
"database.schema.relation", "schema.relation", or "relation"
depending on how it's invoked and what the user specifies.

processSQLNamePattern only passes two buffers, so works exactly
the same as before, always interpreting the pattern as either
a "schema.relation" pattern or a "relation" pattern. But,
future callers can use this function in other ways.

|$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h /tmp 
regression
|psql (15devel)
|Type "help" for help.
|regression=# \d regresion.public.bit_defaults
|Did not find any relation named "regresion.public.bit_defaults".
|regression=# \d public.bit_defaults
| Table "public.bit_defaults"
|...

This worked before v14 (even though the commit message says otherwise).

|$ /usr/lib/postgresql/13/bin/psql -h /tmp regression
|psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel)
|...
|regression=# \d regresion.public.bit_defaults
| Table "public.bit_defaults"
|...

-- 
Justin




Re: pgsql: Adjust configure to insist on Perl version >= 5.8.3.

2021-10-11 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> Why don't we specify the minimum versions required of these somewhere in
>> the perl code? Perl is pretty good at this.

> configure already checks Test::More's version.  I proposed downthread
> that it should also check IPC::Run, but didn't pull that trigger yet.

Done now.

I found an old note indicating that the reason I chose 0.79 for prairiedog
back in 2017 is that 0.78 failed its self-test on that machine.  0.78 did
pass when I tried it just now on a perlbrew-on-Fedora-34 rig, so I'm not
sure what that was about ... but in any case, it discourages me from
worrying any further about whether a lower minimum could be sane.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-10-11 Thread Melanie Plageman
On Fri, Oct 8, 2021 at 1:56 PM Andres Freund  wrote:
>
> Hi,
>
> On 2021-10-01 16:05:31 -0400, Melanie Plageman wrote:
> > From 40c809ad1127322f3462e85be080c10534485f0d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Fri, 24 Sep 2021 17:39:12 -0400
> > Subject: [PATCH v13 1/4] Allow bootstrap process to beinit
> >
> > ---
> >  src/backend/utils/init/postinit.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/src/backend/utils/init/postinit.c 
> > b/src/backend/utils/init/postinit.c
> > index 78bc64671e..fba5864172 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -670,8 +670,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const 
> > char *username,
> >   EnablePortalManager();
> >
> >   /* Initialize status reporting */
> > - if (!bootstrap)
> > - pgstat_beinit();
> > + pgstat_beinit();
> >
> >   /*
> >* Load relcache entries for the shared system catalogs.  This must 
> > create
> > --
> > 2.27.0
> >
>
> I think it's good to remove more and more of these !bootstrap cases - they
> really make it harder to understand the state of the system at various
> points. Optimizing for the rarely executed bootstrap mode at the cost of
> checks in very common codepaths...

What scope do you suggest for this patch set? A single patch which does
this in more locations (remove !bootstrap) or should I remove this patch
from the patchset?

>
>
>
> > From a709ddb30b2b747beb214f0b13cd1e1816094e6b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Thu, 30 Sep 2021 16:16:22 -0400
> > Subject: [PATCH v13 2/4] Add utility to make tuplestores for pg stat views
> >
> > Most of the steps to make a tuplestore for those pg_stat views requiring
> > one are the same. Consolidate them into a single helper function for
> > clarity and to avoid bugs.
> > ---
> >  src/backend/utils/adt/pgstatfuncs.c | 129 ++--
> >  1 file changed, 44 insertions(+), 85 deletions(-)
> >
> > diff --git a/src/backend/utils/adt/pgstatfuncs.c 
> > b/src/backend/utils/adt/pgstatfuncs.c
> > index ff5aedc99c..513f5aecf6 100644
> > --- a/src/backend/utils/adt/pgstatfuncs.c
> > +++ b/src/backend/utils/adt/pgstatfuncs.c
> > @@ -36,6 +36,42 @@
> >
> >  #define HAS_PGSTAT_PERMISSIONS(role)  (is_member_of_role(GetUserId(), 
> > ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role))
> >
> > +/*
> > + * Helper function for views with multiple rows constructed from a 
> > tuplestore
> > + */
> > +static Tuplestorestate *
> > +pg_stat_make_tuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
> > +{
> > + Tuplestorestate *tupstore;
> > + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > + MemoryContext per_query_ctx;
> > + MemoryContext oldcontext;
> > +
> > + /* check to see if caller supports us returning a tuplestore */
> > + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("set-valued function called in 
> > context that cannot accept a set")));
> > + if (!(rsinfo->allowedModes & SFRM_Materialize))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("materialize mode required, but it is 
> > not allowed in this context")));
> > +
> > + /* Build a tuple descriptor for our result type */
> > + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
> > + elog(ERROR, "return type must be a row type");
> > +
> > + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> > + oldcontext = MemoryContextSwitchTo(per_query_ctx);
> > +
> > + tupstore = tuplestore_begin_heap(true, false, work_mem);
> > + rsinfo->returnMode = SFRM_Materialize;
> > + rsinfo->setResult = tupstore;
> > + rsinfo->setDesc = *tupdesc;
> > + MemoryContextSwitchTo(oldcontext);
> > + return tupstore;
> > +}
>
> Is pgstattuple the best place for this helper? It's not really pgstatfuncs
> specific...
>
> It also looks vaguely familiar - I wonder if we have a helper roughly like
> this somewhere else already...
>

I don't see a function which is specifically a utility to make a
tuplestore. Looking at the callers of tuplestore_begin_heap(), I notice
very similar code to the function I added in pg_tablespace_databases()
in utils/adt/misc.c, pg_stop_backup_v2() in xlogfuncs.c,
pg_event_trigger_dropped_objects() and pg_event_trigger_ddl_commands in
event_trigger.c, pg_available_extensions in extension.c, etc.

Do you think it makes sense to refactor this code out of all of these
places? If so, where would such a utility function belong?

>
>
> > From e9a5d2a021d429fdbb2daa58ab9d75a069f334d4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Wed, 29 Sep 2021 

Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger


> On Oct 11, 2021, at 12:25 PM, Mark Dilger  
> wrote:
> 
> Your proposal sounds good.  Let me try it and get back to you shortly.

Ok, I went with this suggestion, and also your earlier suggestion to have a 
<warning> in the pg_amcheck docs about using --parent-check and/or 
--rootdescend against servers in recovery.



v3-0001-Fix-BUG-17212.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 11:53 AM, Peter Geoghegan  wrote:
> 
> On Mon, Oct 11, 2021 at 11:40 AM Peter Geoghegan  wrote:
>> A NOTICE message is supposed to be surfaced to clients (but not stored
>> in the server log), pretty much by definition.
>> 
>> It's not unreasonable to argue that I was mistaken to ever think that
>> about this particular message. In fact, I suspect that I was.
> 
>>> Somebody could quite reasonably complain about this on a hot standby with 
>>> millions of unlogged relations.  Actual ERROR messages might get lost in 
>>> all the noise.
> 
> How about this: we can just lower the elevel, from NOTICE to DEBUG1.
> We'd then be able to keep the message we have today in
> verify_nbtree.c. We'd also add a matching message (and logic) to
> verify_heapam.c, keeping them consistent.
> 
> I find your argument about spammy messages convincing. But it's no
> less valid for any other user of amcheck. So we really should just fix
> that at the amcheck level. That way you can get rid of the call to
> pg_is_in_recovery() from the SQL statements in pg_amcheck, while still
> fixing everything that needs to be fixed in pg_amcheck.

Your proposal sounds good.  Let me try it and get back to you shortly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: storing an explicit nonce

2021-10-11 Thread Bruce Momjian
On Mon, Oct 11, 2021 at 01:30:38PM -0400, Stephen Frost wrote:
> Greetings,
> 
> > Keep in mind that in our existing code (not my patch), the LSN is zero
> > for unlogged relations, a fixed value for some GiST index pages, and
> > unchanged for some hint bit changes.  Therefore, while we can include
> > the LSN in the IV because it _might_ help, we can't rely on it.
> 
> Regarding unlogged LSNs at least, I would think that we'd want to
> actually use GetFakeLSNForUnloggedRel() instead of just having it zero'd
> out.  The fixed value for GiST index pages is just during the index

Good idea.  For my patch I had to use a WAL-logged dummy LSN, but for
our use, re-using a fake LSN after a crash seems fine, so we can just
use the existing GetFakeLSNForUnloggedRel().  However, we might need to
use the part of my patch that removes the assumption that unlogged
relations always have zero LSNs, because right now they are only used
for GiST indexes --- I would have to research that more.
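
For the unlogged case, the kind of thing I mean is roughly this (sketch
only, not taken from the patch; placement is illustrative):

    /*
     * Sketch: give unlogged relations a varying "LSN" for IV purposes by
     * stamping a fake LSN on the page instead of leaving it zero.
     */
    if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
        PageSetLSN(page, GetFakeLSNForUnloggedRel());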

> Yes, that's the direction that I was thinking also and specifically with
> XTS as the encryption algorithm to allow us to exclude the LSN but keep
> everything else, and to address the concern around the nonce/tweak/etc
> being the same sometimes across multiple writes.  Another thing to
> consider is if we want to encrypt zero'd page.  There was a point
> brought up that if we do then we are encrypting a fair bit of very
> predictable bytes and that's not great (though there's a fair bit about
> our pages that someone could quite possibly predict anyway based on
> table structures and such...).  I would think that if it's easy enough
> to not encrypt zero'd pages that we should avoid doing so.  Don't recall
> offhand which way zero'd pages were being handled already but thought it
> made sense to mention that as part of this discussion.

Yeah, I wanted to mention that.  I don't see any security difference
between fully-zero pages, pages with headers and no tuples, and pages
with headers and only a few tuples.  If any of those are insecure, they
all are.  Therefore, I don't see any reason to treat them differently.
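
If we do decide to skip them, the check itself is cheap.  A minimal
sketch, assuming a hypothetical tde_encrypt_block() primitive (the name
is made up purely for illustration):

    #include "postgres.h"
    #include "storage/bufpage.h"

    /* hypothetical encryption primitive, assumed to exist elsewhere */
    extern void tde_encrypt_block(const char *src, char *dst,
                                  Size len, BlockNumber blkno);

    /*
     * Write all-zero pages through unencrypted.  PageIsNew() is true for
     * a page whose header was never initialized, i.e. still all zeroes.
     */
    static void
    encrypt_block_for_write(Page page, char *dest, BlockNumber blkno)
    {
        if (PageIsNew(page))
        {
            memcpy(dest, (char *) page, BLCKSZ);
            return;
        }

        tde_encrypt_block((char *) page, dest, BLCKSZ, blkno);
    }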

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

2021-10-11 Thread Bharath Rupireddy
On Mon, Oct 11, 2021 at 8:59 PM Fujii Masao  wrote:
>
> On 2021/10/11 19:46, Bharath Rupireddy wrote:
> > If we do the above, then the problem might arise if somebody calls
> > SICleanupQueue and wants to signal the startup process, the below code
> > (from SICleanupQueue) can't get the startup process backend id. So,
> > the backend id calculation for the startup process can't just be
> > MaxBackends + MyAuxProcType + 1.
> > BackendId his_backendId = (needSig - &segP->procState[0]) + 1;
>
> Attached POC patch illustrates what I have in mind. ISTM this change
> doesn't prevent SICleanupQueue() from getting the right backend ID
> of the startup process. Thoughts?

I will take a look at it a bit later.

> > It looks like we need to increase the size of the ProcState array by 1
> > at least (for the startup process). Currently the ProcState array
> > doesn't have entries for auxiliary processes, it does have entries for
> > MaxBackends. The startup process is eating up one slot from
> > MaxBackends. Since we need only an extra ProcState array slot for the
> > startup process I think we could just extend its size by 1. Instead of
> > modifying the MaxBackends definition, we can just add 1 (and a comment
> > saying this 1 is for startup process) to  shmInvalBuffer->maxBackends
> > in SInvalShmemSize, CreateSharedInvalidationState. IMO, this has to go
> > in a separate patch and probably in a separate thread. Thoughts?
>
> Agreed.

Posted a patch in a separate thread, please review it.
https://www.postgresql.org/message-id/CALj2ACXZ_o7rcOb7-Rs96P0d%3DEi%2Bnvf_WZ-Meky7Vv%2BnQNFYjQ%40mail.gmail.com

Regards,
Bharath Rupireddy.




Accommodate startup process in a separate ProcState array slot instead of in MaxBackends slots.

2021-10-11 Thread Bharath Rupireddy
Hi,

While working on [1], it is found that currently the ProcState array
doesn't have entries for auxiliary processes, it does have entries for
MaxBackends. But the startup process is eating up one slot from
MaxBackends. We need to increase the size of the ProcState array by 1
at least for the startup process. The startup process uses ProcState
slot via InitRecoveryTransactionEnvironment->SharedInvalBackendInit.
The procState array size is initialized to MaxBackends in
SInvalShmemSize.

The consequence of not fixing this issue is that the database may hit
the error "sorry, too many clients already" soon in
SharedInvalBackendInit.

Attaching a patch to fix this issue. Thoughts?
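
The core of the change is small; roughly the following (the attached
patch may differ in detail), in src/backend/storage/ipc/sinvaladt.c:

    Size
    SInvalShmemSize(void)
    {
        Size        size;

        size = offsetof(SISeg, procState);
        /* +1 reserves a ProcState slot for the startup process */
        size = add_size(size, mul_size(sizeof(ProcState), MaxBackends + 1));

        return size;
    }

    /* ...and correspondingly, in CreateSharedInvalidationState():
     *     shmInvalBuffer->maxBackends = MaxBackends + 1;
     */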

[1] 
https://www.postgresql.org/message-id/ab6f-46b1-d5c0-603d-8f6680739db4%40oss.nttdata.com

Regards,
Bharath Rupireddy.


v1-0001-Accommodate-startup-process-in-a-separate-ProcSta.patch
Description: Binary data


Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Tom Lane
Andres Freund  writes:
> Perhaps we could set pg_index.indisvalid to false initially, and if opening an
> index where pg_index.indisvalid error out with a different error message if
> TransactionIdIsCurrentTransactionId(xmin). And then use an inplace update to
> set indisvalid to true, to avoid the bloat?

I still can't get excited about it ... but yeah, update-in-place would
be enough to remove the bloat objection.  I doubt we need any code
changes beyond changing the indisvalid state.

regards, tom lane




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 11:40 AM Peter Geoghegan  wrote:
> A NOTICE message is supposed to be surfaced to clients (but not stored
> in the server log), pretty much by definition.
>
> It's not unreasonable to argue that I was mistaken to ever think that
> about this particular message. In fact, I suspect that I was.

> > Somebody could quite reasonably complain about this on a hot standby with 
> > millions of unlogged relations.  Actual ERROR messages might get lost in 
> > all the noise.

How about this: we can just lower the elevel, from NOTICE to DEBUG1.
We'd then be able to keep the message we have today in
verify_nbtree.c. We'd also add a matching message (and logic) to
verify_heapam.c, keeping them consistent.
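
Concretely, I'm thinking of something like this in verify_heapam()
(sketch only; the errcode just mirrors what verify_nbtree.c uses today):

    if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
        RecoveryInProgress())
    {
        ereport(DEBUG1,
                (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
                 errmsg("cannot verify unlogged relation \"%s\" during recovery, skipping",
                        RelationGetRelationName(rel))));
        /* ...and return without checking anything */
    }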

I find your argument about spammy messages convincing. But it's no
less valid for any other user of amcheck. So we really should just fix
that at the amcheck level. That way you can get rid of the call to
pg_is_in_recovery() from the SQL statements in pg_amcheck, while still
fixing everything that needs to be fixed in pg_amcheck.

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 11:37 AM Mark Dilger
 wrote:
> The documentation for contrib/amcheck has a paragraph but not a warning box.  
> Should that be changed also?

Maybe. I think that the pg_amcheck situation is a lot worse, because
users could easily interpret --parent-check as an additive thing.
Totally changing the general locking requirements seems like a POLA
violation. Besides, amcheck proper is now very much the low level tool
that most users won't ever bother with.

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 11:26 AM Mark Dilger
 wrote:
> That's fine by me, but I was under the impression that people wanted the 
> extraneous noise removed.

A NOTICE message is supposed to be surfaced to clients (but not stored
in the server log), pretty much by definition.

It's not unreasonable to argue that I was mistaken to ever think that
about this particular message. In fact, I suspect that I was.

> Since pg_amcheck can know the command is going to draw a "you can't check 
> that right now" type message, one might argue that it is drawing these 
> notices for no particular benefit.

But technically it *was* checked. That's how I think of it, at least.
If a replica comes out of recovery, and we run pg_amcheck immediately
afterwards, are we now "checking it for real"? I don't think that
distinction is meaningful.

> Somebody could quite reasonably complain about this on a hot standby with 
> millions of unlogged relations.  Actual ERROR messages might get lost in all 
> the noise.

That's a good point.

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 11:33 AM, Peter Geoghegan  wrote:
> 
> I definitely think that it warrants a warning box. This is a huge
> practical difference.
> 
> Note that I'm talking about a standard thing, which there are
> certainly a dozen or more examples of in the docs already. Just grep
> for " " tags to see the existing warning boxes.

Yes, sure, I know they exist.  It's just that I have a vague recollection of a 
discussion on -hackers about whether we should be using them so much.

The documentation for contrib/amcheck has a paragraph but not a warning box.  
Should that be changed also?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 11:29 AM Mark Dilger
 wrote:
> The recently submitted patch already contains a short paragraph for each of 
> these, but not a warning box.  Should I reformat those as warning boxes?  I 
> don't know the current thinking on the appropriateness of that documentation 
> style.

I definitely think that it warrants a warning box. This is a huge
practical difference.

Note that I'm talking about a standard thing, which there are
certainly a dozen or more examples of in the docs already. Just grep
for " " tags to see the existing warning boxes.

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 11:26 AM, Peter Geoghegan  wrote:
> 
> We should have a warning box about this in the pg_amcheck docs. Users
> should think carefully about ever using --parent-check, since it alone
> totally changes the locking requirements (actually --rootdescend will
> do that too, but only because that option also implies
> --parent-check).

The recently submitted patch already contains a short paragraph for each of 
these, but not a warning box.  Should I reformat those as warning boxes?  I 
don't know the current thinking on the appropriateness of that documentation 
style.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 11:12 AM Peter Geoghegan  wrote:
> Sure, the user might not be happy with --parent-check throwing an
> error on a replica. But in practice most users won't want to do that
> anyway. Even on a primary it's usually not possible as a practical
> matter, because the locking implications are *bad* -- it's just too
> disruptive, for too little extra coverage. And so when --parent-check
> fails on a replica, it really is very likely that the user should just
> not do that. Which is easy: just remove --parent-check, and try again.

We should have a warning box about this in the pg_amcheck docs. Users
should think carefully about ever using --parent-check, since it alone
totally changes the locking requirements (actually --rootdescend will
do that too, but only because that option also implies
--parent-check).

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 11:12 AM, Peter Geoghegan  wrote:
> 
> What's the problem with just having pg_amcheck pass through the notice
> to the user, without it affecting anything else? Why should a simple
> notice message need to affect its return code, or anything else?

That's fine by me, but I was under the impression that people wanted the 
extraneous noise removed.  Since pg_amcheck can know the command is going to 
draw a "you can't check that right now" type message, one might argue that it 
is drawing these notices for no particular benefit.  Somebody could quite 
reasonably complain about this on a hot standby with millions of unlogged 
relations.  Actual ERROR messages might get lost in all the noise.

It's true that these NOTICEs do not change the return code.  I was thinking 
about the ERRORs we get on failed lock acquisition, but that is unrelated.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Andres Freund
Hi,

On 2021-10-11 12:27:44 -0400, Tom Lane wrote:
> While that could be argued to be a bug, I share David's lack of interest in
> fixing it, because I do not believe that there are any cases where a
> function that accesses an index's subject table is really going to be
> immutable.

> To prevent this access, we'd have to set pg_index.indisvalid false
> initially and then update it to true after the index is built.
> We do do that in CREATE INDEX CONCURRENTLY (so you can make this
> example work by specifying CONCURRENTLY), but I'm unexcited about
> creating bloat in pg_index for the standard case in order to support
> a usage scenario that is going to cause you all sorts of other pain.
> To paraphrase Henry Spencer: if you lie to the planner, it will get
> its revenge.

I agree that there's not much point in making this really "work", but perhaps
we could try to generate a more useful error message, without incurring undue
overhead? I think there've been a few reports of this over the years,
including some internally to postgres, e.g. during catalog table index
rebuilds.

Perhaps we could set pg_index.indisvalid to false initially, and if opening an
index where pg_index.indisvalid error out with a different error message if
TransactionIdIsCurrentTransactionId(xmin). And then use an inplace update to
set indisvalid to true, to avoid the bloat?
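
Something along these lines at index-open time, perhaps (very rough
sketch; names, placement and wording are purely illustrative):

    if (!index->rd_index->indisvalid &&
        TransactionIdIsCurrentTransactionId(
            HeapTupleHeaderGetXmin(index->rd_indextuple->t_data)))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("index \"%s\" is still being built and cannot be used",
                        RelationGetRelationName(index))));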

Greetings,

Andres Freund




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 10:46 AM Mark Dilger
 wrote:
> > On Oct 11, 2021, at 10:10 AM, Peter Geoghegan  wrote:
> > This mostly looks good to me. Just one thing occurs to me: I suspect
> > that we don't need to call pg_is_in_recovery() from SQL at all. What's
> > wrong with just letting verify_heapam() (the C function from amcheck
> > proper) show those notice messages where appropriate?
>
> I thought a big part of the debate upthread was over exactly this point, that 
> pg_amcheck should not attempt to check (a) temporary relations, (b) indexes 
> that are invalid or unready, and (c) unlogged relations during recovery.

Again, I consider (a) and (b) very similar to each other, but very
dissimilar to (c). Only (a) and (b) are *inherently* not verifiable by
amcheck.

To me, giving pg_amcheck responsibility for only calling amcheck
functions when (a) and (b) are sane is akin to expecting pg_amcheck to
only call bt_index_check() with a B-Tree index. Giving pg_amcheck
these responsibilities is not a case of "pg_amcheck presuming to know
what's best for the user, or too much about amcheck", because amcheck
itself pretty clearly expects this from the user (and always has). The
user is no worse off for having used pg_amcheck rather than calling
amcheck functions from SQL themselves. pg_amcheck is literally just
fulfilling basic expectations held by amcheck, that are pretty much
documented as such.

Sure, the user might not be happy with --parent-check throwing an
error on a replica. But in practice most users won't want to do that
anyway. Even on a primary it's usually not possible as a practical
matter, because the locking implications are *bad* -- it's just too
disruptive, for too little extra coverage. And so when --parent-check
fails on a replica, it really is very likely that the user should just
not do that. Which is easy: just remove --parent-check, and try again.
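
In other words, the practical difference is just (invocations are
illustrative):

    # heavyweight verification -- needs the stronger locks, primary only
    pg_amcheck --all --parent-check

    # default checks -- also fine on a hot standby
    pg_amcheck --all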

Most scenarios where --parent-check is useful involve the user already
knowing that there is some corruption. In other words, scenarios where
almost nothing could be considered overkill. Presumably this is very
rare.

> I don't like having pg_amcheck parse the error message that comes back from 
> amcheck.

> How shall we proceed?

What's the problem with just having pg_amcheck pass through the notice
to the user, without it affecting anything else? Why should a simple
notice message need to affect its return code, or anything else?

It's not like I feel very strongly about this question. Ultimately it
probably doesn't matter very much -- if pg_amcheck just can't deal
with these notice messages for some reason, then I can let it go. But
what's the reason? If there is a good reason, then maybe we should
just not have the notice messages (so we would just remove the
existing one from verify_nbtree.c, while still interpreting the case
in the same way -- index has no storage to check, and so is trivially
verified).

-- 
Peter Geoghegan




Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger



> On Oct 11, 2021, at 10:10 AM, Peter Geoghegan  wrote:
> 
> This mostly looks good to me. Just one thing occurs to me: I suspect
> that we don't need to call pg_is_in_recovery() from SQL at all. What's
> wrong with just letting verify_heapam() (the C function from amcheck
> proper) show those notice messages where appropriate?

I thought a big part of the debate upthread was over exactly this point, that 
pg_amcheck should not attempt to check (a) temporary relations, (b) indexes 
that are invalid or unready, and (c) unlogged relations during recovery.

> In general I don't like the idea of making the behavior of pg_amcheck
> conditioned on the state of the system (e.g., whether we're in
> recovery) -- we should just let amcheck throw "invalid option" type
> errors when that's the logical outcome (e.g., when --parent-check is
> used on a replica). To me this seems rather different than not
> checking temporary tables, because that's something that inherently
> won't work. (Also, I consider the index-is-being-built stuff to be
> very similar to the temp table stuff -- same basic situation.)

I don't like having pg_amcheck parse the error message that comes back from 
amcheck.  If amcheck throws an error, pg_amcheck considers that a failure and 
ultimately exits with a non-zero status.  So, if we're going to have amcheck 
handle these cases, it will have to be with a NOTICE (or perhaps a WARNING) 
rather than an error.  That's not what happens now, but if you'd rather we 
fixed this problem that way, I can go do that, or perhaps as the author of the 
bt_*_check functions, you can do that and I can just do the pg_amcheck changes.

How shall we proceed?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: RFC: compression dictionaries for JSONB

2021-10-11 Thread Matthias van de Meent
On Mon, 11 Oct 2021 at 15:25, Aleksander Alekseev
 wrote:
> Agree, add / change / remove of a column should be handled
> automatically. Just to clarify, by column option do you mean syntax
> like ALTER TABLE ... ALTER COLUMN ... etc, right?

Correct, either SET (option) or maybe using typmod (which is hack-ish,
but could save on some bytes of storage)

> I didn't think of
> extending this part of the syntax. That would be a better choice
> indeed.

> > Overall, I'm glad to see this take off, but I do want some
> > clarifications regarding the direction that this is going towards.
> > [...]
> > Actually, why is it a JSONB_DICTIONARY and not like:
> > CREATE TYPE name AS DICTIONARY (
> >  base_type = JSONB, ...
> > );
> > so that it is possible to use the infrastructure for other things?  For
> > example, perhaps PostGIS geometries could benefit from it -- or even
> > text or xml columns.
>
> So the question is if we want to extend the capabilities of a single
> type, i.e. JSONB, or to add a functionality that would work for the
> various types. I see the following pros and cons of both approaches.
>
> Modifying JSONB may at some point allow to partially decompress only
> the parts of the document that need to be decompressed for a given
> query. However, the compression schema will be probably less
> efficient. There could also be difficulties in respect of backward
> compatibility, and this is going to work only with JSONB.

Assuming this above is option 1. If I understand correctly, this
option was 'adapt the data type so that it understands how to handle a
shared dictionary, decreasing storage requirements'.

> An alternative approach, CREATE TYPE ... AS DICTIONARY OF  or
> something like this would work not only for JSONB, but also for TEXT,
> XML, arrays, and PostGIS. By the way, this was suggested before [1].
> Another advantage here is that all things being equal the compression
> schema could be more efficient. The feature will not affect existing
> types. The main disadvantage is that implementing a partial
> decompression would be very difficult and/or impractical.

Assuming this was the 2nd option. If I understand correctly, this
option is effectively 'adapt or wrap TOAST to understand and handle
dictionaries for dictionary encoding common values'.

> Personally, I would say that the 2nd option, CREATE TYPE ... AS
> DICTIONARY OF , seems to be more useful. To my knowledge, not
> many users would care much about partial decompression, and this is
> the only real advantage of the 1st option I see. Also, this is how
> ZSON is implemented. It doesn't care about the underlying type and
> treats it as a BLOB. Thus the proofs of usefulness I provided above
> are not quite valid for the 1st option. Probably unrelated, but 2nd
> option would be even easier for me to implement since I already solved
> a similar task.
>
> All in all, I suggest focusing on the 2nd option with universal
> compression dictionaries. Naturally, the focus will be on JSONB first.
> But we will be able to extend this functionality for other types as
> well.
>
> Thoughts?

I think that an 'universal dictionary encoder' would be useful, but
that a data type might also have good reason to implement their
replacement methods by themselves for better overall performance (such
as maintaining partial detoast support in dictionaried items, or
overall lower memory footprint, or ...). As such, I'd really
appreciate it if Option 1 is not ruled out by any implementation of
Option 2.
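
For concreteness, the option-2 syntax being discussed might end up
looking something like this (purely hypothetical, nothing of the sort is
implemented yet):

    CREATE TYPE product_doc AS DICTIONARY OF jsonb;

    CREATE TABLE products (
        id  bigint PRIMARY KEY,
        doc product_doc    -- stored using the shared compression dictionary
    );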

Kind regards,

Matthias van de Meent




Re: storing an explicit nonce

2021-10-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Oct  8, 2021 at 02:34:20PM -0400, Stephen Frost wrote:
> > What I think is missing from this discussion is the fact that, with XTS
> > (and XEX, on which XTS is built), the IV *is* run through a forward
> > cipher function, just as suggested above needs to be done for CBC.  I
> > don't see any reason to doubt that OpenSSL is correctly doing that.
> > 
> > This article shows this pretty clearly:
> > 
> > https://en.wikipedia.org/wiki/Disk_encryption_theory
> > 
> > I don't think that changes the fact that, if we're able to, we should be
> > varying the tweak/IV as often as we can, and including the LSN seems
> > like a good way to do just that.
> 
> Keep in mind that in our existing code (not my patch), the LSN is zero
> for unlogged relations, a fixed value for some GiST index pages, and
> unchanged for some hint bit changes.  Therefore, while we can include
> the LSN in the IV because it _might_ help, we can't rely on it.

Regarding unlogged LSNs at least, I would think that we'd want to
actually use GetFakeLSNForUnloggedRel() instead of just having it zero'd
out.  The fixed value for GiST index pages is just during the index
build process, as I recall, and that's perhaps less of a concern.  Part
of the point of using XTS is to avoid the issue of the LSN not being
changed when hint bits are, or more generally not being unique in
various cases.

> We probably need to have a discussion about whether LSN and checksum
> should be encrypted on the page.  I think we are currently leaning to no
> encryption for LSN because we can use it as part of the nonce (where
> it is variable) and encrypting the checksum for rudimentary integrity
> checking.

Yes, that's the direction that I was thinking also and specifically with
XTS as the encryption algorithm to allow us to exclude the LSN but keep
everything else, and to address the concern around the nonce/tweak/etc
being the same sometimes across multiple writes.  Another thing to
consider is if we want to encrypt zero'd page.  There was a point
brought up that if we do then we are encrypting a fair bit of very
predictable bytes and that's not great (though there's a fair bit about
our pages that someone could quite possibly predict anyway based on
table structures and such...).  I would think that if it's easy enough
to not encrypt zero'd pages that we should avoid doing so.  Don't recall
offhand which way zero'd pages were being handled already but thought it
made sense to mention that as part of this discussion.
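
For reference, this is roughly how AES-XTS looks through OpenSSL's EVP
interface.  The tweak construction here (block number in the low half,
LSN in the high half) is only an illustration of the idea; exactly what
goes into the tweak is the open question in this thread, not a settled
design:

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>

    static int
    xts_encrypt_block(const unsigned char *key512,  /* 64 bytes: two AES-256 keys */
                      uint64_t blkno, uint64_t lsn,
                      const unsigned char *in, unsigned char *out, int len)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        unsigned char   tweak[16] = {0};
        int             outl = 0,
                        finl = 0;
        int             ok;

        if (ctx == NULL)
            return 0;

        memcpy(tweak, &blkno, sizeof(blkno));   /* low 8 bytes: block number */
        memcpy(tweak + 8, &lsn, sizeof(lsn));   /* high 8 bytes: LSN, if used */

        ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_xts(), NULL, key512, tweak) == 1 &&
             EVP_EncryptUpdate(ctx, out, &outl, in, len) == 1 &&
             EVP_EncryptFinal_ex(ctx, out + outl, &finl) == 1;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }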

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 9:53 AM Mark Dilger
 wrote:
> > Right. I never meant anything like making a would-be
> > bt_index_parent_check() call into a bt_index_check() call, just
> > because of the state of the system (e.g., it's in recovery). That
> > seems awful, in fact.
>
> Please find attached the latest version of the patch which includes the 
> changes we discussed.

This mostly looks good to me. Just one thing occurs to me: I suspect
that we don't need to call pg_is_in_recovery() from SQL at all. What's
wrong with just letting verify_heapam() (the C function from amcheck
proper) show those notice messages where appropriate?

In general I don't like the idea of making the behavior of pg_amcheck
conditioned on the state of the system (e.g., whether we're in
recovery) -- we should just let amcheck throw "invalid option" type
errors when that's the logical outcome (e.g., when --parent-check is
used on a replica). To me this seems rather different than not
checking temporary tables, because that's something that inherently
won't work. (Also, I consider the index-is-being-built stuff to be
very similar to the temp table stuff -- same basic situation.)

-- 
Peter Geoghegan




Re: storing an explicit nonce

2021-10-11 Thread Bruce Momjian
On Mon, Oct 11, 2021 at 01:01:08PM -0400, Stephen Frost wrote:
> > It is more than just the page format --- it would also be the added
> > code, possible performance impact, and later code maintenance to allow
> > for a more complex or two different page formats.
> 
> Yes, there is more to it than just the page format, I agree.  I'm still
> of the mind that it's something we're going to get to eventually, if for
> no other reason than that our current page format is certainly not
> perfect and it'd be pretty awesome if we could make improvements to it
> (independently of TDE or anything else discussed currently).

Yes, 100% agree on that.  The good part is that TDE would not be paying
the cost for that.  ;-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: storing an explicit nonce

2021-10-11 Thread Bruce Momjian
On Fri, Oct  8, 2021 at 02:34:20PM -0400, Stephen Frost wrote:
> What I think is missing from this discussion is the fact that, with XTS
> (and XEX, on which XTS is built), the IV *is* run through a forward
> cipher function, just as suggested above needs to be done for CBC.  I
> don't see any reason to doubt that OpenSSL is correctly doing that.
> 
> This article shows this pretty clearly:
> 
> https://en.wikipedia.org/wiki/Disk_encryption_theory
> 
> I don't think that changes the fact that, if we're able to, we should be
> varying the tweak/IV as often as we can, and including the LSN seems
> like a good way to do just that.

Keep in mind that in our existing code (not my patch), the LSN is zero
for unlogged relations, a fixed value for some GiST index pages, and
unchanged for some hint bit changes.  Therefore, while we can include
the LSN in the IV because it _might_ help, we can't rely on it.

We probably need to have a discussion about whether LSN and checksum
should be encrypted on the page.  I think we are currently leaning to no
encryption for LSN because we can use it as part of the nonce (where
it is variable) and encrypting the checksum for rudimentary integrity
checking.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: storing an explicit nonce

2021-10-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Oct  7, 2021 at 11:32:07PM -0400, Stephen Frost wrote:
> > Part of the meeting was specifically about "why are we doing this?" and
> > there were a few different answers- first and foremost was "because
> > people are asking for it", from which followed that, yes, in many cases
> > it's to satisfy an audit or similar requirement which any of the
> > proposed methods would address.  There was further discussion that we
> 
> Yes, Cybertec's experience with their TDE patch's adoption supported
> this.
> 
> > could address *more* cases by providing something better, but the page
> > format changes were weighed against that and the general consensus was
> > that we should attack the simpler problem first and, potentially, gain
> > a solution for 90% of the folks asking for it, and then later see if
> > there's enough interest and desire to attack the remaining 10%.
> 
> It is more than just the page format --- it would also be the added
> code, possible performance impact, and later code maintenance to allow
> for a more complex or two different page formats.

Yes, there is more to it than just the page format, I agree.  I'm still
of the mind that it's something we're going to get to eventually, if for
no other reason than that our current page format is certainly not
perfect and it'd be pretty awesome if we could make improvements to it
(independently of TDE or anything else discussed currently).

> As an example, I think the online checksum patch failed because it
> wasn't happy with that 90% and went for the extra 10% of restartability,
> but once you saw the 100% solution, the patch was too big and was
> rejected.

I'm, at least, still hopeful that we get the online checksum patch done.
I'm not sure that I agree that this was 'the' reason it didn't make it
in, but I don't think it'd be helpful to tangent this thread to
discussing some other patch.

> > As such, it's just not so simple as "what is 'secure enough'" because it
> > depends on who you're talking to.  Based on the collective discussion at
> > the meeting, XTS is 'secure enough' for the needs of probably 90% of
> > those asking, while the other 10% want better (an AEAD method such as
> > GCM or GCM-SIV).  Therefore, what should we do?  Spend all of the extra
> > resources and engineering effort to address the 10% and maybe not get
> > anything because of the level of difficulty, or go the simpler route
> > first and get the 90%?  Through that lense, the choice seemed reasonably
> > clear, at least to me, hence why I agreed that we should work on an XTS
> > based approach first.
> 
> Yes, that was the conclusion.  I think it helped to have the discussion
> verbally with everyone hearing every word, rather than via email where
> people jump into the discussion not hearing earlier points.

Yes, agreed.  Certainly am hopeful that we are able to have more of
those in the (relatively) near future too!

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: storing an explicit nonce

2021-10-11 Thread Bruce Momjian
On Thu, Oct  7, 2021 at 11:32:07PM -0400, Stephen Frost wrote:
> Part of the meeting was specifically about "why are we doing this?" and
> there were a few different answers- first and foremost was "because
> people are asking for it", from which followed that, yes, in many cases
> it's to satisfy an audit or similar requirement which any of the
> proposed methods would address.  There was further discussion that we

Yes, Cybertec's experience with their TDE patch's adoption supported
this.

> could address *more* cases by providing something better, but the page
> format changes were weighed against that and the general consensus was
> that we should attack the simpler problem first and, potentially, gain
> a solution for 90% of the folks asking for it, and then later see if
> there's enough interest and desire to attack the remaining 10%.

It is more than just the page format --- it would also be the added
code, possible performance impact, and later code maintenance to allow
for a more complex or two different page formats.

As an example, I think the online checksum patch failed because it
wasn't happy with that 90% and went for the extra 10% of restartability,
but once you saw the 100% solution, the patch was too big and was
rejected.

> As such, it's just not so simple as "what is 'secure enough'" because it
> depends on who you're talking to.  Based on the collective discussion at
> the meeting, XTS is 'secure enough' for the needs of probably 90% of
> those asking, while the other 10% want better (an AEAD method such as
> GCM or GCM-SIV).  Therefore, what should we do?  Spend all of the extra
> resources and engineering effort to address the 10% and maybe not get
> anything because of the level of difficulty, or go the simpler route
> first and get the 90%?  Through that lense, the choice seemed reasonably
> clear, at least to me, hence why I agreed that we should work on an XTS
> based approach first.

Yes, that was the conclusion.  I think it helped to have the discussion
verbally with everyone hearing every word, rather than via email where
people jump into the discussion not hearing earlier points.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: BUG #17212: pg_amcheck fails on checking temporary relations

2021-10-11 Thread Mark Dilger


> On Oct 6, 2021, at 4:12 PM, Peter Geoghegan  wrote:
> 
>> 
>> Ok, excellent, that was probably the only thing that had me really hung up.  
>> I thought you were still asking for pg_amcheck to filter out the 
>> --parent-check option when in recovery, but if you're not asking for that, 
>> then I might have enough to go on now.
> 
> Sorry about that. I realized my mistake (not specifically addressing
> pg_is_in_recovery()) after I hit "send", and should have corrected the
> record sooner.
> 
>> I was using "downgrading" to mean downgrading from bt_index_parent_check() 
>> to bt_index_check() when pg_is_in_recovery() is true, but you've clarified 
>> that you're not requesting that downgrade, so I think we've now gotten past 
>> the last sticking point about that whole issue.
> 
> Right. I never meant anything like making a would-be
> bt_index_parent_check() call into a bt_index_check() call, just
> because of the state of the system (e.g., it's in recovery). That
> seems awful, in fact.

Please find attached the latest version of the patch which includes the changes 
we discussed.




v2-0001-Fix-BUG-17212.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Peter Geoghegan
On Mon, Oct 11, 2021 at 9:27 AM Tom Lane  wrote:
> Yeah.  What is happening is that the function's SELECT on the subject
> table is trying to examine the not-yet-valid new index.  While that could
> be argued to be a bug, I share David's lack of interest in fixing it,
> because I do not believe that there are any cases where a function that
> accesses an index's subject table is really going to be immutable.

Right. It might be different if this was something that users
sometimes expect will work, based on some plausible-though-wrong
understanding of expression indexes. But experience suggests that they
don't.

-- 
Peter Geoghegan




Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Tom Lane
Tomas Vondra  writes:
> True, but I can't reproduce it. So either the build is broken in some 
> way, or perhaps there's something else going on. What would be quite 
> helpful is a backtrace showing why the error was triggered. i.e. set a 
> breakpoint on the ereport in mdread().

It reproduced as-described for me.  The planner sees the index as
already indisvalid, so it figures it can ask for the tree height:

#0  errfinish (filename=0xa4c15c "md.c", lineno=686, 
funcname=0xac6a98 <__func__.13643> "mdread") at elog.c:515
#1  0x004dc8fb in mdread (reln=, 
forknum=, blocknum=0, buffer=0x7fad931b4f80 "") at md.c:682
#2  0x007fd15c in ReadBuffer_common (smgr=0x1d72140, 
relpersistence=, forkNum=MAIN_FORKNUM, blockNum=0, 
mode=RBM_NORMAL, strategy=, hit=0x7fff63a1d7af)
at bufmgr.c:1003
#3  0x007fdb54 in ReadBufferExtended (reln=0x7fad9c4b69d8, 
forkNum=MAIN_FORKNUM, blockNum=0, mode=, 
strategy=) at ../../../../src/include/utils/rel.h:548
#4  0x005797f5 in _bt_getbuf (rel=0x7fad9c4b69d8, 
blkno=, access=1) at nbtpage.c:878
#5  0x00579bc7 in _bt_getrootheight (rel=rel@entry=0x7fad9c4b69d8)
at nbtpage.c:680
#6  0x0078106a in get_relation_info (root=root@entry=0x1d84b28, 
relationObjectId=59210, inhparent=false, rel=rel@entry=0x1d85290)
at plancat.c:419
#7  0x00785451 in build_simple_rel (root=0x1d84b28, relid=1, 
parent=0x0) at relnode.c:308
#8  0x0075792f in add_base_rels_to_query (root=root@entry=0x1d84b28, 
jtnode=) at initsplan.c:122
#9  0x0075ac68 in query_planner (root=root@entry=0x1d84b28, 
--Type  for more, q to quit, c to continue without paging--q

regards, tom lane




Re: Returning to Postgres community work

2021-10-11 Thread Rushabh Lathia
On Tue, Aug 31, 2021 at 11:24 AM Gurjeet Singh  wrote:

> Hi All,
>
>  I'm very happy to announce that I now work for Supabase [1]. They
> have hired me so that I can participate in, and contribute to the
> Postgres community.
>

Welcome back Gurjeet.


>
> I'm announcing it here in the hopes that more companies feel
> encouraged to contribute to Postgres. For those who don't know my past
> work and involvement in the Postgres community, please see the
> 'PostgreSQL RDBMS' section in my resume [2] (on page 4).
>
> I'm deeply indebted to Supabase for giving me this opportunity to
> work with, and for the Postgres community.
>
> Following is the statement by Paul (CEO) and Anthony (CTO), the
> co-founders of Supabase:
>
> Supabase is a PostgreSQL hosting service that makes PostgreSQL
> incredibly easy to use. Since our inception in 2020 we've benefited
> hugely from the work of the PostgreSQL community.
>
> We've been long-time advocates of PostgreSQL, and we're now in a
> position to contribute back in a tangible way. We're hiring Gurjeet
> with the explicit goal of working on PostgreSQL community
> contributions. We're excited to welcome Gurjeet to the team at
> Supabase.
>
> [1]: https://supabase.io/
> [2]: https://gurjeet.singh.im/GurjeetResume.pdf
>
> PS: Hacker News announcement is at https://news.ycombinator.com/item?id=
>
> Best regards,
> --
> Gurjeet Singh http://gurjeet.singh.im/
>
>
>

-- 
Rushabh Lathia


Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Tom Lane
"David G. Johnston"  writes:
> On Monday, October 11, 2021, Prabhat Sahu 
> wrote:
>> While using IMMUTABLE functions in index expression, we are getting below
>> corruption on HEAD.

> That function is not actually immutable (the system doesn’t check whether
> your claim of immutability and the function definition match, its up to you
> to know and specify the correct label for what the function does) so not
> our problem.  Write a trigger instead.

Yeah.  What is happening is that the function's SELECT on the subject
table is trying to examine the not-yet-valid new index.  While that could
be argued to be a bug, I share David's lack of interest in fixing it,
because I do not believe that there are any cases where a function that
accesses an index's subject table is really going to be immutable.

To prevent this access, we'd have to set pg_index.indisvalid false
initially and then update it to true after the index is built.
We do do that in CREATE INDEX CONCURRENTLY (so you can make this
example work by specifying CONCURRENTLY), but I'm unexcited about
creating bloat in pg_index for the standard case in order to support
a usage scenario that is going to cause you all sorts of other pain.
To paraphrase Henry Spencer: if you lie to the planner, it will get
its revenge.

regards, tom lane




Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Tomas Vondra




On 10/11/21 18:08, Andrey Borodin wrote:




On 11 Oct 2021, at 20:47, David G. Johnston wrote:

On Monday, October 11, 2021, Prabhat Sahu  wrote:
While using IMMUTABLE functions in index expression, we are getting below 
corruption on HEAD.

That function is not actually immutable (the system doesn’t check whether your 
claim of immutability and the function definition match, its up to you to know 
and specify the correct label for what the function does) so not our problem.  
Write a trigger instead.

+1, but the error is strange. This might be a sign of some wrong assumption 
somewhere. My wild guess is that metapage is read before it was written.



True, but I can't reproduce it. So either the build is broken in some 
way, or perhaps there's something else going on. What would be quite 
helpful is a backtrace showing why the error was triggered. i.e. set a 
breakpoint on the ereport in mdread().



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Andrey Borodin



> On 11 Oct 2021, at 20:47, David G. Johnston wrote:
> 
> On Monday, October 11, 2021, Prabhat Sahu  
> wrote:
> While using IMMUTABLE functions in index expression, we are getting below 
> corruption on HEAD.
> 
> That function is not actually immutable (the system doesn’t check whether 
> your claim of immutability and the function definition match, its up to you 
> to know and specify the correct label for what the function does) so not our 
> problem.  Write a trigger instead.  
+1, but the error is strange. This might be a sign of some wrong assumption 
somewhere. My wild guess is that metapage is read before it was written.

Best regards, Andrey Borodin.



Re: Proposal: allow database-specific role memberships

2021-10-11 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Monday, October 11, 2021, Stephen Frost  wrote:
> > I don't think "just don't grant access to those other databases"
> > is actually a proper answer- there is certainly a use-case for "I want
> > user X to have read access to all tables in *this* database, and also
> > allow them to connect to some other database but not have that same
> > level of access there."
> 
> Sure, that has a benefit.  But creating a second user for the other
> database and putting the onus on the user to use the correct credentials
> when logging into a particular database is a valid option  - it is in fact
> the status quo.  Due to the complexity of adding a whole new grant
> dimension to the system the status quo is an appealing option.  Annoyance
> factor aside it technically solves the per-database permissions problem put
> forth.

I disagree entirely that forcing users to have multiple accounts and to
deal with "using the correct one" is at all reasonable.  That's an utter
hack that results in a given user having multiple different accounts-
something that gets really ugly to deal with in enterprise deployments
which use any kind of centralized authentication system.

No, that's not a solution.  Perhaps there's another way to implement
this capability that is simpler than what's proposed here, but saying
"just give each user two accounts" isn't a solution.  Sure, it'll work
for existing released versions of PG, just like there's a lot of things
that people can do to hack around our deficiencies, but that doesn't
change that these are areas which we are lacking and where we should be
trying to provide a proper solution.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread David G. Johnston
On Monday, October 11, 2021, Prabhat Sahu 
wrote:
>
> While using IMMUTABLE functions in index expression, we are getting below
> corruption on HEAD.
>

That function is not actually immutable (the system doesn’t check whether
your claim of immutability and the function definition match, its up to you
to know and specify the correct label for what the function does) so not
our problem.  Write a trigger instead.

David J.


Re: Proposal: allow database-specific role memberships

2021-10-11 Thread David G. Johnston
On Monday, October 11, 2021, Stephen Frost  wrote:

>
> I don't think "just don't grant access to those other databases"
> is actually a proper answer- there is certainly a use-case for "I want
> user X to have read access to all tables in *this* database, and also
> allow them to connect to some other database but not have that same
> level of access there."
>

Sure, that has a benefit.  But creating a second user for the other
database and putting the onus on the user to use the correct credentials
when logging into a particular database is a valid option  - it is in fact
the status quo.  Due to the complexity of adding a whole new grant
dimension to the system the status quo is an appealing option.  Annoyance
factor aside it technically solves the per-database permissions problem put
forth.

David J.


Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

2021-10-11 Thread Fujii Masao



On 2021/10/11 19:46, Bharath Rupireddy wrote:

If we do the above, then the problem might arise if somebody calls
SICleanupQueue and wants to signal the startup process, the below code
(from SICleanupQueue) can't get the startup process backend id. So,
the backend id calculation for the startup process can't just be
MaxBackends + MyAuxProcType + 1.
BackendId his_backendId = (needSig - &segP->procState[0]) + 1;


Attached POC patch illustrates what I have in mind. ISTM this change
doesn't prevent SICleanupQueue() from getting the right backend ID
of the startup process. Thoughts?



It looks like we need to increase the size of the ProcState array by 1
at least (for the startup process). Currently the ProcState array
doesn't have entries for auxiliary processes, it does have entries for
MaxBackends. The startup process is eating up one slot from
MaxBackends. Since we need only an extra ProcState array slot for the
startup process I think we could just extend its size by 1. Instead of
modifying the MaxBackends definition, we can just add 1 (and a comment
saying this 1 is for startup process) to  shmInvalBuffer->maxBackends
in SInvalShmemSize, CreateSharedInvalidationState. IMO, this has to go
in a separate patch and probably in a separate thread. Thoughts?


Agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/auxprocess.c 
b/src/backend/postmaster/auxprocess.c
index 7452f908b2..3f53138e7f 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -138,6 +138,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
switch (MyAuxProcType)
{
case StartupProcess:
+   MyBackendId = MaxBackends + MyAuxProcType + 1;
StartupProcessMain();
proc_exit(1);
 
diff --git a/src/backend/storage/ipc/sinvaladt.c 
b/src/backend/storage/ipc/sinvaladt.c
index 946bd8e3cb..af7de450fe 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -231,7 +231,7 @@ CreateSharedInvalidationState(void)
shmInvalBuffer->maxMsgNum = 0;
shmInvalBuffer->nextThreshold = CLEANUP_MIN;
shmInvalBuffer->lastBackend = 0;
-   shmInvalBuffer->maxBackends = MaxBackends;
+   shmInvalBuffer->maxBackends = MaxBackends + 1;
SpinLockInit(&shmInvalBuffer->msgnumLock);
 
/* The buffer[] array is initially all unused, so we need not fill it */
@@ -260,6 +260,9 @@ SharedInvalBackendInit(bool sendOnly)
ProcState  *stateP = NULL;
SISeg  *segP = shmInvalBuffer;
 
+   Assert(MyBackendId == InvalidBackendId ||
+  MyBackendId <= segP->maxBackends);
+
/*
 * This can run in parallel with read operations, but not with write
 * operations, since SIInsertDataEntries relies on lastBackend to set
@@ -267,15 +270,23 @@ SharedInvalBackendInit(bool sendOnly)
 */
LWLockAcquire(SInvalWriteLock, LW_EXCLUSIVE);
 
-   /* Look for a free entry in the procState array */
-   for (index = 0; index < segP->lastBackend; index++)
+   if (MyBackendId == InvalidBackendId)
{
-   if (segP->procState[index].procPid == 0)/* inactive 
slot? */
+   /* Look for a free entry in the procState array */
+   for (index = 0; index < segP->lastBackend; index++)
{
-   stateP = >procState[index];
-   break;
+   if (segP->procState[index].procPid == 0)/* 
inactive slot? */
+   {
+   stateP = >procState[index];
+   break;
+   }
}
}
+   else
+   {
+   stateP = &segP->procState[MyBackendId - 1];
+   Assert(stateP->procPid == 0);
+   }
 
if (stateP == NULL)
{
@@ -298,6 +309,8 @@ SharedInvalBackendInit(bool sendOnly)
}
}
 
+   Assert(MyBackendId == InvalidBackendId ||
+  MyBackendId == (stateP - &segP->procState[0]) + 1);
MyBackendId = (stateP - &segP->procState[0]) + 1;
 
/* Advertise assigned backend ID in MyProc */


Reset snapshot export state on the transaction abort

2021-10-11 Thread Dilip Kumar
As reported at [1], if the transaction is aborted during export
snapshot then ExportInProgress and SavedResourceOwnerDuringExport are
not getting reset and that is throwing an error
"clearing exported snapshot in wrong transaction state" while
executing the next command.  The attached patch clears this state if
the transaction is aborted.

[1] 
https://www.postgresql.org/message-id/CAFiTN-tqopqpfS6HHug2nnOGieJJ_nm-Nvy0WBZ=zpo-lqt...@mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 673bea5121f0a6acadb1304e4a2b1027f5914e72 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Mon, 11 Oct 2021 20:38:58 +0530
Subject: [PATCH v1] Reset snapshot export state during abort

---
 src/backend/access/transam/xact.c   | 3 +++
 src/backend/replication/logical/snapbuild.c | 9 +
 src/include/replication/snapbuild.h | 1 +
 3 files changed, 13 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 4cc38f0..60be6bb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2698,6 +2698,9 @@ AbortTransaction(void)
 	/* Reset logical streaming state. */
 	ResetLogicalStreamingState();
 
+	/* Reset snapshot export state. */
+	ResetSnapBuildExportSnapshotState();
+
 	/* If in parallel mode, clean up workers and exit parallel mode. */
 	if (IsInParallelMode())
 	{
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a54..5127ea5 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -693,6 +693,15 @@ SnapBuildClearExportedSnapshot(void)
 	AbortCurrentTransaction();
 
 	CurrentResourceOwner = SavedResourceOwnerDuringExport;
+	ResetSnapBuildExportSnapshotState();
+}
+
+/*
+ * Reset snapshot export state on the transaction abort.
+ */
+void
+ResetSnapBuildExportSnapshotState(void)
+{
 	SavedResourceOwnerDuringExport = NULL;
 	ExportInProgress = false;
 }
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index de72124..6a1082b 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -70,6 +70,7 @@ extern void SnapBuildSnapDecRefcount(Snapshot snap);
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
 extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
 extern void SnapBuildClearExportedSnapshot(void);
+extern void ResetSnapBuildExportSnapshotState(void);
 
 extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
 extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
-- 
1.8.3.1



Re: Proposal: allow database-specific role memberships

2021-10-11 Thread Isaac Morland
On Mon, 11 Oct 2021 at 11:01, Stephen Frost  wrote:


> Having an ability to GRANT predefined roles within a particular database
> is certainly something that I'd considered when adding the pg_read/write
> data roles.  I'm not super thrilled with the idea of adding a column to
> pg_auth_members just for predefined roles though and I'm not sure that
> such role membership makes sense for non-predefined roles.  Would
> welcome input from others as to if that's something that would make
> sense or if folks have asked about that before.  We'd need to carefully
> think through what this means in terms of making sure we don't end up
> with any loops too.
>

I think the ability to grant a role within a particular database would be
useful. For example, imagine I have a dev/testing instance with multiple
databases, each a copy of production modified in some way for different
testing purposes. For example, one might be scrambled data (to make the
testing data non- or less- confidential); another might be limited to data
from the last year (to reduce the size of everything); another might be
limited to 1% of all the customers (to reduce the size in a different way);
and of course these could be combined.

It’s easy to imagine that I might want to grant a user the ability to
connect to all of these databases, but to have different privileges. For
example, maybe they have read_confidential_data in the scrambled database
but not in the reduced-but-not-scrambled databases. But maybe they have a
lesser level of access to these databases, so just using the connect
privilege won't do the job.

I’ve already found it a bit weird that I can set per-role, per-database
settings (e.g. search_path), and of course privileges on individual objects,
but not which roles the role is a member of.

I haven’t thought about implementation at all however. The thought occurs
to me that the union of all the role memberships in all the database should
form a directed acyclic graph. In other words, you could not have X a
member of Y (possibly indirectly) in one database while Y is a member of X
in another database; the role memberships in each database would then be a
subset of the complete graph of memberships.
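
To make this a little more concrete, here is a purely illustrative sketch of
the kind of grammar such a feature could imply (no syntax has been settled in
this thread; the IN DATABASE clause, role names, and database names below are
all made up):

-- hypothetical syntax: scope a role membership to a single database
GRANT pg_read_all_data TO tester IN DATABASE scrambled_copy;
GRANT read_confidential_data TO tester IN DATABASE scrambled_copy;
GRANT pg_read_all_data TO tester IN DATABASE last_year_copy;
-- no extra grant in the 1%-sample database, so only CONNECT applies there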


Re: pgsql: Adjust configure to insist on Perl version >= 5.8.3.

2021-10-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 10/9/21 10:25 PM, Noah Misch wrote:
>> You mentioned prairiedog uses IPC::Run 0.79.  That's from 2005.  (Perl 5.8.3
>> is from 2004, and Test::More 0.87 is from 2009.)  I'd just use 0.79 in the
>> README recipe.  IPC::Run is easy to upgrade, so if we find cause to rely on a
>> newer version, I'd be fine updating that requirement.

> Why don't we specify the minimum versions required of these somewhere in
> the perl code? Perl is pretty good at this.

configure already checks Test::More's version.  I proposed downthread
that it should also check IPC::Run, but didn't pull that trigger yet.

regards, tom lane




Re: Proposal: allow database-specific role memberships

2021-10-11 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Sun, Oct 10, 2021 at 2:29 PM Kenaniah Cerny  wrote:
> 
> > In building off of prior art regarding the 'pg_read_all_data' and
> > 'pg_write_all_data' roles, I would like to propose an extension to roles
> > that would allow for database-specific role memberships (for the purpose of
> > granting database-specific privileges) as an additional layer of
> > abstraction.
> >
> > = Problem =
> >
> > There is currently no mechanism to grant the privileges afforded by the
> > default roles on a per-database basis. This makes it difficult to cleanly
> > accomplish permissions such as 'db_datareader' and 'db_datawriter' (which
> > are database-level roles in SQL Server that respectively grant read and
> > write access within a specific database).
> >
> > The recently-added 'pg_read_all_data' and 'pg_write_all_data' work
> > similarly to 'db_datareader' and 'db_datawriter', but work cluster-wide.
> 
> My first impression is that this is more complex than just restricting
> which databases users are allowed to connect to.  The added flexibility
> this would provide has some benefit but doesn't seem worth the added
> complexity.

Having an ability to GRANT predefined roles within a particular database
is certainly something that I'd considered when adding the pg_read/write
data roles.  I'm not super thrilled with the idea of adding a column to
pg_auth_members just for predefined roles though and I'm not sure that
such role membership makes sense for non-predefined roles.  Would
welcome input from others as to if that's something that would make
sense or if folks have asked about that before.  We'd need to carefully
think through what this means in terms of making sure we don't end up
with any loops too.

Does seem like we'd probably need to change more than just what's
suggested here so that you could, for example, ask "is role X a member
of role Y in database Z" without actually being connected to database Z.
That's just a matter of adding some functions though- the existing
functions would work with just the assumption that you're asking about
within the current database.
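
For example (the function name below is purely hypothetical, only meant to
illustrate the kind of helper that would need to be added):

SELECT pg_has_role_in_database('alice', 'pg_read_all_data', 'scrambled_copy', 'member');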

I don't think "just don't grant access to those other databases"
is actually a proper answer- there is certainly a use-case for "I want
user X to have read access to all tables in *this* database, and also
allow them to connect to some other database but not have that same
level of access there."

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Added schema level support for publication.

2021-10-11 Thread vignesh C
On Mon, Oct 11, 2021 at 1:21 PM Greg Nancarrow  wrote:
>
> On Mon, Oct 11, 2021 at 5:39 PM vignesh C  wrote:
> >
> > These comments are fixed in the v38 patch attached.
> >
>
> Thanks for the updates.
> I noticed that these patches don't apply on the latest source (last
> seemed to apply cleanly on HEAD as at about October 6).

I was not able to apply v37 patches, but I was able to apply the v38
version of patches on top of HEAD.

Regards,
Vignesh




Re: pgsql: Adjust configure to insist on Perl version >= 5.8.3.

2021-10-11 Thread Andrew Dunstan


On 10/9/21 10:25 PM, Noah Misch wrote:
> On Sat, Oct 09, 2021 at 04:34:46PM -0400, Tom Lane wrote:
>> Hah ... your backpan link led me to realize the actual problem with
>> Test::More.  It got folded into Test::Simple at some point, and
>> evidently cpanm isn't smart enough to handle a request for a back
>> version in such cases.  But this works:
>>
>> $ cpanm install Test::Simple@0.87_01
>> ...
>> $ perl -MTest::More -e 'print $Test::More::VERSION, "\n";'
>> 0.8701
>>
>> So we oughta recommend that instead.  Now I'm wondering what
>> version of IPC::Run to recommend.
> You mentioned prairiedog uses IPC::Run 0.79.  That's from 2005.  (Perl 5.8.3
> is from 2004, and Test::More 0.87 is from 2009.)  I'd just use 0.79 in the
> README recipe.  IPC::Run is easy to upgrade, so if we find cause to rely on a
> newer version, I'd be fine updating that requirement.
>
>

Why don't we specify the minimum versions required of these somewhere in
the perl code? Perl is pretty good at this.


e.g.


use IPC::Run 0.79;

use Test::More 0.87;


It will choke if the supplied version is older.


We could even put lines like this in a small script that configure could
run.


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-10-11 Thread Tom Lane
Markus Winand  writes:
> What I meant is that it was still working on 
> 4ac0f450b698442c3273ddfe8eed0e1a7e56645f, but not on the next 
> (3f50b82639637c9908afa2087de7588450aa866b).

Yeah, silly oversight in that patch.  Will push a fix shortly.

regards, tom lane




Re: More business with $Test::Builder::Level in the TAP tests

2021-10-11 Thread Andrew Dunstan


On 10/10/21 7:18 AM, Michael Paquier wrote:
> On Fri, Oct 08, 2021 at 12:14:57PM -0400, Andrew Dunstan wrote:
>> I think we need to be more explicit about it, especially w.r.t. indirect
>> calls. Every subroutine in the call stack below where you want to error
>> reported as coming from should contain this line.
> Hmm.  I got to think about that for a couple of days, and the
> simplest, still the cleanest, phrasing I can come up with is that:
> This should be incremented by any subroutine part of a stack calling
> test routines from Test::More, like ok() or is().
>
> Perhaps you have a better suggestion?


I would say:

This should be incremented by any subroutine which directly or indirectly 
calls test routines from Test::More, such as ok() or is().


cheers



andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Corruption with IMMUTABLE functions in index expression.

2021-10-11 Thread Prabhat Sahu
Hi All,

While using IMMUTABLE functions in index expression, we are getting below
corruption on HEAD.

postgres=# CREATE TABLE  tab1 (c1 numeric, c2 numeric);
CREATE TABLE

postgres=# INSERT INTO  tab1 values (10, 100);
INSERT 0 1

postgres=# CREATE OR REPLACE FUNCTION func1(var1 numeric)
RETURNS NUMERIC AS $$
DECLARE
result numeric;
BEGIN
 SELECT c2 into result FROM  tab1 WHERE c1=var1;
 RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
CREATE FUNCTION

-- When using the IMMUTABLE function in creating an index for the first
time, it is working fine.
postgres=# CREATE INDEX idx1 ON  tab1(func1(c1));
CREATE INDEX

-- Executing the similar query for 2nd time, We are getting the error
postgres=# CREATE INDEX idx2 ON  tab1(func1(c1));
ERROR:  could not read block 0 in file "base/13675/16391": read only 0 of
8192 bytes
CONTEXT:  SQL statement "SELECT c2 FROM  tab1 WHERE c1=var1"
PL/pgSQL function func1(numeric) line 5 at SQL statement

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: row filtering for logical replication

2021-10-11 Thread Dilip Kumar
On Wed, Oct 6, 2021 at 2:33 PM Ajin Cherian  wrote:
>
> On Sat, Oct 2, 2021 at 5:44 PM Ajin Cherian  wrote:
> >
> > I have for now also rebased the patch and merged the first 5 patches
> > into 1, and added my changes for the above into the second patch.
>
> I have split the patches back again, just to be consistent with the
> original state of the patches. Sorry for the inconvenience.

Thanks for the updated version of the patch, I was looking into the
latest version and I have a few comments.


+if ((att->attlen == -1 &&
VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
+(!old_slot->tts_isnull[i] &&
+!(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]
+{
+tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+newtup_changed = true;
+}

If the attribute is stored EXTERNAL_ONDISK on the new tuple and it is
not null in the old tuple, then it must be logged completely in the old
tuple; so instead of checking
!(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])), it should be
asserted.


+heap_deform_tuple(newtuple, desc, new_slot->tts_values,
new_slot->tts_isnull);
+heap_deform_tuple(oldtuple, desc, old_slot->tts_values,
old_slot->tts_isnull);
+
+if (newtup_changed)
+tmpnewtuple = heap_form_tuple(desc, tmp_new_slot->tts_values,
new_slot->tts_isnull);
+
+old_matched = pgoutput_row_filter(relation, NULL, oldtuple, entry);
+new_matched = pgoutput_row_filter(relation, NULL,
+  newtup_changed ? tmpnewtuple :
newtuple, entry);

I do not like the fact that we first deform the tuples, then pass the
HeapTuple back to the expression evaluation machinery, and later,
during expression evaluation, deform it again.

So why don't you store the deformed tuple as a virtual tuple and use it directly?

In fact, if newtup_changed is true you are forming the tuple back just
to get it deformed again during expression evaluation.

I think I have already given this comment on the last version.
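
For readers catching up on the thread, the user-visible shape of the feature
under review is roughly the following (syntax as proposed in this patch set;
the table and predicate are made up for illustration):

CREATE PUBLICATION pub_active_orders FOR TABLE orders WHERE (status = 'active');

The code quoted above is about how the publisher evaluates that WHERE clause
against the old and new tuple versions of an UPDATE.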


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-10-11 Thread Markus Winand


> On 11.10.2021, at 16:27, Peter Eisentraut  
> wrote:
> 
> On 11.10.21 12:22, Markus Winand wrote:
>> Both variants work fine before that patch 
>> (4ac0f450b698442c3273ddfe8eed0e1a7e56645f).
> 
> That commit is a message wording patch.  Are you sure you meant that one?
> 

What I meant is that it was still working on 
4ac0f450b698442c3273ddfe8eed0e1a7e56645f, but not on the next 
(3f50b82639637c9908afa2087de7588450aa866b).

-markus



Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-10-11 Thread Peter Eisentraut

On 11.10.21 12:22, Markus Winand wrote:

Both variants work fine before that patch 
(4ac0f450b698442c3273ddfe8eed0e1a7e56645f).


That commit is a message wording patch.  Are you sure you meant that one?





Re: automatically generating node support functions

2021-10-11 Thread Peter Eisentraut


On 15.09.21 21:01, Peter Eisentraut wrote:

On 17.08.21 16:36, Peter Eisentraut wrote:
Here is another set of preparatory patches that clean up various 
special cases and similar in the node support.


This set of patches has been committed.  I'll close this commit fest 
entry and come back with the main patch series in the future.


Here is an updated version of my original patch, so we have something to 
continue the discussion around.  This takes into account all the 
preparatory patches that have been committed in the meantime.  I have 
also changed it so that the array size of a pointer is now explicitly 
declared using pg_node_attr(array_size(N)) instead of picking the most 
recent scalar field, which was admittedly hacky.  I have also added MSVC 
build support and made the Perl code more portable, so that the cfbot 
doesn't have to be sad.
From 86783a117c1542e292d9f12052e33d75e6107d92 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Oct 2021 10:55:21 +0200
Subject: [PATCH v2] Automatically generate node support functions

Add a script to automatically generate the node support functions
(copy, equal, out, and read, as well as the node tags enum) from the
struct definitions.

For each of the four node support files, it creates two include files,
e.g., copyfuncs.inc1.c and copyfuncs.inc2.c, to include in the main
file.  All the scaffolding of the main file stays in place.

TODO: In this patch, I have only ifdef'ed out the code that could be
removed, mainly so that it won't constantly have merge conflicts.
Eventually, that should all be changed to delete the code.  When we do
that, some code comments should probably be preserved elsewhere, so
that will need another pass of consideration.

I have tried to mostly make the coverage of the output match what is
currently there.  For example, one could now do out/read coverage of
utility statement nodes, but I have manually excluded those for now.
The reason is mainly that it's easier to diff the before and after,
and adding a bunch of stuff like this might require a separate
analysis and review.

Subtyping (TidScan -> Scan) is supported.

For the hard cases, you can just write a manual function and exclude
generating one.  For the not so hard cases, there is a way of
annotating struct fields to get special behaviors.  For example,
pg_node_attr(equal_ignore) has the field ignored in equal functions.

Discussion: 
https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce%40enterprisedb.com
---
 src/backend/Makefile|   8 +-
 src/backend/nodes/.gitignore|   3 +
 src/backend/nodes/Makefile  |  46 ++
 src/backend/nodes/copyfuncs.c   |  15 +
 src/backend/nodes/equalfuncs.c  |  21 +-
 src/backend/nodes/gen_node_stuff.pl | 642 
 src/backend/nodes/outfuncs.c|  23 +
 src/backend/nodes/readfuncs.c   |  19 +-
 src/include/nodes/.gitignore|   2 +
 src/include/nodes/nodes.h   |   8 +
 src/include/nodes/parsenodes.h  |   2 +-
 src/include/nodes/pathnodes.h   | 132 +++---
 src/include/nodes/plannodes.h   |  90 ++--
 src/include/nodes/primnodes.h   |  20 +-
 src/include/pg_config_manual.h  |   4 +-
 src/include/utils/rel.h |   6 +-
 src/tools/msvc/Solution.pm  |  46 ++
 17 files changed, 954 insertions(+), 133 deletions(-)
 create mode 100644 src/backend/nodes/.gitignore
 create mode 100644 src/backend/nodes/gen_node_stuff.pl
 create mode 100644 src/include/nodes/.gitignore

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..a33db1ae01 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -143,11 +143,15 @@ storage/lmgr/lwlocknames.h: 
storage/lmgr/generate-lwlocknames.pl storage/lmgr/lw
 submake-catalog-headers:
$(MAKE) -C catalog distprep generated-header-symlinks
 
+# run this unconditionally to avoid needing to know its dependencies here:
+submake-nodes-headers:
+   $(MAKE) -C nodes distprep generated-header-symlinks
+
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-utils-headers:
$(MAKE) -C utils distprep generated-header-symlinks
 
-.PHONY: submake-catalog-headers submake-utils-headers
+.PHONY: submake-catalog-headers submake-nodes-headers submake-utils-headers
 
 # Make symlinks for these headers in the include directory. That way
 # we can cut down on the -I options. Also, a symlink is automatically
@@ -162,7 +166,7 @@ submake-utils-headers:
 
 .PHONY: generated-headers
 
-generated-headers: $(top_builddir)/src/include/parser/gram.h 
$(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers 
submake-utils-headers
+generated-headers: $(top_builddir)/src/include/parser/gram.h 
$(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers 
submake-nodes-headers submake-utils-headers
 
 $(top_builddir)/src/include/parser/gram.h: parser/gram.h
prereqdir=`cd '$(dir $<)' 

Re: Time to upgrade buildfarm coverage for some EOL'd OSes?

2021-10-11 Thread Mikael Kjellström

On 2021-10-11 10:20, Magnus Hagander wrote:

It was indeed supposed to, but didn't. It has now been done though, so 
git.postgresql.org  should now be compatible 
with ancient OpenSSL.


And curculio is back to life and shows as all green on the status page.

So it's indeed working again.

/Mikael




Re: Add client connection check during the execution of the query

2021-10-11 Thread Maksim Milyutin

On 12.06.2021 07:24, Thomas Munro wrote:


On Fri, Apr 30, 2021 at 2:23 PM Thomas Munro  wrote:

Here's something I wanted to park here to look into for the next
cycle:  it turns out that kqueue's EV_EOF flag also has the right
semantics for this.  That leads to the idea of exposing the event via
the WaitEventSet API, and would the bring
client_connection_check_interval feature to 6/10 of our OSes, up from
2/10.  Maybe Windows' FD_CLOSE event could get us up to 7/10, not
sure.

Rebased.  Added documentation tweak and a check to reject the GUC on
unsupported OSes.



Good work. I have tested your patch on Linux and FreeBSD on three basic
cases: killing the client, cable breakdown (via firewall manipulation),
and silently closing the client connection before completion of a
previously started query in asynchronous manner. All of them work fine.


Some comments from me:

 bool
 pq_check_connection(void)
 {
-#if defined(POLLRDHUP)
+    WaitEvent events[3];


3 looks like a magic constant. We might want to specify a constant for
all event groups in FeBeWaitSet.



+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, NULL);
+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, 
WL_SOCKET_CLOSED, NULL);

+    rc = WaitEventSetWait(FeBeWaitSet, 0, events, lengthof(events), 0);
+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, 
MyLatch);


AFAICS, the side effect on
(FeBeWaitSet->events[FeBeWaitSetSocketPos]).events of setting the
WL_SOCKET_CLOSED value inside pq_check_connection() doesn't have a
negative impact later, does it? That is, all WaitEventSetWait() callers
have to set up their socket events from scratch anyway.



--
Regards,
Maksim Milyutin





Re: Double partition lock in bufmgr

2021-10-11 Thread Yura Sokolov
В Пт, 18/12/2020 в 15:20 +0300, Konstantin Knizhnik пишет:
> Hi hackers,
> 
> I am investigating incident with one of out customers: performance of 
> the system isdropped dramatically.
> Stack traces of all backends can be found here: 
> http://www.garret.ru/diag_20201217_102056.stacks_59644
> (this file is 6Mb so I have not attached it to this mail).
> 
> What I have see in this stack traces is that 642 backends and blocked
> in 
> LWLockAcquire,
> mostly in obtaining shared buffer lock:
> 
> #0  0x7f0e7fe7a087 in semop () from /lib64/libc.so.6
> #1  0x00682fb1 in PGSemaphoreLock 
> (sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
> #2  0x006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d80
> 0, 
> mode=mode@entry=LW_SHARED) at lwlock.c:1338
> #3  0x006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b
> "\001", 
> strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 
> 'p', smgr=0x2fb2df8) at bufmgr.c:1177
> #4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=<optimized out>, 
> relkind=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, 
> blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, 
> hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
> #5  0x006c928b in ReadBufferExtended (reln=0x32c7ed0, 
> forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, 
> mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at
> bufmgr.c:753
> #6  0x006c93ab in ReadBuffer (blockNum=<optimized out>, 
> reln=<optimized out>) at bufmgr.c:685
> ...
> 
> Only 11 locks from this 642 are unique.
> Moreover: 358 backends are waiting for one lock and 183 - for another.
> 
> There are two backends (pids 291121 and 285927) which are trying to 
> obtain exclusive lock while already holding another exclusive lock.
> And them block all other backends.
> 
> This is single place in bufmgr (and in postgres) where process tries
> to 
> lock two buffers:
> 
>  /*
>   * To change the association of a valid buffer, we'll need to
> have
>   * exclusive lock on both the old and new mapping partitions.
>   */
>  if (oldFlags & BM_TAG_VALID)
>  {
>  ...
>  /*
>   * Must lock the lower-numbered partition first to avoid
>   * deadlocks.
>   */
>  if (oldPartitionLock < newPartitionLock)
>  {
>  LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>  LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>  }
>  else if (oldPartitionLock > newPartitionLock)
>  {
>  LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>  LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>  }
> 
> This two backends are blocked in the second lock request.
> I read all connects in bufmgr.c and README file but didn't find 
> explanation why do we need to lock both partitions.
> Why it is not possible first free old buffer (as it is done in 
> InvalidateBuffer) and then repeat attempt to allocate the buffer?
> 
> Yes, it may require more efforts than just "gabbing" the buffer.
> But in this case there is no need to keep two locks.
> 
> I wonder if somebody in the past  faced with the similar symptoms and 
> was this problem with holding locks of two partitions in bufmgr
> already 
> discussed?

Looks like there is no real need for this double lock. And the change to
consecutive lock acquisition really provides a scalability gain:
https://bit.ly/3AytNoN

regards
Sokolov Yura
y.soko...@postgrespro.ru
funny.fal...@gmail.com





Re: RFC: compression dictionaries for JSONB

2021-10-11 Thread Aleksander Alekseev
Matthias, Alvaro,

Many thanks for your comments and suggestions!

> Well, I for one would like access to manually add entries to the
> dictionary. What I'm not interested in is being required to manually
> update the dictionary; but the ability to manually insert into the
> dictionary however is much appreciated.

Sure, I see no reason why we can't do it. This would also simplify
splitting the task into smaller ones. We could introduce only manually
updated dictionaries first, and then automate filling them for the
users who need this.

> If stored as strings, they would go out of date when tables or columns
> are renamed or dropped.
> Similarly, you'd want to update the dictionary with common values in
> columns of that type; generally not columns of arbitrary other types.
> You can't in advance know the names of tables and columns, so that
> would add a burden of maintenance to the user when they add / change /
> remove a column of the dictionary type. Instead of storing 'use update
> data from table X column Y' in the type, I think that adding it as a
> column option would be the better choice.

Agree, adding / changing / removing a column should be handled
automatically. Just to clarify, by column option you mean syntax
like ALTER TABLE ... ALTER COLUMN ... etc., right? I didn't think of
extending this part of the syntax. That would be a better choice
indeed.

> I'm a bit on the fence about this. We do use this for sequences, but
> alternatively we might want to use ALTER TYPE jsondict;

Agree, ALTER TYPE seems to be a better choice than SELECT function().
This would make the interface more consistent.

> Overall, I'm glad to see this take off, but I do want some
> clarifications regarding the direction that this is going towards.
> [...]
> Actually, why is it a JSONB_DICTIONARY and not like:
> CREATE TYPE name AS DICTIONARY (
>  base_type = JSONB, ...
> );
> so that it is possible to use the infrastructure for other things?  For
> example, perhaps PostGIS geometries could benefit from it -- or even
> text or xml columns.

So the question is if we want to extend the capabilities of a single
type, i.e. JSONB, or to add a functionality that would work for the
various types. I see the following pros and cons of both approaches.

Modifying JSONB may at some point allow partially decompressing only
the parts of the document that are needed for a given query. However,
the compression scheme will probably be less efficient. There could
also be difficulties with backward compatibility, and this is going to
work only with JSONB.

An alternative approach, CREATE TYPE ... AS DICTIONARY OF <type> or
something like this, would work not only for JSONB, but also for TEXT,
XML, arrays, and PostGIS. By the way, this was suggested before [1].
Another advantage here is that, all things being equal, the compression
scheme could be more efficient. The feature will not affect existing
types. The main disadvantage is that implementing partial
decompression would be very difficult and/or impractical.

Personally, I would say that the 2nd option, CREATE TYPE ... AS
DICTIONARY OF <type>, seems to be more useful. To my knowledge, not
many users would care much about partial decompression, and this is
the only real advantage of the 1st option I see. Also, this is how
ZSON is implemented. It doesn't care about the underlying type and
treats it as a BLOB. Thus the proofs of usefulness I provided above
are not quite valid for the 1st option. Probably unrelated, but 2nd
option would be even easier for me to implement since I already solved
a similar task.

All in all, I suggest focusing on the 2nd option with universal
compression dictionaries. Naturally, the focus will be on JSONB first.
But we will be able to extend this functionality for other types as
well.
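
To anchor the discussion, here is a rough sketch of how the 2nd option could
look from the user's point of view (all of the syntax, the type name, and the
dictionary entry below are hypothetical; nothing has been decided yet):

CREATE TYPE zipped_doc AS DICTIONARY OF jsonb;
ALTER TYPE zipped_doc ADD ENTRY 'frequently_repeated_key';  -- manual update, as discussed above
CREATE TABLE documents (id int, payload zipped_doc);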

Thoughts?

[1]: 
https://www.postgresql.org/message-id/CANP8%2BjLT8r03LJsw%3DdUSFxBh5pRB%2BUCKvS3BUT-dd4JPRDb3tg%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Drop replslot after pgstat_shutdown cause assert coredump

2021-10-11 Thread Greg Nancarrow
On Mon, Oct 11, 2021 at 6:55 PM houzj.f...@fujitsu.com
 wrote:
>
> I can see the walsender tried to release a not-quite-ready replication slot
> that was created when create a subscription. But the pgstat has been shutdown
> before invoking ReplicationSlotRelease().
>
> The stack is as follows:
>
> #2  in ExceptionalCondition (pgstat_is_initialized && !pgstat_is_shutdown)
> #3  in pgstat_assert_is_up () at pgstat.c:4852
> #4  in pgstat_send (msg=msg@entry=0x7ffe716f7470, len=len@entry=144) at 
> pgstat.c:3075
> #5  in pgstat_report_replslot_drop (slotname=slotname@entry=0x7fbcf57a3c98 
> "sub") at pgstat.c:1869
> #6  in ReplicationSlotDropPtr (slot=0x7fbcf57a3c80) at slot.c:696
> #7  in ReplicationSlotDropAcquired () at slot.c:585
> #8  in ReplicationSlotRelease () at slot.c:482
> #9  in ProcKill (code=<optimized out>, arg=<optimized out>) at proc.c:852
> #10 in shmem_exit (code=code@entry=0) at ipc.c:272
> #11 in proc_exit_prepare (code=code@entry=0) at ipc.c:194
> #12 in proc_exit (code=code@entry=0) at ipc.c:107
> #13 in ProcessRepliesIfAny () at walsender.c:1807
> #14 in WalSndWaitForWal (loc=loc@entry=22087632) at walsender.c:1417
> #15 in logical_read_xlog_page (state=0x2f8c600, targetPagePtr=22085632,
> reqLen=<optimized out>, targetRecPtr=<optimized out>, cur_page=0x2f6c1e0 "\016\321\005") at 
> walsender.c:821
> #16 in ReadPageInternal (state=state@entry=0x2f8c600,
> pageptr=pageptr@entry=22085632, reqLen=reqLen@entry=2000) at 
> xlogreader.c:667
> #17 in XLogReadRecord (state=0x2f8c600,
> errormsg=errormsg@entry=0x7ffe716f7f98) at xlogreader.c:337
> #18 in DecodingContextFindStartpoint (ctx=ctx@entry=0x2f8c240)
> at logical.c:606
> #19 in CreateReplicationSlot (cmd=cmd@entry=0x2f1aef0)
>
> Is this behavior expected ?
>

I'd say it's not!

Just looking at the stacktrace, I'm thinking that the following commit
may have had a bearing on this problem, by causing pgstat to be
shut down earlier:

commit fb2c5028e63589c01fbdf8b031be824766dc7a68
Author: Andres Freund 
Date:   Fri Aug 6 10:05:57 2021 -0700

pgstat: Schedule per-backend pgstat shutdown via before_shmem_exit().


Can you see if the problem can be reproduced prior to this commit?


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-10-11 Thread Andrew Dunstan


On 10/10/21 10:07 AM, Peter Eisentraut wrote:
> On 03.10.21 09:03, Michael Paquier wrote:
>> On Sat, Oct 02, 2021 at 11:34:38PM -0400, Tom Lane wrote:
>>> Maybe we could leave test.sh in place for awhile?  I'd rather
>>> not cause a flag day for buildfarm owners.  (Also, how do we
>>> see this working in the back branches?)
>>
>> I would be fine with test.sh staying around for now.
>
> test.sh could be changed to invoke the TAP test.


Keeping test.sh is not necessary - I mis-remembered what the test module
does.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error "initial slot snapshot too large" in create replication slot

2021-10-11 Thread Dilip Kumar
On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
 wrote:
>
> At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar  wrote 
> in
> > While creating an "export snapshot" I don't see any protection why the
> > number of xids in the snapshot can not cross the
> > "GetMaxSnapshotXidCount()"?.
> >
> > Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> > in "SnapBuildInitialSnapshot()", we add all the xids between
> > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> > commit were not recorded).  The problem is that we add both topxids as
> > well as the subxids into the same array and expect that the "xid"
> > count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> > an issue but I am not sure what is the fix for this, some options are
>
> It seems to me that it is a compromise between the restriction of the
> legitimate snapshot and snapshots created by snapbuild.  If the xids
> overflow, the resulting snapshot may lose a significant xid, i.e., a
> top-level xid.
>
> > a) Don't limit the xid count in the exported snapshot and dynamically
> > resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> > GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> > problem that how do we handle the overflowed subtransaction?
>
> I'm afraid that we shouldn't expand the size limits.  If I understand
> it correctly, we only need the top-level xids in the exported snapshot

But since we are using this as an MVCC snapshot, if we don't have the
subxids and also don't mark the "suboverflowed" flag, then IMHO
the sub-transaction visibility might be wrong.  Am I missing something?

> and reorder buffer knows whether a xid is a top-level or not after
> establishing a consistent snapshot.
>
> The attached diff tries to make SnapBuildInitialSnapshot exclude
> subtransactions from generated snapshots.  It seems to work fine for
> your repro. (Of course, I'm not confident that it is the correct thing,
> though..)
>
> What do you think about this?

If your statement that we only need top-xids in the exported snapshot
is true, then this fix looks fine to me.  If not, then we might need to
add the sub-xids to the snapshot->subxip array and, if it crosses the
limit of GetMaxSnapshotSubxidCount(), mark the "suboverflowed"
flag.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Error "initial slot snapshot too large" in create replication slot

2021-10-11 Thread Kyotaro Horiguchi
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar  wrote 
in 
> While creating an "export snapshot" I don't see any protection why the
> number of xids in the snapshot can not cross the
> "GetMaxSnapshotXidCount()"?.
> 
> Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> in "SnapBuildInitialSnapshot()", we add all the xids between
> snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> commit were not recorded).  The problem is that we add both topxids as
> well as the subxids into the same array and expect that the "xid"
> count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> an issue but I am not sure what is the fix for this, some options are

It seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild.  If the xids
overflow, the resulting snapshot may lose a significant xid, i.e., a
top-level xid.

> a) Don't limit the xid count in the exported snapshot and dynamically
> resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> problem that how do we handle the overflowed subtransaction?

I'm afraid that we shouldn't expand the size limits.  If I understand
it correctly, we only need the top-level xids in the exported snapshot
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.

The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransactions from generated snapshots.  It seems to work fine for
your repro. (Of course, I'm not confident that it is the correct thing,
though..)

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, 
TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ * Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+   ReorderBufferTXN *txn;
+
+   txn = ReorderBufferTXNByXid(rb, xid, false,
+   NULL, 
InvalidXLogRecPtr, false);
+
+   /* a known subtxn? */
+   if (txn && rbtxn_is_known_subxact(txn))
+   return true;
+
+   return false;
+}
+
+
 /*
  * ---
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c 
b/src/backend/replication/logical/snapbuild.c
index a549a8..12d283f4de 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
if (test == NULL)
{
-   if (newxcnt >= GetMaxSnapshotXidCount())
-   ereport(ERROR,
-   
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-errmsg("initial slot snapshot 
too large")));
+   bool issubxact =
+   
ReorderBufferXidIsKnownSubXact(builder->reorder, xid);
 
-   newxip[newxcnt++] = xid;
+   if (!issubxact)
+   {   
+   if (newxcnt >= GetMaxSnapshotXidCount())
+   ereport(ERROR,
+   
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+errmsg("initial slot 
snapshot too large")));
+   
+   newxip[newxcnt++] = xid;
+   }
}
 
TransactionIdAdvance(xid);
diff --git a/src/include/replication/reorderbuffer.h 
b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void
ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool   ReorderBufferXidHasCatalogChanges(ReorderBuffer *, 
TransactionId xid);
 bool   ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId 
xid);
 
+bool   ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId 
xid);
 bool   ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, 
TransactionId xid,

 XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,

Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

2021-10-11 Thread Bharath Rupireddy
On Mon, Oct 11, 2021 at 11:54 AM Fujii Masao
 wrote:
>
> How about modifying SharedInvalBackendInit() so that it accepts
> BackendId as an argument and allocates the ProcState entry of
> the specified BackendId? That is, the startup process determines
> that its BackendId is "MaxBackends + MyAuxProcType (=StartupProcess) + 1"
> in AuxiliaryProcessMain(), and then it passes that BackendId to
> SharedInvalBackendInit() in InitRecoveryTransactionEnvironment().

If we do the above, then the problem might arise if somebody calls
SICleanupQueue and wants to signal the startup process, the below code
(from SICleanupQueue) can't get the startup process backend id. So,
the backend id calculation for the startup process can't just be
MaxBackends + MyAuxProcType + 1.
BackendId his_backendId = (needSig - &segP->procState[0]) + 1;

> Maybe you need to enlarge ProcState array so that it also handles
> auxiliary processes if it does not for now.

It looks like we need to increase the size of the ProcState array by 1
at least (for the startup process). Currently the ProcState array
doesn't have entries for auxiliary processes, it does have entries for
MaxBackends. The startup process is eating up one slot from
MaxBackends. Since we need only an extra ProcState array slot for the
startup process I think we could just extend its size by 1. Instead of
modifying the MaxBackends definition, we can just add 1 (and a comment
saying this 1 is for startup process) to  shmInvalBuffer->maxBackends
in SInvalShmemSize, CreateSharedInvalidationState. IMO, this has to go
in a separate patch and probably in a separate thread. Thoughts?

Regards,
Bharath Rupireddy.




Re: Multi-Column List Partitioning

2021-10-11 Thread Rajkumar Raghuwanshi
Hi Nitin,

While testing further I got a crash with partition wise join enabled for
multi-col list partitions. please find test case & stack-trace below.

SET enable_partitionwise_join TO on;
CREATE TABLE plt1 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
(('0001','0001'),('0002','0002'),(NULL,NULL));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
(('0004','0004'),('0005','0005'),('0006','0006'));
INSERT INTO plt1 SELECT to_char(i % 11, 'FM'), to_char(i % 11,
'FM') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3,7,8,9);
INSERT INTO plt1 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i %
11 IN (3);
ANALYSE plt1;
CREATE TABLE plt2 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN
(('0001','0001'),('0002','0002'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
(('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT to_char(i % 11, 'FM'), to_char(i % 11,
'FM') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3);
INSERT INTO plt2 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i %
11 IN (3);
ANALYSE plt2;

EXPLAIN (COSTS OFF)
SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON
(t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d =
t3.d);

postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN
plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c
AND t2.d = t3.d);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q
[edb@localhost bin]$ gdb -q -c data/core.66926 postgres
Reading symbols from
/home/edb/WORK/pg_src/PG_TEMP/postgresql/inst/bin/postgres...done.
[New LWP 66926]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: edb postgres [local] EXPLAIN
 '.
Program terminated with signal 11, Segmentation fault.
#0  0x0082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
1221 if (rel->pathlist == NIL)
(gdb) bt
#0  0x0082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
#1  0x0089341c in is_dummy_partition (rel=0x2f86e88, part_index=2)
at partbounds.c:1959
#2  0x00891d38 in merge_list_bounds (partnatts=2,
partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88,
inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at
partbounds.c:1325
#3  0x00891991 in partition_bounds_merge (partnatts=2,
partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88,
inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at
partbounds.c:1198
#4  0x0082cc5a in compute_partition_bounds (root=0x2f9e910,
rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8,
parts1=0x7ffea91f8cc0,
parts2=0x7ffea91f8cb8) at joinrels.c:1644
#5  0x0082c474 in try_partitionwise_join (root=0x2f9e910,
rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8,
parent_restrictlist=0x2fae650)
at joinrels.c:1402
#6  0x0082b6e2 in populate_joinrel_with_paths (root=0x2f9e910,
rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, sjinfo=0x2f7dfa8,
restrictlist=0x2fae650) at joinrels.c:926
#7  0x0082b135 in make_join_rel (root=0x2f9e910, rel1=0x2f86e88,
rel2=0x2fd4368) at joinrels.c:760
#8  0x0082a643 in make_rels_by_clause_joins (root=0x2f9e910,
old_rel=0x2f86e88, other_rels_list=0x2f90148, other_rels=0x2f90160) at
joinrels.c:312
#9  0x0082a119 in join_search_one_level (root=0x2f9e910, level=3)
at joinrels.c:123
#10 0x0080cd97 in standard_join_search (root=0x2f9e910,
levels_needed=3, initial_rels=0x2f90148) at allpaths.c:3020
#11 0x0080cd10 in make_rel_from_joinlist (root=0x2f9e910,
joinlist=0x2fd7550) at allpaths.c:2951
#12 0x0080899a in make_one_rel (root=0x2f9e910, joinlist=0x2fd7550)
at allpaths.c:228
#13 0x0084516a in query_planner (root=0x2f9e910,
qp_callback=0x84ad85 , qp_extra=0x7ffea91f9140) at
planmain.c:276
#14 0x0084788d in grouping_planner (root=0x2f9e910,
tuple_fraction=0) at planner.c:1447
#15 0x00846f56 in subquery_planner (glob=0x2fa0c08,
parse=0x2f56d30, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
planner.c:1025
#16 0x0084578b in standard_planner (parse=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
cursorOptions=2048, boundParams=0x0) at planner.c:406
#17 0x00845536 in planner 

Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

2021-10-11 Thread Bharath Rupireddy
On Mon, Oct 11, 2021 at 12:41 PM Kyotaro Horiguchi
 wrote:
>
> > > These are the following ways I think we can fix it, if at all some
> > > other hacker agrees that it is actually an issue:
> > > 1) Fix the startup process code, probably by unregistering the
> > > procsignal array entry that was made with ProcSignalInit(MaxBackends +
> > > MyAuxProcType + 1) in AuxiliaryProcessMain() and register with
> > > ProcSignalInit(MyBackendId) immediately after SharedInvalBackendInit()
> > > calculates the MyBackendId in with SharedInvalBackendInit() in
> > > InitRecoveryTransactionEnvironment(). This seems risky to me as
> > > unregistering and registering ProcSignal array involves some barriers
> > > and during the registering and unregistering window, the startup
> > > process may miss the SIGUSR1.
> > > 2) Ensure that the process, that wants to send the startup process
> > > SIGUSR1 signal, doesn't use the backendId from the startup process
> > > PGPROC, in which case it has to loop over all the entries of
> > > ProcSignal->psh_slot[] array to find the entry with the startup
> > > process PID. It seems easier and less riskier but only caveat is that
> > > the sending process shouldn't look at the backendId from auxiliary
> > > process PGPROC, instead it should just traverse the entire proc signal
> > > array to find the right slot.
> > > 3) Add a comment around AuxiliaryPidGetProc() that says "auxiliary
> > > processes don't have valid backend ids, so don't use the backendId
> > > from the returned PGPROC".
> > > (2) and (3) seem reasonable to me. Thoughts?
>
> (I'm not sure how the trouble happens.)

The order in which SharedInvalBackendInit and ProcSignalInit should be
used to keep procState and ProcSignal array entries for backend and
auxiliary processes is as follows:
call SharedInvalBackendInit() and let it calculate the MyBackendId
call ProcSignalInit(MyBackendId);

But for the startup process it is done the other way around, so the procState
and ProcSignal array entries are not in sync.
call ProcSignalInit(MaxBackends + MyAuxProcType + 1);
call SharedInvalBackendInit() and let it calculate the MyBackendId

 If some process wants to send the startup process SIGUSR1 with the
PGPROC->backendId and SendProcSignal(PGPROC->pid, X,
PGPROC->backendId) after getting the PGPROC entry from
AuxiliaryPidGetProc(), then the signal isn't sent. To understand this
issue, please use a sample patch at [1], have a standby setup, call
pg_log_backend_memory_contexts with startup process pid on the
standby, the error "could not send signal to process" is shown, see
[2].

[1]
diff --git a/src/backend/utils/adt/mcxtfuncs.c
b/src/backend/utils/adt/mcxtfuncs.c
index 0d52613bc3..2739591edc 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -185,6 +185,10 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)

proc = BackendPidGetProc(pid);

+   /* see if the given process is an auxiliary process. */
+   if (proc == NULL)
+   proc = AuxiliaryPidGetProc(pid);
+
/*
 * BackendPidGetProc returns NULL if the pid isn't valid; but
by the time
 * we reach kill(), a process for which we get a valid proc here might

[2]
postgres=# select pg_log_backend_memory_contexts(726901);
WARNING:  could not send signal to process 726901: No such process
 pg_log_backend_memory_contexts

 f
(1 row)

> > How about modifying SharedInvalBackendInit() so that it accepts
> > BackendId as an argument and allocates the ProcState entry of
> > the specified BackendId? That is, the startup process determines
> > that its BackendId is "MaxBackends + MyAuxProcType (=StartupProcess) +
> > 1"
> > in AuxiliaryProcessMain(), and then it passes that BackendId to
> > SharedInvalBackendInit() in InitRecoveryTransactionEnvironment().
> >
> > Maybe you need to enlarge ProcState array so that it also handles
> > auxiliary processes if it does not for now.
>
> It seems to me that the backendId on startup process is used only for
> vxid generation. Actually "make check-world" doesn't fail by skipping
> SharedInvalBackendinit() (and disabling an assertion).
>
> I thought that we could decouple vxid from backend ID (or sinval code)
> by using pgprocno for vxid generation instead of backend ID. "make
> check-world" doesn't fail with that change, too. (I don't think
> "doesn't fail" ncecessarily mean that that change is correct, though),
> but vxid gets somewhat odd after the change..
>
> =# select distinct virtualxid from pg_locks;
>  virtualxid
> 
>
>  116/1   # startup
>  99/48   # backend 1

I'm not sure we go that path. Others may have better thoughts.

Regards,
Bharath Rupireddy.




Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-10-11 Thread Markus Winand
Hi!

It seems like this patch causes another problem.

If I explain a simple row generator **without** verbose, it fails:

postgres=# EXPLAIN (VERBOSE FALSE)
   WITH RECURSIVE gen (n)  AS (
  VALUES (1)
   UNION ALL
  SELECT n+1
FROM gen
   WHERE n < 3
   )
   SELECT * FROM gen
   ;
ERROR:  could not find RecursiveUnion for WorkTableScan with wtParam 0

That’s the new error message introduced by the patch.

The same with verbose works just fine:

postgres=# EXPLAIN (VERBOSE TRUE)
   WITH RECURSIVE gen (n)  AS (
  VALUES (1)
   UNION ALL
  SELECT n+1
FROM gen
   WHERE n < 3
   )
   SELECT * FROM gen
   ;
 QUERY PLAN
-
 CTE Scan on gen  (cost=2.95..3.57 rows=31 width=4)
   Output: gen.n
   CTE gen
 ->  Recursive Union  (cost=0.00..2.95 rows=31 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
 Output: 1
   ->  WorkTable Scan on gen gen_1  (cost=0.00..0.23 rows=3 width=4)
 Output: (gen_1.n + 1)
 Filter: (gen_1.n < 3)
(9 rows)

Both variants work fine before that patch 
(4ac0f450b698442c3273ddfe8eed0e1a7e56645f).

Markus Winand
winand.at

> On 21.09.2021, at 14:43, torikoshia  wrote:
> 
> On 2021-09-16 08:40, Tom Lane wrote:
>> I wrote:
>>> I do not think that patch is a proper solution, but we do need to do
>>> something about this.
>> I poked into this and decided it's an ancient omission within ruleutils.c.
>> The reason we've not seen it before is probably that you can't get to the
>> case through the parser.  The SEARCH stuff is generating a query structure
>> basically equivalent to
>> regression=# with recursive cte (x,r) as (
>> select 42 as x, row(i, 2.3) as r from generate_series(1,3) i
>> union all
>> select x, row((c.r).f1, 4.5) from cte c
>> )
>> select * from cte;
>> ERROR:  record type has not been registered
>> and as you can see, expandRecordVariable fails to figure out what
>> the referent of "c.r" is.  I think that could be fixed (by looking
>> into the non-recursive term), but given the lack of field demand,
>> I'm not feeling that it's urgent.
>> So the omission is pretty obvious from the misleading comment:
>> actually, Vars referencing RTE_CTE RTEs can also appear in WorkTableScan
>> nodes, and we're not doing anything to support that.  But we only reach
>> this code when trying to resolve a field of a Var of RECORD type, which
>> is a case that it seems like the parser can't produce.
>> It doesn't look too hard to fix: we just have to find the RecursiveUnion
>> that goes with the WorkTableScan, and drill down into that, much as we
>> would do in the CteScan case.  See attached draft patch.  I'm too tired
>> to beat on this heavily or add a test case, but I have verified that it
>> passes check-world and handles the example presented in this thread.
>>  regards, tom lane
> 
> Thanks for looking into this and fixing it!
> 
> -- 
> Regards,
> 
> --
> Atsushi Torikoshi
> NTT DATA CORPORATION
> 
> 
> 
> 





  1   2   >