Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao




On 2021/01/26 16:39, Fujii Masao wrote:



On 2021/01/26 16:33, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:

On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

   -- Return false as connections are still in use, warnings are issued.
   SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in 
use
   WARNING:  cannot close connection for server "loopback2" because it is still 
in use
+WARNING:  cannot close connection for server "loopback" because it is still in 
use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.


Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".


Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.


+WARNING:  roles created by regression test cases should have names starting with 
"regress_"
 CREATE ROLE multi_conn_user2 SUPERUSER;
+WARNING:  roles created by regression test cases should have names starting with 
"regress_"

Hmm... another failure happened.
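
(The warning is about the test-role naming convention, so presumably the fix is simply to rename the roles used by the new test with the mandated prefix, for example:

  CREATE ROLE regress_multi_conn_user1 SUPERUSER;
  CREATE ROLE regress_multi_conn_user2 SUPERUSER;

and to reference the regress_-prefixed names in the rest of the test. The exact role names above are only illustrative.)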

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao




On 2021/01/26 16:33, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:

On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

   -- Return false as connections are still in use, warnings are issued.
   SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in 
use
   WARNING:  cannot close connection for server "loopback2" because it is still 
in use
+WARNING:  cannot close connection for server "loopback" because it is still in 
use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.


Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".


Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Bharath Rupireddy
On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:
> On 2021/01/26 16:05, Tom Lane wrote:
> > Fujii Masao  writes:
> >> Thanks for the review! I fixed them and pushed the patch!
> >
> > Buildfarm is very not happy ...
>
> Yes I'm investigating that.
>
>   -- Return false as connections are still in use, warnings are issued.
>   SELECT postgres_fdw_disconnect_all();
> -WARNING:  cannot close dropped server connection because it is still in use
> -WARNING:  cannot close connection for server "loopback" because it is still 
> in use
>   WARNING:  cannot close connection for server "loopback2" because it is 
> still in use
> +WARNING:  cannot close connection for server "loopback" because it is still 
> in use
> +WARNING:  cannot close dropped server connection because it is still in use
>
> The cause of the regression test failure is that the order of warning messages
> is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
> when doing the above test.

Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao




On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

 -- Return false as connections are still in use, warnings are issued.
 SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in 
use
 WARNING:  cannot close connection for server "loopback2" because it is still 
in use
+WARNING:  cannot close connection for server "loopback" because it is still in 
use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.
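
(A minimal sketch of that idea, assuming it goes into the postgres_fdw regression script:

  -- suppress WARNINGs whose ordering is not stable across runs
  SET client_min_messages TO ERROR;
  SELECT postgres_fdw_disconnect_all();
  RESET client_min_messages;

so that only the function's return value is compared, not the warning lines.)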

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add SQL function for SHA1

2021-01-25 Thread Michael Paquier
On Mon, Jan 25, 2021 at 11:27:28PM -0500, Bruce Momjian wrote:
> On Mon, Jan 25, 2021 at 10:23:30PM -0600, Kenneth Marshall wrote:
>> +1 I know that it has been deprecated, but it can be very useful when
>> working with data from pre-deprecation. :) It is annoying to have to
>> resort to plperl or plpython because it is not available. The lack of
>> orthogonality is painful.

plperl and plpython can be annoying to require if you have strong
security requirements as these are untrusted languages, but I don't
completely agree with this argument because pgcrypto gives the option
to use SHA1 with digest(), and this one is fine to have even in
environments under STIG or equally-constrained environments.
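
(For comparison, the pgcrypto route looks roughly like this, assuming the extension is installed:

  CREATE EXTENSION IF NOT EXISTS pgcrypto;
  SELECT digest('abc', 'sha1');  -- 20-byte SHA-1 digest as bytea

whereas the patch proposes an in-core sha1() function with no extension required.)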

> Yes, I think having SHA1 makes sense --- there are probably still valid
> uses for it.

Consistency with the existing in-core SQL functions for cryptohashes
and the possibility to not need pgcrypto are my only arguments at
hand.

;)
--
Michael




Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2021-01-25 Thread Kyotaro Horiguchi
At Sat, 26 Dec 2020 01:56:02 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Michael Paquier 
> > On Wed, Dec 09, 2020 at 09:52:17AM -0300, Alvaro Herrera wrote:
> > > Well, that definition seems unfriendly to me.  I prefer the stance
> > > that if you change the value for the parent, then future partitions
> > > inherit that value.
> > 
> > That's indeed more interesting from the user perspective.  So +1 from me.
> 
> As I mentioned as below, some properties apply to that, and some don't.
> 
> --
> That would be right when the storage property is an optional
> specification such as fillfactor.  For example, when I run ALTER
> TABLE mytable SET (fillfactor = 70) and then CREATE TABLE mytable_p1
> PARTITION OF mytable, I find it nice that the fillfactor os
> mytable_p1 is also 70 (but I won't complain if it isn't, since I can
> run ALTER TABLE SET on the parent table again.)
> 
> OTOH, CREATE TABLE and CREATE UNLOGGED TABLE is an explicit request
> to create a logged and unlogged relation respectively.  I feel it a
> strange? if CREATE TABLE mytable_p1 PARTITION OF mytable creates an
> unlogged partition.
> --

"CREATE TABLE" is not "CREATE LOGGED TABLE". We can assume that as
"CREATE  TABLE", where the default logged-ness
varies according to context. Or it might have been so since the
beginning. Currently we don't have the syntax "CREATE LOGGED TABLE",
but we can add that syntax.
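
(As a concrete instance of the question, with table names that are just examples and syntax as of the time of this thread:

  CREATE UNLOGGED TABLE mytable (a int) PARTITION BY RANGE (a);
  CREATE TABLE mytable_p1 PARTITION OF mytable FOR VALUES FROM (0) TO (100);
  SELECT relname, relpersistence FROM pg_class WHERE relname LIKE 'mytable%';

Should mytable_p1 inherit the parent's unlogged-ness here, or only when an explicit CREATE UNLOGGED TABLE ... PARTITION OF is written?)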

> Anyway, I think I'll group ALTER TABLE/INDEX altering actions based
> on some common factors and suggest what would be a desirable
> behavior, asking for opinions.  I'd like to explore the consensus on
> the basic policy for fixes.  Then, I hope we will be able to work on
> fixes for each ALTER action in patches that can be released
> separately.  I'd like to resist requiring all fixes to be arranged
> at once, since that may become a high bar for those who volunteer to
> fix some of the actions.  (Even a committer, Alvaro-san, struggled to
> fix one action, ALTER TABLE REPLICA IDENTITY.)

I'd vote +1 to:

 ALTER TABLE/INDEX on a parent recurses to all descendants. ALTER
 TABLE/INDEX ONLY doesn't, and the change takes effect on future
 children.

 We pursue releasing all fixes at once, but we might release all fixes
 other than some items that cannot be fixed for technical reasons
 at the time, like REPLICA IDENTITY.

I'm not sure how long we will wait for the time of release, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Tom Lane
Fujii Masao  writes:
> Thanks for the review! I fixed them and pushed the patch!

Buildfarm is very not happy ...

regards, tom lane




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-25 Thread Michael Paquier
On Mon, Jan 25, 2021 at 11:11:38PM +0300, Alexey Kondratov wrote:
> I updated comment with CCI info, did pgindent run and renamed new function
> to SetRelationTableSpace(). New patch is attached.
>
> [...]
>
> Yeah, all these checks we complicated from the beginning. I will try to find
> a better place tomorrow or put more info into the comments at least.

I was reviewing that, and I think that we can do a better
consolidation on several points that will also help the features
discussed on this thread for VACUUM, CLUSTER and REINDEX.

If you look closely, ATExecSetTableSpace() uses the same logic as the
code modified here to check if a relation can be moved to a new
tablespace, with extra checks for mapped relations,
GLOBALTABLESPACE_OID or if attempting to manipulate a temp relation
from another session.  There are two differences though:
- Custom actions are taken between the phase where we check if a
relation can be moved to a new tablespace, and the update of
pg_class.
- ATExecSetTableSpace() needs to be able to set a given relation
relfilenode on top of reltablespace, the newly-created one.

So I think that the heart of the problem is made of two things here:
- We should have one common routine for the existing code paths and
the new code paths able to check if a tablespace move can be done or
not.  The case of a cluster, reindex or vacuum on a list of relations
extracted from pg_class would still require a different handling
as incorrect relations have to be skipped, but the case of individual
relations can reuse the refactoring pieces done here
(see CheckRelationTableSpaceMove() in the attached).
- We need to have a second routine able to update reltablespace and
optionally relfilenode for a given relation's pg_class entry, once the
caller has made sure that CheckRelationTableSpaceMove() validates a
tablespace move.

Please note that there was a bug in your previous patch 0002: shared
dependencies need to be registered if reltablespace is updated, of
course, but only if the relation has no physical storage.  So
changeDependencyOnTablespace() requires a check based on
RELKIND_HAS_STORAGE(), or REINDEX would have registered shared
dependencies even for relations with storage, something we don't
want per the recent work done by Alvaro in ebfe2db.
--
Michael
From 265631fb5966ae7b454287c489684c8275b7b001 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 26 Jan 2021 15:53:06 +0900
Subject: [PATCH v6] Refactor code to detect and process tablespace moves

---
 src/include/commands/tablecmds.h |   4 +
 src/backend/commands/tablecmds.c | 222 ++-
 2 files changed, 131 insertions(+), 95 deletions(-)

diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index 08c463d3c4..b3d30acc35 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -61,6 +61,10 @@ extern void ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_
 
 extern void SetRelationHasSubclass(Oid relationId, bool relhassubclass);
 
+extern bool CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId);
+extern void SetRelationTableSpace(Relation rel, Oid newTableSpaceId,
+  Oid newRelFileNode);
+
 extern ObjectAddress renameatt(RenameStmt *stmt);
 
 extern ObjectAddress RenameConstraint(RenameStmt *stmt);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8687e9a97c..1f88eebabd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3037,6 +3037,120 @@ SetRelationHasSubclass(Oid relationId, bool relhassubclass)
 	table_close(relationRelation, RowExclusiveLock);
 }
 
+/*
+ * CheckRelationTableSpaceMove
+ *		Check if relation can be moved to new tablespace.
+ *
+ * NOTE: caller must be holding an appropriate lock on the relation.
+ * ShareUpdateExclusiveLock is sufficient to prevent concurrent schema
+ * changes.
+ *
+ * Returns true if the relation can be moved to the new tablespace;
+ * false otherwise.
+ */
+bool
+CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId)
+{
+	Oid			oldTableSpaceId;
+	Oid			reloid = RelationGetRelid(rel);
+
+	/*
+	 * No work if no change in tablespace.  Note that MyDatabaseTableSpace
+	 * is stored as 0.
+	 */
+	oldTableSpaceId = rel->rd_rel->reltablespace;
+	if (newTableSpaceId == oldTableSpaceId ||
+		(newTableSpaceId == MyDatabaseTableSpace && oldTableSpaceId == 0))
+	{
+		InvokeObjectPostAlterHook(RelationRelationId, reloid, 0);
+		return false;
+	}
+
+	/*
+	 * We cannot support moving mapped relations into different tablespaces.
+	 * (In particular this eliminates all shared catalogs.)
+	 */
+	if (RelationIsMapped(rel))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot move system relation \"%s\"",
+		RelationGetRelationName(rel;
+
+	/* Cannot move a non-shared relation into pg_global */
+	if (newTableSpaceId == GLOBALTABLESPACE_OID)
+		ereport(ERROR,
+

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiro Ikeda

Hi, David.

Thanks for your comments.

On 2021-01-26 08:48, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada
 wrote:


On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
 wrote:


Hi, thanks for the reviews.

I updated the attached patch.


Thank you for updating the patch!


Your original email with "total number of times" is more correct,
removing the "of times" and just writing "total number of WAL" is not
good wording.

Specifically, this change is strictly worse than the original.

-   Number of times WAL data was written to disk because WAL
buffers became full
+   Total number of WAL data written to disk because WAL buffers
became full

Both have the flaw that they leave implied exactly what it means to
"write WAL to disk".  It is also unclear whether a counter, bytes, or
both, would be more useful here. I've incorporated this into my
documentation suggestions below:
(wal_buffers_full)

-- Revert - the original was better, though maybe add more detail
similar to the below.  I didn't research exactly how this works.


OK, I understood.
I reverted it since this is a counter statistic.



(wal_write)
The number of times WAL buffers were written out to disk via XLogWrite



Thanks.

I thought it better to omit "The" and "XLogWrite" because other views'
descriptions omit "The", and there is no description of "XLogWrite" in the
documents. What do you think?



-- Seems like this should have a bytes version too


Do you mean that we need a separate statistic for WAL write bytes?



(wal_write_time)
The amount of time spent writing WAL buffers to disk, excluding sync
time unless the wal_sync_method is either open_datasync or open_sync.
Units are in milliseconds with microsecond resolution.  This is zero
when track_wal_io_timing is disabled.


Thanks, I'll fix it.



(wal_sync)
The number of times WAL files were synced to disk while
wal_sync_method was set to one of the "sync at commit" options (i.e.,
fdatasync, fsync, or fsync_writethrough).


Thanks, I'll fix it.



-- it is not going to be zero just because those settings are
presently disabled as they could have been enabled at some point since
the last time these statistics were reset.


Right, your description is correct.
The "track_wal_io_timing" has the same limitation, doesn't it?



(wal_sync_time)
The amount of time spent syncing WAL files to disk, in milliseconds
with microsecond resolution.  This requires setting wal_sync_method to
one of the "sync at commit" options (i.e., fdatasync, fsync, or
fsync_writethrough).


Thanks, I'll fix it.
I will add the comments related to "track_wal_io_timing".



Also,

I would suggest extracting the changes to postmaster/pgstat.c and
replication/walreceiver.c to a separate patch as you've fundamentally
changed how it behaves with regards to that function and how it
interacts with the WAL receiver.  That seems an entirely separate
topic warranting its own patch and discussion.


OK, I will separate two patches.


On 2021-01-26 08:52, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda
 wrote:


I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.


The following is not an improvement:

- /* Send WAL statistics to the stats collector. */
+ /* Send WAL statistics to stats collector */

The word "the" there makes it proper English.  Your copy-pasting
should have kept the existing good wording in the other locations
rather than replace the existing location with the newly added
incorrect wording.


Thanks, I'll fix it.



This doesn't make sense:

* current WAL segment file to avoid loading stats collector.

Maybe "overloading" or "overwhelming"?

I see you removed the pgstat_send_wal(force) change.  The rest of my
comments on the v6 patch still stand I believe.


Yes, "overloading" is right. Thanks.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Tid scan improvements

2021-01-25 Thread David Rowley
Thanks for having a look at this.

On Tue, 26 Jan 2021 at 15:48, Zhihong Yu  wrote:
> bq. within this range.  Table AMs where scanning ranges of TIDs does not make
> sense or is difficult to implement efficiently may choose to not implement
>
> Is there a criterion on how to judge efficiency?

For example, if the AM had no way to index such a column and the
method needed to scan the entire table to find TIDs in that range. The
planner may as well just pick a SeqScan. If that's the case, then the
table AM may as well not bother implementing that function.
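
(For context, the kind of query the patch targets is a range qual on ctid, for example -- the table name here is just an example:

  SELECT * FROM mytable WHERE ctid >= '(0,1)' AND ctid < '(1000,1)';

Without the patch this is planned as a SeqScan with a filter; with it, heap tables can use a TID Range Scan, while other table AMs may opt out as discussed above.)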

> +   if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
> ...
> +   if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)
>
> The if statement for upper bound should be prefixed with 'else', right ?

Yeah, thanks.

> + * TidRecheck -- access method routine to recheck a tuple in EvalPlanQual
> ...
> +TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
>
> The method name in the comment doesn't match the real method name.

Well noticed.

> + * ExecTidRangeScan(node)
> ...
> +ExecTidRangeScan(PlanState *pstate)
>
> Parameter names don't match.

hmm. Looking around, it seems there are lots of other places that do
this.  I think the comment is really just indicating that the
function is taking an executor node state as a parameter.

Have a look at: git grep -E "\s\*.*\(node\)$"   that shows the other
places. Some of these have the parameter named "node", and many others
have some other name.

I've made the two changes locally. Since the two issues were mostly
cosmetic, I'll hold off posting an updated patch until some bigger changes
are required.

Thanks again for looking at this.

David




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-25 Thread Amit Kapila
On Sat, Jan 23, 2021 at 5:27 AM Peter Geoghegan  wrote:
>
> On Thu, Jan 21, 2021 at 9:23 PM Amit Kapila  wrote:
> > > Slowing down non-HOT updaters in these extreme cases may actually be a
> > > good thing, even when bottom-up deletion finally becomes ineffective.
> > > It can be thought of as backpressure. I am not worried about slowing
> > > down something that is already hopelessly inefficient and
> > > unsustainable. I'd go even further than that, in fact -- I now wonder
> > > if we should *deliberately* slow them down some more!
> > >
> >
> > Do you have something specific in mind for this?
>
> Maybe something a bit like the VACUUM cost delay stuff could be
> applied at the point that we realize that a given bottom-up deletion
> pass is entirely effective purely due to a long running transaction,
> that gets applied by nbtree caller once it splits the page.
>
> This isn't something that I plan to work on anytime soon. My point was
> mostly that it really would make sense to deliberately throttle
> non-hot updates at the point that they trigger page splits that are
> believed to be more or less caused by a long running transaction.
> They're so incredibly harmful to the general responsiveness of the
> system that having a last line of defense like that
> (backpressure/throttling) really does make sense.
>
> > I have briefly tried that but numbers were not consistent probably
> > because at that time autovacuum was also 'on'. So, I tried switching
> > off autovacuum and dropping/recreating the tables.
>
> It's not at all surprising that they weren't consistent. Clearly
> bottom-up deletion wastes cycles on the first execution (it is wasted
> effort in at least one sense) -- you showed that already. Subsequent
> executions will actually manage to delete some tuples (probably a
> great many tuples), and so will have totally different performance
> profiles/characteristics. Isn't that obvious?
>

Yeah, that sounds obvious but what I remembered happening was that at
some point during/before the second update, the autovacuum kicks in
and removes the bloat incurred by the previous update. In a few cases,
autovacuum seems to clean up the bloat and yet we still seem to be
taking additional time, maybe because of some non-helpful cycles spent by
bottom-up clean-up in the new pass (like the second bulk update, for which
we can't clean up anything). Now, this is more speculation based on
a few runs, so I don't expect any response or action based on it.
I need to spend more time on benchmarking to study the behavior and I
think without that it would be difficult to make a conclusion in this
regard. So, let's not consider any action on this front till I spend
more time to find the details.

I agree with the other points mentioned by you in the email.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao




On 2021/01/26 12:08, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
 wrote:

Attaching v17 patch set, please review it further.


Thanks for updating the patch!

Attached is the tweaked version of the patch. I didn't change any logic,
but I updated some comments and docs. Also I added the regression test
to check that postgres_fdw_disconnect() closes multiple connections.
Barring any objection, I will commit this version.


Thanks. The patch LGTM, except for a few typos:
1) in the commit message "a warning messsage is emitted." it's
"message" not "messsage".
2) in the documentation "+   a user mapping, the correspoinding
connections are closed." it's "corresponding" not "correspoinding".


Thanks for the review! I fixed them and pushed the patch!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add SQL function for SHA1

2021-01-25 Thread David Fetter
On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote:
> Hi all,
> 
> SHA-1 is now an option available for cryptohashes, and like the
> existing set of functions of SHA-2, I don't really see a reason why we
> should not have a SQL function for SHA1.  Attached is a patch doing
> that.

Thanks for doing this!

While there are applications SHA1 is no longer good for, there are
plenty where it's still in play. One I use frequently is git. While
there are plans for creating an upgrade path to more cryptographically
secure hashes, it will take some years before repositories have
converted over.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: logical replication worker accesses catalogs in error context callback

2021-01-25 Thread Masahiko Sawada
On Tue, Jan 12, 2021 at 6:33 PM Bharath Rupireddy
 wrote:
>
> On Tue, Jan 12, 2021 at 9:40 AM Masahiko Sawada  wrote:
> >
> > On Mon, Jan 11, 2021 at 4:46 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada  
> > > wrote:
> > > > Agreed. Attached the updated patch.
> > >
> > > Thanks for the updated patch. Looks like the comment crosses the 80
> > > char limit, I have corrected it. And also changed the variable name
> > > from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname
> > > will not cross the 80 char limit. And also added a commit message.
> > > Attaching v3 patch, please have a look. Both make check and make
> > > check-world passes.
> >
> > Thanks! The change looks good to me.
>
> Thanks!
>
> > > > > I quickly searched in places where error callbacks are being used, I
> > > > > think we need a similar kind of fix for conversion_error_callback in
> > > > > postgres_fdw.c, because get_attname and get_rel_name are being used
> > > > > which do catalogue lookups. ISTM that all the other error callbacks
> > > > > are good in the sense that they are not doing sys catalogue lookups.
> > > >
> > > > Indeed. If we need to disallow the catalog lookup during executing
> > > > error callbacks we might want to have an assertion checking that in
> > > > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()).
> > >
> > > I tried to add(as attached in
> > > v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the
> > > Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself
> > > fails [1]. That means, we are doing a bunch of catalogue access from
> > > error context callbacks. Given this, I'm not quite sure whether we can
> > > have such an assertion in SearchCatCacheInternal.
> >
> > I think checking !error_context_stack is not a correct check if we're
> > executing an error context callback since it's a stack to store
> > callbacks. It can be non-NULL by setting an error callback, see
> > setup_parser_errposition_callback() for instance.
>
> Thanks! Yes, you are right, even though we are not processing the
> actual error callback, the error_context_stack can be non-null, hence
> the Assert(!error_context_stack); doesn't make any sense.
>
> > Probably we need to check if (recursion_depth > 0) and elevel.
> > Attached a patch for that as an example.
>
> IIUC, we must know in SearchCatCacheInternal whether errstart
> was called with elevel >= ERROR and whether we have recursion_depth > 0. If
> both are true, then the assertion in SearchCatCacheInternal should
> fail. I see that in your patch elevel is being fetched from
> errordata, and that's fine. What I'm not quite clear on is the following
> assumption:
>
> +/* If we doesn't set any error data yet, assume it's an error */
> +if (errordata_stack_depth == -1)
> +return true;
>
> Is it always true that we are in error processing when
> errordata_stack_depth is -1? What happens if errordata_stack_depth <
> -1? Maybe I'm missing something.

You're right. I missed something.

>
> IMHO, adding an assertion in SearchCatCacheInternal (which is one of the
> most generic code paths within the server) based on a few error-context
> global variables may not always be safe, because we might not use the
> error context variables properly. Instead, we could add a comment in the
> ErrorContextCallback structure saying something like, "it's not
> recommended to access any system catalogues within an error context
> callback when the callback is expected to be called while processing
> an error, because the transaction might have been broken in that
> case." And let future callback developers take care of it.
>
> Thoughts?

That sounds good to me. But I still see value in adding an
assertion in SearchCatCacheInternal(). If we had it, we could have found
these two bugs earlier.

Anyway, this seems to be unrelated to this bug fixing so we can start
a new thread for that.

>
> As I said earlier [1], currently only two(there could be more) error
> context callbacks access the sys catalogues, one is in
> slot_store_error_callback which will be fixed with your patch. Another
> is in conversion_error_callback, we can also fix this by storing the
> relname, attname and other required info in ConversionLocation,
> something like the attached. Please have a look.
>
> Thoughts?

Thank you for the patch!

Here are some comments:

+static void set_error_callback_info(ConversionLocation *errpos,
+  Relation rel, int cur_attno,
+  ForeignScanState *fsstate);

I'm a bit concerned that the function name is too generic. How about
set_conversion_error_callback_info() or something to make the name
clearer?

---
+static void
+conversion_error_callback(void *arg)
+{
+   ConversionLocation *errpos = (ConversionLocation *) arg;
+
+   if (errpos->show_generic_message)
+   errcontext("processing expression at position %d 

Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-01-25 Thread japin


Hi, Bharath

Thanks for your reviewing.

On Tue, 26 Jan 2021 at 12:55, Bharath Rupireddy 
 wrote:
> On Tue, Jan 26, 2021 at 9:25 AM japin  wrote:
>> > I think it's more convenient. Any thoughts?
>>
>> Sorry, I forgot to attach the patch.
>
> As I mentioned earlier in [1], +1 from my end to have the new syntax
> for adding/dropping publications from subscriptions i.e. ALTER
> SUBSCRIPTION ... ADD/DROP PUBLICATION. But I'm really not sure why
> that syntax was not added earlier. Are we missing something here?
>

Yeah, we should figure out why this syntax was not supported earlier.  It seems
ALTER SUBSCRIPTION was introduced in 665d1fad99e; however, the commit does not
contain any discussion URL.

> I would like to hear opinions from other hackers.
>
> [1] - 
> https://www.postgresql.org/message-id/CALj2ACVGDNZDQk3wfv%3D3zYTg%3DqKUaEa5s1f%2B9_PYxN0QyAUdCw%40mail.gmail.com
>
> Some quick comments on the patch, although I have not taken a deeper look at 
> it:
>
> 1. IMO, it will be good if the patch is divided into say coding, test
> cases and documentation

Agreed.  I will refactor it in the next patch.

> 2. Looks like AlterSubscription() already has ~200 LOC, so why
> can't we have separate functions for each ALTER_SUBSCRIPTION_ case,
> or at least for the new code that's getting added for this patch?

I'm not sure it is necessary to provide a separate function for each
ALTER_SUBSCRIPTION_XXX case, so I just followed the current style.

> 3. The new code added for ALTER_SUBSCRIPTION_ADD_PUBLICATION and
> ALTER_SUBSCRIPTION_DROP_PUBLICATION looks almost the same with only
> little difference, so why can't we have a single function
> (alter_subscription_add_or_drop_publication or
> handle_add_or_drop_publication or some better name?) and pass in a
> flag to differentiate the code that differs between the two commands.

Agreed.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




RE: Parallel INSERT (INTO ... SELECT ...)

2021-01-25 Thread tsunakawa.ta...@fujitsu.com
From: Hou, Zhijie 
> IMO, max_parallel_hazard() only checks the parent table's default expressions.
> But if the table has partitions and a partition has its own default
> expressions, max_parallel_hazard() does not seem to check them,
> and we do not seem to check them anywhere else either.
> 
> I am not sure whether we should allow parallel insert for this case?

I think we can allow parallel insert in this case, because the column value is 
determined according to the DEFAULT definition of the target table specified in 
the INSERT statement.  This is described here:

https://www.postgresql.org/docs/devel/sql-createtable.html

"Defaults may be specified separately for each partition. But note that a 
partition's default value is not applied when inserting a tuple through a 
partitioned table."

So the parallel-unsafe function should not be called.
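
(A quick way to see the documented behavior; the table and column names below are just examples:

  CREATE TABLE parent (a int, b text DEFAULT 'parent default') PARTITION BY RANGE (a);
  CREATE TABLE part1 PARTITION OF parent FOR VALUES FROM (0) TO (100);
  ALTER TABLE part1 ALTER COLUMN b SET DEFAULT 'partition default';
  INSERT INTO parent (a) VALUES (1);  -- routed to part1, uses the parent's default for b
  INSERT INTO part1 (a) VALUES (2);   -- direct insert, uses the partition's default
  SELECT * FROM parent ORDER BY a;
)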


Regards
Takayuki Tsunakawa



Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-01-25 Thread Bharath Rupireddy
On Tue, Jan 26, 2021 at 9:25 AM japin  wrote:
> > I think it's more convenient. Any thoughts?
>
> Sorry, I forgot to attach the patch.

As I mentioned earlier in [1], +1 from my end to have the new syntax
for adding/dropping publications from subscriptions i.e. ALTER
SUBSCRIPTION ... ADD/DROP PUBLICATION. But I'm really not sure why
that syntax was not added earlier. Are we missing something here?

I would like to hear opinions from other hackers.

[1] - 
https://www.postgresql.org/message-id/CALj2ACVGDNZDQk3wfv%3D3zYTg%3DqKUaEa5s1f%2B9_PYxN0QyAUdCw%40mail.gmail.com

Some quick comments on the patch, although I have not taken a deeper look at it:

1. IMO, it will be good if the patch is divided into say coding, test
cases and documentation
2. Looks like AlterSubscription() already has ~200 LOC, so why
can't we have separate functions for each ALTER_SUBSCRIPTION_ case,
or at least for the new code that's getting added for this patch?
3. The new code added for ALTER_SUBSCRIPTION_ADD_PUBLICATION and
ALTER_SUBSCRIPTION_DROP_PUBLICATION looks almost the same with only
little difference, so why can't we have a single function
(alter_subscription_add_or_drop_publication or
handle_add_or_drop_publication or some better name?) and pass in a
flag to differentiate the code that differs between the two commands.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: shared tempfile was not removed on statement_timeout

2021-01-25 Thread Kyotaro Horiguchi
At Wed, 28 Oct 2020 19:03:14 +0900, Masahiko Sawada 
 wrote in 
> On Wed, 29 Jul 2020 at 10:37, Justin Pryzby  wrote:
> >
> > On Mon, Jul 27, 2020 at 05:39:02AM -0500, Justin Pryzby wrote:
> > > On Mon, Jul 27, 2020 at 08:00:46PM +1200, Thomas Munro wrote:
> > > > Why can't tuplesort_end do it?
> > >
> > > Because then I think the parallel workers remove their own files, with 
> > > tests
> > > failing like:
> > >
> > > +ERROR:  could not open temporary file "0.0" from BufFile "0": No such 
> > > file or directory
> > >
> > > I look around a bit more and came up with this, which works, but I don't 
> > > know
> > > enough to say if it's right.
> >
> > I convinced myself this is right, since state->nParticipants==-1 for 
> > workers.
> > Only the leader should do the cleanup.
> >
> > Added here:
> > https://commitfest.postgresql.org/29/2657/

+   if (state->shared && state->shared->workersFinished == 
state->nParticipants)

Isn't it more straightforward to check "state->shared &&
state->worker == -1"?

> I've also investigated this issue. As Thomas mentioned before, this
> problem is not specific to parallel index creation. Shared temporary
> files could be left if the process is interrupted while deleting the
> file as a part of the work of detaching dsm segment.
> 
> To fix this issue, possible solutions would be:
> 
> 1. Like the current patch, we call SharedFileSetDeleteAll() before
> DestroyParallelContext() which calls dsm_detach() so that we can make
> sure to delete these files while not relying on on_dsm_detach
> callback. That way, even if the process is interrupted during that
> cleaning, it will clean these files again during transaction abort
> (AtEOXact_Parallel() calls dsm_detach()). OTOH a downside would be
> that we will end up setting a rule that we need to explicitly call
> SharedFileSetDeleteAll().

That seems to be common. We release locks explicitly, but they are
automatically released on error. Of course it is slightly different in
that SharedFileSetOnDetach unconditionally removes the directory, but
that doesn't matter as long as that behavior doesn't lead to an
error. We can skip the cleanup, as Thomas suggested, if it is not necessary.

Looking the comment of SharedFileSetOnDetach:

| * everything in them.  We can't raise an error on failures, because this runs
| * in error cleanup paths.

I feel that a function that shouldn't error out also shouldn't be
cancellable. If that's the case, we can omit the CHECK_FOR_INTERRUPTS() in
walkdir() when elevel is smaller than ERROR.

=
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58502837a..593c23553e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3374,7 +3374,9 @@ walkdir(const char *path,
{
charsubpath[MAXPGPATH * 2];
 
-   CHECK_FOR_INTERRUPTS();
+   /* omit interrupts while we shouldn't error-out */
+   if (elevel >= ERROR)
+   CHECK_FOR_INTERRUPTS();
 
if (strcmp(de->d_name, ".") == 0 ||
strcmp(de->d_name, "..") == 0)
=

> 2. We don't use on_dsm_detach callback to delete the shared file set.
> Instead, I wonder if we can delete them at the end of the transaction
> by using ResourceOwner mechanism, like we do for non-shared temporary
> files cleanup. This idea doesn't have the cons that idea #1 has. OTOH,
> the lifetime of the shared file set will change from the parallel
> context to the transaction, leading to keep many temporary files until
> the transaction end. Also, we would need to rework the handling shared
> file set.
> 
> 3. We use on_dsm_detach as well as on_proc_exit callback to delete the
> shared file set. It doesn't resolve the root cause but that way, even
> if the process didn’t delete it on destroying the parallel context, we
> can make sure to delete it on process exit.
> 
> I think #1 is suitable for back branches. For HEAD, I think #2 and #3
> would be better in terms of not setting an implicit rule. Thoughts?

As long as we allow dsm_detach to be canceled, the problem persists
whatever else we do. So #2 and #3 seem a bit too much. It seems to me
that #1 plus omitting CHECK_FOR_INTERRUPTS() is suitable for all
branches.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL

2021-01-25 Thread Pavel Stehule
út 26. 1. 2021 v 4:33 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > [ plpgsql-plan-cache-for-call-3.patch ]
>
> Pushed with some additional cleanup.
>

Thank you

Pavel


> It strikes me that we ought to get rid of SPI_execute_with_receiver
> (which has been added since v13) in favor of a "SPI_execute_extended"
> that shares the same options struct as SPI_execute_plan_extended.
> But I left that for tomorrow.
>
> regards, tom lane
>


Re: Add SQL function for SHA1

2021-01-25 Thread Bruce Momjian
On Mon, Jan 25, 2021 at 10:23:30PM -0600, Kenneth Marshall wrote:
> On Tue, Jan 26, 2021 at 01:06:29PM +0900, Michael Paquier wrote:
> > On Mon, Jan 25, 2021 at 10:42:25PM -0500, Sehrope Sarkuni wrote:
> > > +1 to adding a SHA1 SQL function. Even if it's deprecated, there's plenty
> > > of historical usage that I can see it being useful.
> > 
> > Let's wait for more opinions to see if we agree that this addition is
> > helpful or not.  Even if this is not added, I think that there is
> > still value in refactoring the code anyway for the SHA-2 functions.
> > 
> 
> +1 I know that it has been deprecated, but it can be very useful when
> working with data from pre-deprecation. :) It is annoying to have to
> resort to plperl or plpython because it is not available. The lack of
> orthogonality is painful.

Yes, I think having SHA1 makes sense --- there are probably still valid
uses for it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add SQL function for SHA1

2021-01-25 Thread Kenneth Marshall
On Tue, Jan 26, 2021 at 01:06:29PM +0900, Michael Paquier wrote:
> On Mon, Jan 25, 2021 at 10:42:25PM -0500, Sehrope Sarkuni wrote:
> > +1 to adding a SHA1 SQL function. Even if it's deprecated, there's plenty
> > of historical usage that I can see it being useful.
> 
> Let's wait for more opinions to see if we agree that this addition is
> helpful or not.  Even if this is not added, I think that there is
> still value in refactoring the code anyway for the SHA-2 functions.
> 

+1 I know that it has been deprecated, but it can be very useful when
working with data from pre-deprecation. :) It is annoying to have to
resort to plperl or plpython because it is not available. The lack of
orthogonality is painful.

Regards,
Ken




RE: Parallel INSERT (INTO ... SELECT ...)

2021-01-25 Thread Hou, Zhijie
Hi,

I have an issue of the check about column default expressions.

+   if (command_type == CMD_INSERT)
+   {
+   /*
+* Column default expressions for columns in the target-list are
+* already being checked for parallel-safety in the
+* max_parallel_hazard() scan of the query tree in 
standard_planner().
+*/
+
+   tupdesc = RelationGetDescr(rel);
+   for (attnum = 0; attnum < tupdesc->natts; attnum++)


IMO, max_parallel_hazard() only checks the parent table's default expressions.
But if the table has partitions and a partition has its own default
expressions, max_parallel_hazard() does not seem to check them,
and we do not seem to check them anywhere else either.

I am not sure whether we should allow parallel insert for this case?

Example:

-
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;

create table origin(a int);
insert into origin values(generate_series(1,5000));

create or replace function bdefault_unsafe () returns int language plpgsql 
parallel unsafe as $$ begin
RETURN 5;
end $$;

create table parttable1 (a int, b name) partition by range (a);
create table parttable1_1 partition of parttable1 for values from (0) to (5000);
create table parttable1_2 partition of parttable1 for values from (5000) to (1);

alter table parttable1_1 ALTER COLUMN b SET DEFAULT bdefault_unsafe();

postgres=# explain insert into parttable1 select * from origin ;
   QUERY PLAN   

 Gather  (cost=0.00..41.92 rows=5865 width=0)
   Workers Planned: 3
   ->  Insert on parttable1  (cost=0.00..41.92 rows=0 width=0)
 ->  Parallel Seq Scan on origin  (cost=0.00..41.92 rows=1892 width=68)
(4 rows)

postgres=# explain insert into parttable1_1 select * from origin ;
QUERY PLAN 
---
 Insert on parttable1_1  (cost=0.00..1348.00 rows=0 width=0)
   ->  Seq Scan on origin  (cost=0.00..1348.00 rows=5000 width=68)
(2 rows)

-

Best regards,
houzj




Re: Extensions not dumped when --schema is used

2021-01-25 Thread Julien Rouhaud
On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
 wrote:
>
> "Anytime soon" was a long long time ago, and I eventually completely forgot 
> this, sorry. As nobody worked on it yet, I took a shot at it. See attached 
> patch.

Great!

I haven't reviewed it thoroughly yet, but after a quick look it seems
sensible.  I'd prefer to see some tests added, and it looks like a
test for plpgsql could be added quite easily.

> I don't know if I should add this right away in the commit fest app. If yes, 
> I guess it should go on the next commit fest (2021-03), right?

Correct, and please add it on the commit fest!




Re: Add SQL function for SHA1

2021-01-25 Thread Michael Paquier
On Mon, Jan 25, 2021 at 10:42:25PM -0500, Sehrope Sarkuni wrote:
> +1 to adding a SHA1 SQL function. Even if it's deprecated, there's plenty
> of historical usage that I can see it being useful.

Let's wait for more opinions to see if we agree that this addition is
helpful or not.  Even if this is not added, I think that there is
still value in refactoring the code anyway for the SHA-2 functions.

> Either way, the rest of the refactor can be improved a bit to perform a
> single palloc() and remove the memcpy(). Attached is a diff for
> cryptohashfuncs.c that does that by writing the digest final directly to
> the result. It also removes the digest length arg and determines it in the
> switch block. There's only one correct digest length for each type so
> there's no reason to give callers the option to give the wrong one.

Yeah, what you have here is better.

+  default:
+  elog(ERROR, "unsupported digest type %d", type);
Not using a default clause is the purpose here, as it would generate a
compilation warning if a value in the enum is forgotten.  Hence, if a
new option is added to pg_cryptohash_type in the future, people won't
miss that they could add a SQL function for the new option.  If we
decide that MD5 and SHA1 have no need to use this code path, I'd
rather just use elog(ERROR) instead.
--
Michael




Re: Identify missing publications from publisher while create/alter subscription.

2021-01-25 Thread japin


On Mon, 25 Jan 2021 at 21:55, Bharath Rupireddy 
 wrote:
> On Mon, Jan 25, 2021 at 5:18 PM japin  wrote:
>> > Do you mean when we drop publications from a subscription? If yes, do
>> > we have a way to drop a publication from the subscription? See below
>> > one of my earlier questions on this.
>> > "I wonder, why isn't dropping a publication from a list of
>> > publications that are with subscription is not allowed?"
>> > At least, I see no ALTER SUBSCRIPTION ... DROP PUBLICATION mypub1 or
>> > something similar?
>> >
>>
>> Why do we not support ALTER SUBSCRIPTION...ADD/DROP PUBLICATION?  When we
>> have multiple publications in a subscription but want to add/drop a single
>> publication, it would be convenient.  ALTER SUBSCRIPTION...SET PUBLICATION...
>> requires supplying the complete list of publications.
>
> Looks like dropping/adding a publication from the list of
> publications in a subscription requires users to specify the full list of
> publications that currently exist, +/- the publication that needs to be
> added/dropped:
>
> CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
> dbname=postgres' PUBLICATION mypub1, mypub2, mypu3, mypub4, mypub5;
> postgres=# select subpublications from pg_subscription;
>subpublications
> -
>  {mypub1,mypub2,mypu3,mypub4,mypub5}
> (1 row)
>
> Say, I want to drop mypub4:
>
> ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypu3, mypub5;
> postgres=# select subpublications from pg_subscription;
>subpublications
> --
>  {mypub1,mypub2,mypu3,mypub5}
>
> Say, I want to add mypub4 and mypub6:
> ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypu3,
> mypub5, mypub4, mypub6;
> postgres=# select subpublications from pg_subscription;
>   subpublications
> 
>  {mypub1,mypub2,mypu3,mypub5,mypub4,mypub6}
> (1 row)
>
> It will be good to have something like:
>
> ALTER SUBSCRIPTION mysub1 ADD PUBLICATION mypub1, mypub3; which will
> add the publications to the subscription if not added previously.
>
> ALTER SUBSCRIPTION mysub1 DROP PUBLICATION mypub1, mypub3; which will
> drop the publications from the subscription if they exist in the
> subscription's list of publications.
>
> But I'm really not sure why the above syntax was not added earlier. We
> may be missing something here.
>
>> Sorry, this question is unrelated with this subject.
>
> Yes, IMO it can definitely be discussed in another thread. It will be
> good to get a separate opinion for this.
>

I started a new thread [1] for this, please have a look.

[1] - 
https://www.postgresql.org/message-id/meyp282mb166939d0d6c480b7fbe7effbb6...@meyp282mb1669.ausp282.prod.outlook.com

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-01-25 Thread japin

On Tue, 26 Jan 2021 at 11:47, japin  wrote:
> Hi, hackers
>
> When I read the discussion in [1], I found that updating a subscription's
> publications is complicated.
>
> For example, I have 5 publications in subscription.
>
> CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432 
> dbname=postgres'
> PUBLICATION mypub1, mypub2, mypub3, mypub4, mypub5;
>
> If I want to drop "mypub4", we should use the following command:
>
> ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypub3, mypub5;
>
> Also, if I want to add "mypub7" and "mypub8", it will use:
>
> ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypub3, mypub5, 
> mypub7, mypub8;
>
> The attached patch implements the ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ...
> syntax; for the above two cases, we can use the following:
>
> ALTER SUBSCRIPTION mysub1 DROP PUBLICATION mypub4;
>
> ALTER SUBSCRIPTION mysub1 ADD PUBLICATION mypub7, mypub8;
>
> I think it's more convenient. Any thoughts?
>
> [1] - 
> https://www.postgresql.org/message-id/MEYP282MB16690CD5EC5319FC35B2F78AB6BD0%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

Sorry, I forgot to attach the patch.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

From 4407334edd1a7276cbc0fab449912c879552971f Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Tue, 26 Jan 2021 11:51:49 +0800
Subject: [PATCH v1] Add ALTER SUBSCRIPTION...ADD/DROP PUBLICATION...

---
 doc/src/sgml/ref/alter_subscription.sgml |  68 ++
 src/backend/commands/subscriptioncmds.c  | 156 +++
 src/backend/parser/gram.y|  20 +++
 src/bin/psql/tab-complete.c  |   2 +-
 src/include/nodes/parsenodes.h   |   2 +
 5 files changed, 247 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707..97c427e0f4 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -23,6 +23,8 @@ PostgreSQL documentation
 
 ALTER SUBSCRIPTION name CONNECTION 'conninfo'
 ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name ADD PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]
 ALTER SUBSCRIPTION name ENABLE
 ALTER SUBSCRIPTION name DISABLE
@@ -107,6 +109,72 @@ ALTER SUBSCRIPTION name RENAME TO <
 

 
+   
+ADD PUBLICATION publication_name
+
+ 
+  Add list of publications to subscription. See
+   for more information.
+  By default this command will also act like REFRESH
+  PUBLICATION.
+ 
+
+ 
+  set_publication_option specifies additional
+  options for this operation.  The supported options are:
+
+  
+   
+refresh (boolean)
+
+ 
+  When false, the command will not try to refresh table information.
+  REFRESH PUBLICATION should then be executed separately.
+  The default is true.
+ 
+
+   
+  
+
+  Additionally, refresh options as described
+  under REFRESH PUBLICATION may be specified.
+ 
+
+   
+
+   
+DROP PUBLICATION publication_name
+
+ 
+  Drop list of publications from subscription. See
+   for more information.
+  By default this command will also act like REFRESH
+  PUBLICATION.
+ 
+
+ 
+  set_publication_option specifies additional
+  options for this operation.  The supported options are:
+
+  
+   
+refresh (boolean)
+
+ 
+  When false, the command will not try to refresh table information.
+  REFRESH PUBLICATION should then be executed separately.
+  The default is true.
+ 
+
+   
+  
+
+  Additionally, refresh options as described
+  under REFRESH PUBLICATION may be specified.
+ 
+
+   
+

 REFRESH PUBLICATION
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 082f7855b8..c014a227e9 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -857,6 +857,162 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
 break;
 			}
 
+		case ALTER_SUBSCRIPTION_ADD_PUBLICATION:
+			{
+int			i;
+int			noldsubpublications;
+bool		copy_data;
+bool		refresh;
+List	   *publications = NIL;
+Datum	   *oldsubpublications;
+ArrayType  *array;
+
+/* deconstruct the publications */
+heap_deform_tuple(tup, RelationGetDescr(rel), values, nulls);
+array = DatumGetArrayTypeP(values[Anum_pg_subscription_subpublications - 1]);
+deconstruct_array(array, 

Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-01-25 Thread japin


Hi, hackers

When I read the discussion in [1], I found that updating a subscription's
publications is complicated.

For example, I have 5 publications in subscription.

CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432 
dbname=postgres'
PUBLICATION mypub1, mypub2, mypub3, mypub4, mypub5;

If I want to drop "mypub4", we should use the following command:

ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypub3, mypub5;

Also, if I want to add "mypub7" and "mypub8", it will use:

ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypub3, mypub5, 
mypub7, mypub8;

The attached patch implements the ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ...
syntax; for the above two cases, we can use the following:

ALTER SUBSCRIPTION mysub1 DROP PUBLICATION mypub4;

ALTER SUBSCRIPTION mysub1 ADD PUBLICATION mypub7, mypub8;

I think it's more convenient. Any thoughts?

[1] - 
https://www.postgresql.org/message-id/MEYP282MB16690CD5EC5319FC35B2F78AB6BD0%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Add SQL function for SHA1

2021-01-25 Thread Sehrope Sarkuni
+1 to adding a SHA1 SQL function. Even if it's deprecated, there's plenty
of historical usage that I can see it being useful.

Either way, the rest of the refactor can be improved a bit to perform a
single palloc() and remove the memcpy(). Attached is a diff for
cryptohashfuncs.c that does that by writing the digest final directly to
the result. It also removes the digest length arg and determines it in the
switch block. There's only one correct digest length for each type so
there's no reason to give callers the option to give the wrong one.
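
(Usage of the proposed function would look like the following, assuming it is committed with the signature from the attached diff:

  SELECT sha1('abc'::bytea);    -- proposed here, returns a 20-byte bytea
  SELECT sha256('abc'::bytea);  -- existing SHA-2 function, for comparison
)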

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
diff --git a/src/backend/utils/adt/cryptohashfuncs.c b/src/backend/utils/adt/cryptohashfuncs.c
index d99485f4c6..8dc695e80c 100644
--- a/src/backend/utils/adt/cryptohashfuncs.c
+++ b/src/backend/utils/adt/cryptohashfuncs.c
@@ -15,6 +15,7 @@
 
 #include "common/cryptohash.h"
 #include "common/md5.h"
+#include "common/sha1.h"
 #include "common/sha2.h"
 #include "utils/builtins.h"
 
@@ -68,6 +69,77 @@ md5_bytea(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text(hexsum));
 }
 
+/*
+ * Internal routine to compute a cryptohash with the given bytea input.
+ */
+static inline bytea *
+cryptohash_internal(pg_cryptohash_type type, bytea *input)
+{
+	const uint8 *data;
+	const char  *typestr;
+	int			digest_len;
+	size_t		len;
+	pg_cryptohash_ctx *ctx;
+	bytea	   *result;
+
+	switch (type)
+	{
+		case PG_MD5:
+			typestr = "MD5";
+			digest_len = MD5_DIGEST_LENGTH;
+			break;
+		case PG_SHA1:
+			typestr = "SHA1";
+			digest_len = SHA1_DIGEST_LENGTH;
+			break;
+		case PG_SHA224:
+			typestr = "SHA224";
+			digest_len = PG_SHA224_DIGEST_LENGTH;
+			break;
+		case PG_SHA256:
+			typestr = "SHA256";
+			digest_len = PG_SHA256_DIGEST_LENGTH;
+			break;
+		case PG_SHA384:
+			typestr = "SHA384";
+			digest_len = PG_SHA384_DIGEST_LENGTH;
+			break;
+		case PG_SHA512:
+			typestr = "SHA512";
+			digest_len = PG_SHA512_DIGEST_LENGTH;
+			break;
+		default:
+			elog(ERROR, "unsupported digest type %d", type);
+	}
+
+	result = palloc0(digest_len + VARHDRSZ);
+	len = VARSIZE_ANY_EXHDR(input);
+	data = (unsigned char *) VARDATA_ANY(input);
+
+	ctx = pg_cryptohash_create(type);
+	if (pg_cryptohash_init(ctx) < 0)
+		elog(ERROR, "could not initialize %s context", typestr);
+	if (pg_cryptohash_update(ctx, data, len) < 0)
+		elog(ERROR, "could not update %s context", typestr);
+	if (pg_cryptohash_final(ctx, (unsigned char *) VARDATA(result)) < 0)
+		elog(ERROR, "could not finalize %s context", typestr);
+	pg_cryptohash_free(ctx);
+
+	SET_VARSIZE(result, digest_len + VARHDRSZ);
+
+	return result;
+}
+
+/*
+ * SHA-1
+ */
+Datum
+sha1_bytea(PG_FUNCTION_ARGS)
+{
+	bytea   *result = cryptohash_internal(PG_SHA1, PG_GETARG_BYTEA_PP(0));
+
+	PG_RETURN_BYTEA_P(result);
+}
 
 /*
  * SHA-2 variants
@@ -76,28 +148,7 @@ md5_bytea(PG_FUNCTION_ARGS)
 Datum
 sha224_bytea(PG_FUNCTION_ARGS)
 {
-	bytea	   *in = PG_GETARG_BYTEA_PP(0);
-	const uint8 *data;
-	size_t		len;
-	pg_cryptohash_ctx *ctx;
-	unsigned char buf[PG_SHA224_DIGEST_LENGTH];
-	bytea	   *result;
-
-	len = VARSIZE_ANY_EXHDR(in);
-	data = (unsigned char *) VARDATA_ANY(in);
-
-	ctx = pg_cryptohash_create(PG_SHA224);
-	if (pg_cryptohash_init(ctx) < 0)
-		elog(ERROR, "could not initialize %s context", "SHA224");
-	if (pg_cryptohash_update(ctx, data, len) < 0)
-		elog(ERROR, "could not update %s context", "SHA224");
-	if (pg_cryptohash_final(ctx, buf) < 0)
-		elog(ERROR, "could not finalize %s context", "SHA224");
-	pg_cryptohash_free(ctx);
-
-	result = palloc(sizeof(buf) + VARHDRSZ);
-	SET_VARSIZE(result, sizeof(buf) + VARHDRSZ);
-	memcpy(VARDATA(result), buf, sizeof(buf));
+	bytea   *result = cryptohash_internal(PG_SHA224, PG_GETARG_BYTEA_PP(0));
 
 	PG_RETURN_BYTEA_P(result);
 }
@@ -105,28 +156,7 @@ sha224_bytea(PG_FUNCTION_ARGS)
 Datum
 sha256_bytea(PG_FUNCTION_ARGS)
 {
-	bytea	   *in = PG_GETARG_BYTEA_PP(0);
-	const uint8 *data;
-	size_t		len;
-	pg_cryptohash_ctx *ctx;
-	unsigned char buf[PG_SHA256_DIGEST_LENGTH];
-	bytea	   *result;
-
-	len = VARSIZE_ANY_EXHDR(in);
-	data = (unsigned char *) VARDATA_ANY(in);
-
-	ctx = pg_cryptohash_create(PG_SHA256);
-	if (pg_cryptohash_init(ctx) < 0)
-		elog(ERROR, "could not initialize %s context", "SHA256");
-	if (pg_cryptohash_update(ctx, data, len) < 0)
-		elog(ERROR, "could not update %s context", "SHA256");
-	if (pg_cryptohash_final(ctx, buf) < 0)
-		elog(ERROR, "could not finalize %s context", "SHA256");
-	pg_cryptohash_free(ctx);
-
-	result = palloc(sizeof(buf) + VARHDRSZ);
-	SET_VARSIZE(result, sizeof(buf) + VARHDRSZ);
-	memcpy(VARDATA(result), buf, sizeof(buf));
+	bytea   *result = cryptohash_internal(PG_SHA256, PG_GETARG_BYTEA_PP(0));
 
 	PG_RETURN_BYTEA_P(result);
 }
@@ -134,28 +164,7 @@ sha256_bytea(PG_FUNCTION_ARGS)
 Datum
 sha384_bytea(PG_FUNCTION_ARGS)
 {
-	bytea	   *in = PG_GETARG_BYTEA_PP(0);
-	const uint8 *data;
-	size_t		len;
-	pg_cryptohash_ctx *ctx;
-	unsigned char buf[PG_SHA384_DIGEST_LENGTH];
-	bytea	   

Re: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL

2021-01-25 Thread Tom Lane
Pavel Stehule  writes:
> [ plpgsql-plan-cache-for-call-3.patch ]

Pushed with some additional cleanup.

It strikes me that we ought to get rid of SPI_execute_with_receiver
(which has been added since v13) in favor of a "SPI_execute_extended"
that shares the same options struct as SPI_execute_plan_extended.
But I left that for tomorrow.

regards, tom lane




Re: Add SQL function for SHA1

2021-01-25 Thread Noah Misch
On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote:
> SHA-1 is now an option available for cryptohashes, and like the
> existing set of functions of SHA-2, I don't really see a reason why we
> should not have a SQL function for SHA1.

NIST deprecated SHA1 over ten years ago.  It's too late to be adding this.




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Bharath Rupireddy
On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
 wrote:
> > Attaching v17 patch set, please review it further.
>
> Thanks for updating the patch!
>
> Attached is the tweaked version of the patch. I didn't change any logic,
> but I updated some comments and docs. Also I added the regresssion test
> to check that postgres_fdw_disconnect() closes multiple connections.
> Barring any objection, I will commit this version.

Thanks. The patch LGTM, except few typos:
1) in the commit message "a warning messsage is emitted." it's
"message" not "messsage".
2) in the documentation "+   a user mapping, the correspoinding
connections are closed." it's "corresponding" not "correspoinding".

I will post "keep_connections" GUC and "keep_connection" server level
option patches later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Tid scan improvements

2021-01-25 Thread Zhihong Yu
Hi,

bq. within this range.  Table AMs where scanning ranges of TIDs does not
make
sense or is difficult to implement efficiently may choose to not implement

Is there criterion on how to judge efficiency ?

+   if (tidopexpr->exprtype == TIDEXPR_LOWER_BOUND)
...
+   if (tidopexpr->exprtype == TIDEXPR_UPPER_BOUND)

The if statement for upper bound should be prefixed with 'else', right ?

+ * TidRecheck -- access method routine to recheck a tuple in EvalPlanQual
...
+TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)

The method name in the comment doesn't match the real method name.

+ * ExecTidRangeScan(node)
...
+ExecTidRangeScan(PlanState *pstate)

Parameter names don't match.

Cheers

On Mon, Jan 25, 2021 at 5:23 PM David Rowley  wrote:

> On Thu, 21 Jan 2021 at 18:16, David Rowley  wrote:
> > I've implemented this in the attached.
>
> The bug fix in 0001 is now committed, so I'm just attaching the 0002
> patch again after having rebased... This is mostly just to keep the
> CFbot happy.
>
> David
>


Re: fdatasync(2) on macOS

2021-01-25 Thread Thomas Munro
On Mon, Jan 18, 2021 at 5:08 PM Tom Lane  wrote:
> (1) other platforms weren't safe-by-default either.  Perhaps the
> state of the art is better now, though?

Generally the answer seems to be yes, but there are still some systems
out there that don't send flushes when volatile write cache is
enabled.  Probably still including Macs, by the admission of their man
page.  The numbers I saw would put a little M1 Air at the upper range
of super expensive server storage if they included or didn't need a
flush to survive power loss, but then that's a consumer device with a
battery so it doesn't really fit into the usual way we think about
database server storage and power loss...

> (2) we don't want to force exceedingly-expensive defaults on people
> who may be uninterested in reliable storage.  That seemed like a
> shaky argument then and it still does now.  Still, I see the point
> that suddenly degrading performance by orders of magnitude would
> be a PR disaster.

(Purely as a matter of curiosity, I wonder why the latency is so high
for F_FULLFSYNC.  Wild speculation: APFS is said to be a bit like ZFS,
but it's also said to avoid the data journaling of HFS+... so perhaps
it lacks an equivalent of ZFS's ZIL (a thing like WAL) that allows
synchronous writes to avoid having to flush out a new tree and uber
block (in ZFS lingo "spa_sync()").  It might be possible to see this
with tools like iosnoop (or the underlying io:::start dtrace probe),
if you overwrite a single block and then fcntl(F_FULLFSYNC).  Your 12
ops/sec on spinning rust would have to be explained by something like
that, and is significantly slower than the speeds I see on my spinning
rust ZFS system that manages something like disk rotation speed.)

Anyway, my purpose in this thread was to flag our usage of the
undocumented system call and open flags; that is, "how we talk to the
OS", not "how the OS talks to the disk".  That turned out to be
already well known and not as new as I first thought, so I'm not
planning to pursue this Mac stuff any further, despite my curiosity...




Re: Phrase search vs. multi-lexeme tokens

2021-01-25 Thread Neil Chen
Hi Alexander,
On Mon, Jan 25, 2021 at 11:25 PM Alexander Korotkov 
wrote:

>
> BTW, you mentioned you read the documentation.  Do you think it needs
> to be adjusted accordingly to the patch?
>
>
Yes, I checked section 8.11, section 9.13 and Chapter 12 of the document.
The change of this patch did not conflict with the document, because it was
not mentioned in the document at all. We can simply not modify it, or we
can supplement these situations.

-- 
There is no royal road to learning.
HighGo Software Co.


Re: Tid scan improvements

2021-01-25 Thread David Rowley
On Thu, 21 Jan 2021 at 18:16, David Rowley  wrote:
> I've implemented this in the attached.

The bug fix in 0001 is now committed, so I'm just attaching the 0002
patch again after having rebased... This is mostly just to keep the
CFbot happy.

David
From e459b522d0599602188fcb1cc9ee6062ac8a4aee Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com" 
Date: Thu, 21 Jan 2021 16:48:15 +1300
Subject: [PATCH v12] Add TID Range Scans to support efficient scanning ranges
 of TIDs

This adds a new node type named TID Range Scan.  The query planner will
generate paths for TID Range scans when quals are discovered on base
relations which search for ranges of ctid.  These ranges may be open at
either end.

To support this, a new optional callback function has been added to table
AM which is named scan_getnextslot_inrange.  This function accepts a
minimum and maximum ItemPointer to allow efficient retrieval of tuples
within this range.  Table AMs where scanning ranges of TIDs does not make
sense or is difficult to implement efficiently may choose to not implement
this function.

Author: Edmund Horner and David Rowley
Discussion: 
https://postgr.es/m/CAMyN-kB-nFTkF=va_jpwfno08s0d-yk0f741s2b7ldmyai8...@mail.gmail.com
---
 src/backend/access/heap/heapam.c   | 132 +++
 src/backend/access/heap/heapam_handler.c   |   1 +
 src/backend/commands/explain.c |  23 ++
 src/backend/executor/Makefile  |   1 +
 src/backend/executor/execAmi.c |   6 +
 src/backend/executor/execProcnode.c|  10 +
 src/backend/executor/nodeTidrangescan.c| 409 +
 src/backend/nodes/copyfuncs.c  |  24 ++
 src/backend/nodes/outfuncs.c   |  13 +
 src/backend/optimizer/README   |   1 +
 src/backend/optimizer/path/costsize.c  |  95 +
 src/backend/optimizer/path/tidpath.c   | 117 +-
 src/backend/optimizer/plan/createplan.c|  98 +
 src/backend/optimizer/plan/setrefs.c   |  16 +
 src/backend/optimizer/plan/subselect.c |   6 +
 src/backend/optimizer/util/pathnode.c  |  29 ++
 src/backend/optimizer/util/plancat.c   |   4 +
 src/backend/optimizer/util/relnode.c   |   3 +
 src/backend/storage/page/itemptr.c |  58 +++
 src/include/access/heapam.h|   4 +
 src/include/access/tableam.h   |  50 +++
 src/include/catalog/pg_operator.dat|   6 +-
 src/include/executor/nodeTidrangescan.h|  23 ++
 src/include/nodes/execnodes.h  |  18 +
 src/include/nodes/nodes.h  |   3 +
 src/include/nodes/pathnodes.h  |  18 +
 src/include/nodes/plannodes.h  |  13 +
 src/include/optimizer/cost.h   |   3 +
 src/include/optimizer/pathnode.h   |   4 +
 src/include/storage/itemptr.h  |   2 +
 src/test/regress/expected/tidrangescan.out | 302 +++
 src/test/regress/parallel_schedule |   2 +-
 src/test/regress/sql/tidrangescan.sql  | 104 ++
 src/tools/pgindent/typedefs.list   |   5 +
 34 files changed, 1588 insertions(+), 15 deletions(-)
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..164a044715 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1391,6 +1391,138 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection 
direction, TupleTableSlot *s
return true;
 }
 
+bool
+heap_getnextslot_inrange(TableScanDesc sscan, ScanDirection direction,
+TupleTableSlot *slot, 
ItemPointer mintid,
+ItemPointer maxtid)
+{
+   HeapScanDesc scan = (HeapScanDesc) sscan;
+
+   if (!scan->rs_inited)
+   {
+   BlockNumber startBlk;
+   BlockNumber numBlks;
+   ItemPointerData highestItem;
+   ItemPointerData lowestItem;
+
+   /* A relation with zero blocks won't have any tuples */
+   if (scan->rs_nblocks == 0)
+   return false;
+
+   /*
+* Set up some ItemPointers which point to the first and last 
possible
+* tuples in the heap.
+*/
+   ItemPointerSet(, scan->rs_nblocks - 1, 
MaxOffsetNumber);
+   ItemPointerSet(, 0, FirstOffsetNumber);
+
+   /*
+* If the given maximum TID is below the highest possible TID 
in the
+* relation, then restrict the range to that, otherwise we scan 
to the
+* end of the relation.
+*/
+   if (ItemPointerCompare(maxtid, ) < 0)
+   ItemPointerCopy(maxtid, );
+
+  

Re: Key management with tests

2021-01-25 Thread Bruce Momjian
On Mon, Jan 25, 2021 at 08:12:01PM -0300, Álvaro Herrera wrote:
> In patch 1,
> 
> * The docs are not clear on what happens if --auth-prompt is not given
> but an auth prompt is required for the program to work.  Should it exit
> with a status other than 0?

Uh, I think the docs talk about this:

It can prompt from the terminal if
option>--authprompt is used.  In the parameter
value, %R is replaced by a file descriptor
number opened to the terminal that started the server.  A file
descriptor is only available if enabled at server start via
-R.  If %R is specified and
no file descriptor is available, the server will not start.

The code is:

case 'R':
{
char fd_str[20];

if (terminal_fd == -1)
{
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
 errmsg("cluster key command referenced 
%%R, but --authprompt not specified")));
}

Does that help?

> * BootStrapKmgr claims it is called by initdb, but that doesn't seem to
> be the case.

Well, initdb starts the postmaster in --boot mode, and that calls
BootStrapKmgr().  Does that help?

> * Also, BootStrapKmgr is the only one that checks USE_OPENSSL; what if a
> with-openssl build inits the datadir, and then a non-openssl runs it?
> What if it's the other way around?  I think you'd get a failure in
> stat() ...

Wow, I never considered that.  I have added a check to InitializeKmgr().
Thanks.

> * ... oh, KMGR_DIR_PID is used but not defined anywhere.  Is it defined
> in some later commit?  If so, then I think you've chosen to split the
> patch series wrong.

OK, fixed.  It is in include/common/kmgr_utils.c, which was in #3.

> May I suggest to use "git format-patch" to produce the patch files?  When
> working with a series like this, trying to do patch handling manually
> like you seem to be doing, is much more time-consuming and error prone.
> For example, with a branch containing individual commits, you could use 
>   git rebase -i origin/master -x "make install check-world"
> or similar, so that each commit is built and tested individually.

I used "git format-patch".  Are you asking for seven commits that then
generate seven files via one format-patch run?  Or is the primary issue
that you want compile testing for each patch?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread David G. Johnston
On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda 
wrote:

>
> I agree with your comments. I think it should report when
> reaching the end of WAL too. I add the code to report the stats
> when finishing the current WAL segment file when timeout in the
> main loop and when reaching the end of WAL.
>
>
The following is not an improvement:

- /* Send WAL statistics to the stats collector. */
+ /* Send WAL statistics to stats collector */

The word "the" there makes it proper English.  Your copy-pasting should
have kept the existing good wording in the other locations rather than
replace the existing location with the newly added incorrect wording.

This doesn't make sense:

* current WAL segment file to avoid loading stats collector.

Maybe "overloading" or "overwhelming"?

I see you removed the pgstat_send_wal(force) change.  The rest of my
comments on the v6 patch still stand I believe.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread David G. Johnston
On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada 
wrote:

> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda 
> wrote:
> >
> > Hi, thanks for the reviews.
> >
> > I updated the attached patch.
>
> Thank you for updating the patch!
>

Your original email with "total number of times" is more correct, removing
the "of times" and just writing "total number of WAL" is not good wording.

Specifically, this change is strictly worse than the original.

-   Number of times WAL data was written to disk because WAL buffers
became full
+   Total number of WAL data written to disk because WAL buffers became
full

Both have the flaw that they leave implied exactly what it means to "write
WAL to disk".  It is also unclear whether a counter, bytes, or both, would
be more useful here. I've incorporated this into my documentation
suggestions below:

(wal_buffers_full)
-- Revert - the original was better, though maybe add more detail similar
to the below.  I didn't research exactly how this works.

(wal_write)
The number of times WAL buffers were written out to disk via XLogWrite

-- Seems like this should have a bytes version too

(wal_write_time)
The amount of time spent writing WAL buffers to disk, excluding sync time
unless the wal_sync_method is either open_datasync or open_sync.
Units are in milliseconds with microsecond resolution.  This is zero when
track_wal_io_timing is disabled.

(wal_sync)
The number of times WAL files were synced to disk while wal_sync_method was
set to one of the "sync at commit" options (i.e., fdatasync, fsync,
or fsync_writethrough).

-- it is not going to be zero just because those settings are presently
disabled as they could have been enabled at some point since the last time
these statistics were reset.

(wal_sync_time)
The amount of time spent syncing WAL files to disk, in milliseconds with
microsecond resolution.  This requires setting wal_sync_method to one of
the "sync at commit" options (i.e., fdatasync, fsync,
or fsync_writethrough).


Also,

I would suggest extracting the changes to postmaster/pgstat.c and
replication/walreceiver.c to a separate patch as you've fundamentally
changed how it behaves with regards to that function and how it interacts
with the WAL receiver.  That seems an entirely separate topic warranting
its own patch and discussion.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiro Ikeda

On 2021-01-26 00:03, Masahiko Sawada wrote:
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda 
 wrote:


Hi, thanks for the reviews.

I updated the attached patch.


Thank you for updating the patch!


The summary of the changes is following.

1. fix document

I followed another view's comments.


2. refactor issue_xlog_fsync()

I removed "sync_called" variables, narrowed the "duration" scope and
change the switch statement to if statement.


Looking at the code again, I think if we check if an fsync was really
called when calculating the I/O time, it's better to check that before
starting the measurement.

bool issue_fsync = false;

if (enableFsync &&
(sync_method == SYNC_METHOD_FSYNC ||
 sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
 sync_method == SYNC_METHOD_FDATASYNC))
{
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
issue_fsync = true;
}
(snip)
if (issue_fsync)
{
if (track_wal_io_timing)
{
instr_time  duration;

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
WalStats.m_wal_sync_time = 
INSTR_TIME_GET_MICROSEC(duration);

}
WalStats.m_wal_sync++;
}

So I prefer either the above which is a modified version of the
original approach or my idea that doesn’t introduce a new local
variable I proposed before. But I'm not going to insist on that.


Thanks for the comments.
I change the code to the above.




3. make wal-receiver report WAL statistics

I add the code to collect the statistics for a written operation
in XLogWalRcvWrite() and to report stats in WalReceiverMain().

Since WalReceiverMain() can loop fast, to avoid loading stats 
collector,

I add "force" argument to the pgstat_send_wal function. If "force" is
false, it can skip reporting until at least 500 msec since it last
reported. WalReceiverMain() almost calls pgstat_send_wal() with 
"force"

as false.


 void
-pgstat_send_wal(void)
+pgstat_send_wal(bool force)
 {
/* We assume this initializes to zeroes */
static const PgStat_MsgWal all_zeroes;
+   static TimestampTz last_report = 0;

+   TimestampTz now;
WalUsagewalusage;

+   /*
+* Don't send a message unless it's been at least 
PGSTAT_STAT_INTERVAL

+* msec since we last sent one or specified "force".
+*/
+   now = GetCurrentTimestamp();
+   if (!force &&
+   !TimestampDifferenceExceeds(last_report, now, 
PGSTAT_STAT_INTERVAL))

+   return;
+
+   last_report = now;

Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this
purpose since it is used as a minimum time for stats file updates. If
we want an interval, I think we should define another one Also, with
the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
even if track_wal_io_timing is off, which is not good. On the other
hand, I agree that your concern that the wal receiver should not send
the stats for whenever receiving wal records. So an idea could be to
send the wal stats when finishing the current WAL segment file and
when timeout in the main loop. That way we can guarantee that the wal
stats on a replica is updated at least every time finishing a WAL
segment file when actively receiving WAL records and every
NAPTIME_PER_CYCLE in other cases.


I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION--- v6-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-25 16:27:50.749429666 +0900
+++ v7-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-26 08:19:48.269760642 +0900
@@ -1,6 +1,6 @@
-From e9aad92097c5cff5565b67ce1a8ec6d7b4c8a4d9 Mon Sep 17 00:00:00 2001
+From 02f0888efeb09ae641d9ef905788d995d687c56f Mon Sep 17 00:00:00 2001
 From: Masahiro Ikeda 
-Date: Mon, 25 Jan 2021 16:26:04 +0900
+Date: Tue, 26 Jan 2021 08:18:37 +0900
 Subject: [PATCH] Add statistics related to write/sync wal records.
 
 This patch adds following statistics to pg_stat_wal view
@@ -22,24 +22,24 @@
 Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada
 Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
 
-(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
+(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID
 ---
- doc/src/sgml/config.sgml  | 21 
- doc/src/sgml/monitoring.sgml  | 48 -
- src/backend/access/transam/xlog.c | 51 ++-
+ doc/src/sgml/config.sgml  | 21 +++
+ doc/src/sgml/monitoring.sgml  | 48 +++-
+ src/backend/access/transam/xlog.c | 56 ++-
  src/backend/catalog/system_views.sql  |  4 

Re: Key management with tests

2021-01-25 Thread Alvaro Herrera
In patch 1,

* The docs are not clear on what happens if --auth-prompt is not given
but an auth prompt is required for the program to work.  Should it exit
with a status other than 0?

* BootStrapKmgr claims it is called by initdb, but that doesn't seem to
be the case.

* Also, BootStrapKmgr is the only one that checks USE_OPENSSL; what if a
with-openssl build inits the datadir, and then a non-openssl runs it?
What if it's the other way around?  I think you'd get a failure in
stat() ...

* ... oh, KMGR_DIR_PID is used but not defined anywhere.  Is it defined
in some later commit?  If so, then I think you've chosen to split the
patch series wrong.


May I suggest to use "git format-patch" to produce the patch files?  When
working with a series like this, trying to do patch handling manually
like you seem to be doing, is much more time-consuming and error prone.
For example, with a branch containing individual commits, you could use 
  git rebase -i origin/master -x "make install check-world"
or similar, so that each commit is built and tested individually.

-- 
Álvaro Herrera   Valdivia, Chile
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: [Patch] ALTER SYSTEM READ ONLY

2021-01-25 Thread Robert Haas
On Thu, Jan 21, 2021 at 9:47 AM Amul Sul  wrote:
> It is not like that, let me explain. When a user backend requests to alter WAL
> prohibit state by using ASRO/ASRW DDL with the previous patch or calling
> pg_alter_wal_prohibit_state() then WAL prohibit state in shared memory will be
> set to the transition state i.e. going-read-only or going-read-write if it is
> not already.  If another backend trying to request the same alteration to the
> wal prohibit state then nothing going to be changed in shared memory but that
> backend needs to wait until the transition to the final wal prohibited state
> completes.  If a backend tries to request for the opposite state than the
> previous which is in progress then it will see an error as "system state
> transition to read only/write is already in progress".  At a time only one
> transition state can be set.

Hrm. Well, then that needs to be abundantly clear in the relevant comments.

> Now, I am a little bit concerned about the current function name. How about
> pg_set_wal_prohibit_state(bool) name or have two functions as
> pg_set_wal_prohibit_state(void) and pg_unset_wal_prohibit_state(void) or any
> other suggestions?

How about pg_prohibit_wal(true|false)?

> > It seems odd that ProcessWALProhibitStateChangeRequest() returns
> > without doing anything if !AmCheckpointerProcess(), rather than having
> > that be an Assert(). Why is it like that?
>
> Like AbsorbSyncRequests().

Well, that can be called not from the checkpointer, according to the
comments.  Specifically from the postmaster, I guess.  Again, comments
please.

> If you think that there will be panic when UpdateFullPageWrites() and/or
> XLogReportParameters() tries to write WAL since the shared memory state for 
> WAL
> prohibited is set then it is not like that.  For those functions, WAL write is
> explicitly enabled by calling LocalSetXLogInsertAllowed().
>
> I was under the impression that there won't be any problem if we allow the
> writing WAL to UpdateFullPageWrites() and XLogReportParameters().  It can be
> considered as an exception since it is fine that this WAL record is not 
> streamed
> to standby while graceful failover, I may be wrong though.

I don't think that's OK. I mean, the purpose of the feature is to
prohibit WAL. If it doesn't do that, I believe it will fail to satisfy
the principle of least surprise.

> I am not clear on this part. In the attached version I am sending SIGUSR1
> instead of SIGINT, which works for me.

OK.

> The attached version does not address all your comments, I'll continue my work
> on that.

Some thoughts on this version:

+/* Extract last two bits */
+#defineWALPROHIBIT_CURRENT_STATE(stateGeneration)  \
+   ((uint32)(stateGeneration) & ((uint32) ((1 << 2) - 1)))
+#defineWALPROHIBIT_NEXT_STATE(stateGeneration) \
+   WALPROHIBIT_CURRENT_STATE((stateGeneration + 1))

This is really confusing. First, the comment looks like it applies to
both based on how it is positioned, but that's clearly not true.
Second, the naming is really hard to understand. Third, there don't
seem to be comments explaining the theory of what is going on here.
Fourth, stateGeneration refers not to which generation of state we've
got here but to the combination of the state and the generation.
However, it's not clear that we ever really use the generation for
anything.

I think that the direction you went with this is somewhat different
from what I had in mind. That may be OK, but let me just explain the
difference. We both had in mind the idea that the low two bits of the
state would represent the current state and the upper bits would
represent the state generation. However, I wasn't necessarily
imagining that the only supported operation was making the combined
value go up by 1. For instance, I had thought that perhaps the effect
of trying to go read-only when we're in the middle of going read-write
would be to cancel the previous operation and start the new one. What
you have instead is that it errors out. So in your model a change
always has to finish before the next one can start, which in turn
means that the sequence is completely linear. In my idea the
state+generation might go from say 1 to 7, because trying to go
read-write would cancel the previous attempt to go read-only and
replace it with an attempt to go the other direction, and from 7 we
might go to to 9 if somebody now tries to go read-only again before
that finishes. In your model, there's never any sort of cancellation
of that kind, so you can only go 0->1->2->3->4->5->6->7->8->9 etc.

One disadvantage of the way you've got it from a user perspective is
that if I'm writing a tool, I might get an error telling me that the
state change I'm trying to make is already in progress, and then I
have to retry. With the other design, I might attempt a state change
and have it fail because the change can't be completed, but I won't
ever fail because I attempt a state change and it can't be started

Re: Snapbuild woes followup

2021-01-25 Thread Andres Freund
Hi,

On 2021-01-25 12:00:08 -0800, Andres Freund wrote:
> > >   /*
> > >* For backward compatibility reasons this has to be stored in the 
> > > wrongly
> > >* named field.  Will be fixed in next major version.
> > >*/
> > >   return builder->was_running.was_xmax;
> >
> > We could fix that now... Andres, what did you have in mind for a proper
> > name?
> 
> next_phase_at seems like it'd do the trick?

See attached patch...
>From 5c7d735d71beae1f6e6616f1a160543fae3ac91b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 25 Jan 2021 12:47:04 -0800
Subject: [PATCH] Remove backwards compat ugliness in snapbuild.c.

In 955a684e040 we fixed a bug in initial snapshot creation. In the
course of which several members of struct SnapBuild were obsoleted. As
SnapBuild is serialized to disk we couldn't change the memory layout.

Unfortunately I subsequently forgot about removing the backward compat
gunk, but luckily Heikki just reminded me.

This commit bumps SNAPBUILD_VERSION, therefore breaking existing
slots (which is fine in a major release).

Author: Andres Freund
Reminded-By: Heikki Linnakangas 
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfe...@iki.fi
---
 src/backend/replication/logical/snapbuild.c | 103 
 1 file changed, 18 insertions(+), 85 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e903e561afc..752cf2d7dbc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -189,24 +189,11 @@ struct SnapBuild
 	ReorderBuffer *reorder;
 
 	/*
-	 * Outdated: This struct isn't used for its original purpose anymore, but
-	 * can't be removed / changed in a minor version, because it's stored
-	 * on-disk.
+	 * TransactionId at which the next phase of initial snapshot building will
+	 * happen. InvalidTransactionId if not known (i.e. SNAPBUILD_START), or
+	 * when no next phase necessary (SNAPBUILD_CONSISTENT).
 	 */
-	struct
-	{
-		/*
-		 * NB: This field is misused, until a major version can break on-disk
-		 * compatibility. See SnapBuildNextPhaseAt() /
-		 * SnapBuildStartNextPhaseAt().
-		 */
-		TransactionId was_xmin;
-		TransactionId was_xmax;
-
-		size_t		was_xcnt;	/* number of used xip entries */
-		size_t		was_xcnt_space; /* allocated size of xip */
-		TransactionId *was_xip; /* running xacts array, xidComparator-sorted */
-	}			was_running;
+	TransactionId next_phase_at;
 
 	/*
 	 * Array of transactions which could have catalog changes that committed
@@ -272,34 +259,6 @@ static void SnapBuildWaitSnapshot(xl_running_xacts *running, TransactionId cutof
 static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
 static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
 
-/*
- * Return TransactionId after which the next phase of initial snapshot
- * building will happen.
- */
-static inline TransactionId
-SnapBuildNextPhaseAt(SnapBuild *builder)
-{
-	/*
-	 * For backward compatibility reasons this has to be stored in the wrongly
-	 * named field.  Will be fixed in next major version.
-	 */
-	return builder->was_running.was_xmax;
-}
-
-/*
- * Set TransactionId after which the next phase of initial snapshot building
- * will happen.
- */
-static inline void
-SnapBuildStartNextPhaseAt(SnapBuild *builder, TransactionId at)
-{
-	/*
-	 * For backward compatibility reasons this has to be stored in the wrongly
-	 * named field.  Will be fixed in next major version.
-	 */
-	builder->was_running.was_xmax = at;
-}
-
 /*
  * Allocate a new snapshot builder.
  *
@@ -728,7 +687,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 	 * we got into the SNAPBUILD_FULL_SNAPSHOT state.
 	 */
 	if (builder->state < SNAPBUILD_CONSISTENT &&
-		TransactionIdPrecedes(xid, SnapBuildNextPhaseAt(builder)))
+		TransactionIdPrecedes(xid, builder->next_phase_at))
 		return false;
 
 	/*
@@ -945,7 +904,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	 */
 	if (builder->state == SNAPBUILD_START ||
 		(builder->state == SNAPBUILD_BUILDING_SNAPSHOT &&
-		 TransactionIdPrecedes(xid, SnapBuildNextPhaseAt(builder
+		 TransactionIdPrecedes(xid, builder->next_phase_at)))
 	{
 		/* ensure that only commits after this are getting replayed */
 		if (builder->start_decoding_at <= lsn)
@@ -1267,7 +1226,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 		Assert(TransactionIdIsNormal(builder->xmax));
 
 		builder->state = SNAPBUILD_CONSISTENT;
-		SnapBuildStartNextPhaseAt(builder, InvalidTransactionId);
+		builder->next_phase_at = InvalidTransactionId;
 
 		ereport(LOG,
 (errmsg("logical decoding found consistent point at %X/%X",
@@ -1299,7 +1258,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	else if (builder->state == SNAPBUILD_START)
 	{
 		builder->state = SNAPBUILD_BUILDING_SNAPSHOT;
-		

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-25 Thread Alexey Kondratov

On 2021-01-25 11:07, Michael Paquier wrote:

On Fri, Jan 22, 2021 at 05:07:02PM +0300, Alexey Kondratov wrote:
I have updated patches accordingly and also simplified tablespaceOid 
checks
and assignment in the newly added SetRelTableSpace(). Result is 
attached as
two separate patches for an ease of review, but no objections to merge 
them

and apply at once if everything is fine.


 extern void SetRelationHasSubclass(Oid relationId, bool 
relhassubclass);

+extern bool SetRelTableSpace(Oid reloid, Oid tablespaceOid);
Seeing SetRelationHasSubclass(), wouldn't it be more consistent to use
SetRelationTableSpace() as routine name?

I think that we should document that the caller of this routine had
better do a CCI once done to make the tablespace chage visible.
Except for those two nits, the patch needs an indentation run and some
style tweaks but its logic looks fine.  So I'll apply that first
piece.



I updated comment with CCI info, did pgindent run and renamed new 
function to SetRelationTableSpace(). New patch is attached.



+INSERT INTO regress_tblspace_test_tbl (num1, num2, t)
+  SELECT round(random()*100), random(), repeat('text', 100)
+  FROM generate_series(1, 10) s(i);
Repeating 1M times a text value is too costly for such a test.  And as
even for empty tables there is one page created for toast indexes,
there is no need for that?



Yes, TOAST relation is created anyway. I just wanted to put some data 
into a TOAST index, so REINDEX did some meaningful work there, not only 
a new relfilenode creation. However you are right and this query 
increases tablespace tests execution for more for more than 2 times on 
my machine. I think that it is not really required.




This patch is introducing three new checks for system catalogs:
- don't use tablespace for mapped relations.
- don't use tablespace for system relations, except if
allowSystemTableMods.
- don't move non-shared relation to global tablespace.
For the non-concurrent case, all three checks are in reindex_index().
For the concurrent case, the two first checks are in
ReindexMultipleTables() and the third one is in
ReindexRelationConcurrently().  That's rather tricky to follow because
CONCURRENTLY is not allowed on system relations.  I am wondering if it
would be worth an extra comment effort, or if there is a way to
consolidate that better.



Yeah, all these checks we complicated from the beginning. I will try to 
find a better place tomorrow or put more info into the comments at 
least.


I am also going to check/fix the remaining points regarding 002 
tomorrow.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres CompanyFrom 39880842d7af31dcbfcffe7219250b31102955d5 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 20 Jan 2021 20:21:12 +0300
Subject: [PATCH v5 1/2] Extract common part from ATExecSetTableSpaceNoStorage
 for a future usage

---
 src/backend/commands/tablecmds.c | 95 +++-
 src/include/commands/tablecmds.h |  2 +
 2 files changed, 58 insertions(+), 39 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8687e9a97c..ec9c440e4e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13291,6 +13291,59 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	list_free(reltoastidxids);
 }
 
+/*
+ * SetRelationTableSpace - modify relation tablespace in the pg_class entry.
+ *
+ * 'reloid' is an Oid of relation to be modified.
+ * 'tablespaceOid' is an Oid of new tablespace.
+ *
+ * Catalog modification is done only if tablespaceOid is different from
+ * the currently set.  Returned bool value is indicating whether any changes
+ * were made or not.  Note that caller is responsible for doing
+ * CommandCounterIncrement() to make tablespace changes visible.
+ */
+bool
+SetRelationTableSpace(Oid reloid, Oid tablespaceOid)
+{
+	Relation	pg_class;
+	HeapTuple	tuple;
+	Form_pg_class rd_rel;
+	bool		changed = false;
+
+	/* Get a modifiable copy of the relation's pg_class row. */
+	pg_class = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* MyDatabaseTableSpace is stored as InvalidOid. */
+	if (tablespaceOid == MyDatabaseTableSpace)
+		tablespaceOid = InvalidOid;
+
+	/* No work if no change in tablespace. */
+	if (tablespaceOid != rd_rel->reltablespace)
+	{
+		/* Update the pg_class row. */
+		rd_rel->reltablespace = tablespaceOid;
+		CatalogTupleUpdate(pg_class, >t_self, tuple);
+
+		/* Record dependency on tablespace. */
+		changeDependencyOnTablespace(RelationRelationId,
+	 reloid, rd_rel->reltablespace);
+
+		changed = true;
+	}
+
+	InvokeObjectPostAlterHook(RelationRelationId, reloid, 0);
+
+	heap_freetuple(tuple);
+	

Re: Snapbuild woes followup

2021-01-25 Thread Andres Freund
Hi,

Thomas, CCing you because of the condvar bit below.

On 2021-01-25 19:28:33 +0200, Heikki Linnakangas wrote:
> In SnapBuildFinalSnapshot(), we have this comment:
> > /*
> >  * c) transition from START to BUILDING_SNAPSHOT.
> >  *
> >  * In START state, and a xl_running_xacts record with running xacts is
> >  * encountered.  In that case, switch to BUILDING_SNAPSHOT state, and
> >  * record xl_running_xacts->nextXid.  Once all running xacts have 
> > finished
> >  * (i.e. they're all >= nextXid), we have a complete catalog snapshot.  
> > It
> >  * might look that we could use xl_running_xact's ->xids information to
> >  * get there quicker, but that is problematic because transactions 
> > marked
> >  * as running, might already have inserted their commit record - it's
> >  * infeasible to change that with locking.
> >  */
>
> This was added in commit 955a684e040, before that we did in fact use the
> xl_running_xacts list of XIDs, but it was buggy. Commit 955a684e040 fixed
> that by waiting for *two* xl_running_xacts, such that the second
> xl_running_xact doesn't contain any of the XIDs from the first one.

Not really just that, but we also just don't believe ->xids to be
consistent for visibility purposes...


> To fix the case mentioned in that comment, where a transaction listed in
> xl_running_xacts is in fact already committed or aborted, wouldn't it be
> more straightforward to check each XID, if they are in fact already
> committed or aborted? The CLOG is up-to-date here, I believe.

Well, we can't *just* go the clog since that will contain transactions
as committed/aborted even when not yet visible, due to still being in
the procarray. And I don't think it's easy to figure out how to
crosscheck between clog and procarray in this instance (since we don't
have the past procarray).  This is different in the recovery path because
there we know that changes to the procarray / knownassignedxids
machinery are only done by one backend.


> I bumped into this, after I noticed that read_local_xlog_page() has a pretty
> busy polling loop, with just 1 ms delay to keep it from hogging CPU.

Hm - but why is that really related to the initial snapshot building
logic? Logical decoding constantly waits for WAL outside of that too,
no?

ISTM that we should improve the situation substantially in a fairly easy
way. Like:

1) Improve ConditionVariableBroadcast() so it doesn't take the spinlock
   if there are no wakers - afaict that's pretty trivial.
2) Replace WalSndWakeup() with ConditionVariableBroadcast().
3) Replace places that need to wait for new WAL to be written with a
   call to function doing something like

XLogRecPtr flushed_to = GetAppropriateFlushRecPtr(); // works for both 
normal / recovery

if (flush_requirement <= flushed_to)
break;

ConditionVariablePrepareToSleep(>flush_progress_cv);

while (true)
{
flushed_to = GetAppropriateFlushRecPtr();

if (flush_requirement <= flushed_to)
break;

ConditionVariableSleep(>flush_progress_cv);
}

this should end up being more efficient than the current WalSndWakeup()
mechanism because we'll only wake up the processes that need to be woken
up, rather than checking/setting each walsenders latch.


> P.S. There's this in SnapBuildNextPhaseAt():
>
> > /*
> >  * For backward compatibility reasons this has to be stored in the 
> > wrongly
> >  * named field.  Will be fixed in next major version.
> >  */
> > return builder->was_running.was_xmax;
>
> We could fix that now... Andres, what did you have in mind for a proper
> name?

next_phase_at seems like it'd do the trick?


> P.P.S. Two thinkos in comments in snapbuild.c: s/write for/wait for/.

Will push a fix.

Greetings,

Andres Freund




Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Tom Lane
Jacob Champion  writes:
> On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
>> Also, it looks like the test causes /tmp/krb5cc_ to get
>> created or updated despite this setting.

> Huh. I wonder, if you run `klist -A` after running the tests, do you
> get anything more interesting?

"klist -A" prints nothing.

> I am seeing a few bugs on Red Hat's
> Bugzilla that center around strange KCM behavior [1]. But we're now
> well outside my area of competence.

Mine too.  But I verified that the /tmp file is no longer modified
with the adjusted script, so one way or the other this is better.

regards, tom lane




Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Tom Lane
Jacob Champion  writes:
> On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
>> True, but if it did try to access the cache, accessing the user's
>> normal cache would be strictly worse than accessing the test cache.

> That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
> for a much smaller patch :)

I tweaked this to make it look a bit more like the rest of the script,
and pushed it.  Thanks!

regards, tom lane




Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Jacob Champion
On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
> However, this doesn't seem to explain why the test script isn't
> causing a global state change.  Whether the state is held in a
> file or the sssd daemon shouldn't matter, it seems like.
> 
> Also, it looks like the test causes /tmp/krb5cc_ to get
> created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting? I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]. But we're now
well outside my area of competence.

--Jacob

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1712875


Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Tom Lane
I wrote:
> Jacob Champion  writes:
>> Interesting. I'm running Ubuntu 20.04:

> Hmm.  I'll poke harder.

Ah ... on both RHEL8 and Fedora 33, I find this:

--- snip ---
$ cat /etc/krb5.conf.d/kcm_default_ccache
# This file should normally be installed by your distribution into a
# directory that is included from the Kerberos configuration file 
(/etc/krb5.conf)
# On Fedora/RHEL/CentOS, this is /etc/krb5.conf.d/
#
# To enable the KCM credential cache enable the KCM socket and the service:
#   systemctl enable sssd-secrets.socket sssd-kcm.socket
#   systemctl start sssd-kcm.socket
#
# To disable the KCM credential cache, comment out the following lines.

[libdefaults]
default_ccache_name = KCM:
--- snip ---

Even more interesting, that service seems to be enabled by default
(I'm pretty darn sure I didn't ask for it...)

However, this doesn't seem to explain why the test script isn't
causing a global state change.  Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_ to get
created or updated despite this setting.  If I force klist
to look at that:

$ KRB5CCNAME=/tmp/krb5cc_1001 klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: te...@example.com

Valid starting ExpiresService principal
01/25/21 14:31:57  01/26/21 14:31:57  krbtgt/example@example.com
01/25/21 14:31:57  01/26/21 14:31:57  
postgres/auth-test-localhost.postgresql.example.com@
Ticket server: 
postgres/auth-test-localhost.postgresql.example@example.com

where the time corresponds to my having just run the test again.

So I'm still mightily confused, but it is clear that the test's
kinit is touching a file it shouldn't.

regards, tom lane




Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys

2021-01-25 Thread Fabrízio de Royes Mello
On Mon, Jan 18, 2021 at 8:48 AM Drouvot, Bertrand 
wrote:
>
> 3 and 4 were failing because the
ResolveRecoveryConflictWithLogicalSlots() call was missing in
ResolveRecoveryConflictWithSnapshot(): the new version attached adds it.
>
> The new version attached also provides a few changes to make it compiling
on the current master (it was not the case anymore).
>
> I also had to change 023_standby_logical_decoding_conflicts.pl (had to
call $node_standby->create_logical_slot_on_standby($node_master,
'otherslot', 'postgres'); at the very beginning of the "DROP DATABASE
should drops it's slots, including active slots" section)
>

Awesome and thanks a lot.

Seems your patch series is broken... can you please `git format-patch` and
send again?

Regards,

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Jacob Champion
On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
> Jacob Champion  writes:
> > On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> > > Also, why are you only setting the ENV variable within narrow parts
> > > of the test script?  I'd be inclined to enforce it throughout.
> > I considered it and decided I didn't want to pollute the server's
> > environment with it, since the server shouldn't need the client cache.
> 
> True, but if it did try to access the cache, accessing the user's
> normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

--Jacob
From 86a7331868e6155488e568864c099caf1f21dffb Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 25 Jan 2021 09:32:44 -0800
Subject: [PATCH] test/kerberos: use a local credentials cache

Previously, the Kerberos test suite pushed credentials into the user's
default credentials cache. This modified any credentials the user
already had, and could cause other psql invocations to misbehave later,
as the GSS implementation attempted to use the globally cached test
credentials.

Use a local credentials cache at tmp_check/krb5cc instead. Clients can
be directed to use this cache via the KRB5CCNAME environment variable.
---
 src/test/kerberos/t/001_auth.pl | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 8625059149..044e58018f 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -79,6 +79,10 @@ my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
 
+# Avoid polluting the global credentials cache by creating our own and pointing
+# the clients to it. kinit and psql will use this implicitly.
+$ENV{KRB5CCNAME} = "${TestLib::tmp_check}/krb5cc";
+
 note "setting up Kerberos";
 
 my ($stdout, $krb5_version);
-- 
2.25.1



Re: pg_upgrade fails with non-standard ACL

2021-01-25 Thread Anastasia Lubennikova

On 24.01.2021 11:39, Noah Misch wrote:

On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote:

On 03.01.2021 14:29, Noah Misch wrote:

Overall, this patch predicts a subset of cases where pg_dump will emit a
failing GRANT or REVOKE that targets a pg_catalog object.  Can you write a
code comment stating what it does and does not detect?  I think it's okay to
not predict every failure, but let's record the intended coverage.  Here are a
few examples I know; there may be more.  The above query only finds GRANTs to
non-pinned roles.  pg_dump dumps the following, but the above query doesn't
find them:

   REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public;
   GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend;

I see a new comment listing object types.  Please also explain the lack of
preventing REVOKE failures (first example query above) and the limitation
around non-pinned roles (second example).



1) Could you please clarify, what do you mean by REVOKE failures?

I tried following example:

Start 9.6 cluster.

REVOKE ALL ON function pg_switch_xlog() FROM public;
REVOKE ALL ON function pg_switch_xlog() FROM backup;

The upgrade to the current master passes with and without patch.
It seems that current implementation of pg_upgrade doesn't take into 
account revoke ACLs.


2) As for pinned roles, I think we should fix this behavior, rather than 
adding a comment. Because such roles can have grants on system objects.


In earlier versions of the patch, I gathered ACLs directly from system 
catalogs: pg_proc.proacl, pg_class.relacl pg_attribute.attacl and 
pg_type.typacl.


The only downside of this approach is that it cannot be easily extended 
to other object types, as we need to handle every object type separately.
I don't think it is a big deal, as we already do it in 
check_for_changed_signatures()


And also the query to gather non-standard ACLs won't look as nice as 
now, because of the need to parse arrays of aclitems.


What do you think?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: cleaning up a few CLOG-related things

2021-01-25 Thread Heikki Linnakangas

On 25/01/2021 18:56, Robert Haas wrote:

I attach a series of proposed patches to slightly improve some minor
things related to the CLOG code.

[patches 0001 - 0003]


Makes sense.


0004 - In StartupCLOG(), correct an off-by-one error. Currently, if
the nextXid is exactly a multiple of the number of CLOG entries that
fit on a page, then the value we compute for
XactCtl->shared->latest_page_number is higher than it should be by 1.
That's because nextXid represents the first XID that hasn't yet been
allocated, not the last one that gets allocated.


Yes.


So, the CLOG page gets created when nextXid advances from the first
value on the page to the second value on the page, not when it
advances from the last value on the previous page to the first value
on the current page.
Yes. It took me a moment to understand that explanation, though. I'd 
phrase it something like "nextXid is the next XID that will be used, but 
we want to set latest_page_number according to the last XID that's 
already been used. So retreat by one."


Having a separate FullTransactionIdToLatestPageNumber() function for 
this seems like overkill to me.



Some of the other SLRUs have similar issues as a result of
copy-and-paste work over the years. I plan to look at tidying that
stuff up, too. However, I wanted to post (and probably commit) these
patches first, partly to get some feedback, and also because all the
cases are a little different and I want to make sure to do a proper
analysis of each one.


Yeah, multixact seems similar at least.

- Heikki




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao



On 2021/01/26 0:12, Bharath Rupireddy wrote:

On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
 wrote:

I will provide the updated patch set soon.


Attaching v17 patch set, please review it further.


Thanks for updating the patch!

Attached is the tweaked version of the patch. I didn't change any logic,
but I updated some comments and docs. Also I added the regresssion test
to check that postgres_fdw_disconnect() closes multiple connections.
Barring any objection, I will commit this version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From fa3e0f0a700588ab644c4c752e06c03845450712 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Tue, 26 Jan 2021 03:54:46 +0900
Subject: [PATCH] postgres_fdw: Add functions to discard cached connections.

This commit introduces two new functions postgres_fdw_disconnect()
and postgres_fdw_disconnect_all(). The former function discards
the cached connection to the specified foreign server. The latter discards
all the cached connections. If the connection is used in the current
transaction, it's not closed and a warning messsage is emitted.

For example, these functions are useful when users want to explicitly
close the foreign server connections that are no longer necessary and
then to prevent them from eating up the foreign servers connections
capacity.

Author: Bharath Rupireddy, tweaked a bit by Fujii Masao
Reviewed-by: Alexey Kondratov, Zhijie Hou, Zhihong Yu, Fujii Masao
Discussion: 
https://postgr.es/m/CALj2ACVvrp5=avp2pupem+nac8s4buqr3fjmmacoc7ftt0a...@mail.gmail.com
---
 contrib/postgres_fdw/connection.c | 135 +++-
 .../postgres_fdw/expected/postgres_fdw.out| 208 +-
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql   |  10 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |  98 -
 doc/src/sgml/postgres-fdw.sgml|  67 +-
 5 files changed, 505 insertions(+), 13 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c 
b/contrib/postgres_fdw/connection.c
index a1404cb6bb..ee0b4acf0b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -80,6 +80,8 @@ static bool xact_got_connection = false;
  * SQL functions
  */
 PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect_all);
 
 /* prototypes of private functions */
 static void make_new_connection(ConnCacheEntry *entry, UserMapping *user);
@@ -102,6 +104,7 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const 
char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 
PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static bool disconnect_cached_connections(Oid serverid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1428,8 +1431,8 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
 * Even though the server is dropped in the current 
transaction, the
 * cache can still have associated active connection entry, say 
we
 * call such connections dangling. Since we can not fetch the 
server
-* name from system catalogs for dangling connections, instead 
we
-* show NULL value for server name in output.
+* name from system catalogs for dangling connections, instead 
we show
+* NULL value for server name in output.
 *
 * We could have done better by storing the server name in the 
cache
 * entry instead of server oid so that it could be used in the 
output.
@@ -1447,7 +1450,7 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
/*
 * If the server has been dropped in the current 
explicit
 * transaction, then this entry would have been 
invalidated in
-* pgfdw_inval_callback at the end of drop sever 
command. Note
+* pgfdw_inval_callback at the end of drop server 
command. Note
 * that this connection would not have been closed in
 * pgfdw_inval_callback because it is still being used 
in the
 * current explicit transaction. So, assert that here.
@@ -1470,3 +1473,129 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
 
PG_RETURN_VOID();
 }
+
+/*
+ * Disconnect the specified cached connections.
+ *
+ * This function discards the open connections that are established by
+ * postgres_fdw from the local session to the foreign server with
+ * the given name. Note that there can be multiple connections to
+ * the given server using different user mappings. If the connections
+ * are used in the current local transaction, they are not 

Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Tom Lane
Jacob Champion  writes:
> On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
>> Yeah, changing global state is just awful.  However, I don't
>> actually see any change here (RHEL8):

> Interesting. I'm running Ubuntu 20.04:

Hmm.  I'll poke harder.

>> Also, why are you only setting the ENV variable within narrow parts
>> of the test script?  I'd be inclined to enforce it throughout.

> I considered it and decided I didn't want to pollute the server's
> environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

regards, tom lane




Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Jacob Champion
On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> Yeah, changing global state is just awful.  However, I don't
> actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1000)

$ make check
...

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: te...@example.com

Valid starting   Expires  Service principal
... krbtgt/example@example.com
... postgres/auth-test-localhost.postgresql.example.com@
... postgres/auth-test-localhost.postgresql.example@example.com

I wonder if your use of a KCM cache type rather than FILE makes the
difference?

> Also, why are you only setting the ENV variable within narrow parts
> of the test script?  I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.
But I think it'd be fine (and match the current situation) if it were
set once for the whole script, if you prefer.

--Jacob


Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Tom Lane
Stephen Frost  writes:
> * Jacob Champion (pchamp...@vmware.com) wrote:
>> I was running tests with a GSS-enabled stack, and ran into some very
>> long psql timeouts after running the Kerberos test suite. It turns out
>> the suite pushes test credentials into the user's global cache, and
>> these no-longer-useful credentials persist after the suite has
>> finished. (You can see this in action by running the test/kerberos
>> suite and then running `klist`.) This leads to long hangs, I assume
>> while the GSS implementation tries to contact a KDC that no longer
>> exists.
>> Attached is a patch that initializes a local credentials cache inside
>> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
>> This prevents the global cache pollution. WDYT?

> Ah, yeah, that generally seems like a good idea.

Yeah, changing global state is just awful.  However, I don't
actually see any change here (RHEL8):

$ klist
klist: Credentials cache 'KCM:1001' not found
$ make check
...
Result: PASS
$ klist
klist: Credentials cache 'KCM:1001' not found

I suppose in an environment where someone was really using Kerberos,
the random kinit would be more of a problem.

Also, why are you only setting the ENV variable within narrow parts
of the test script?  I'd be inclined to enforce it throughout.

regards, tom lane




Re: Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> I was running tests with a GSS-enabled stack, and ran into some very
> long psql timeouts after running the Kerberos test suite. It turns out
> the suite pushes test credentials into the user's global cache, and
> these no-longer-useful credentials persist after the suite has
> finished. (You can see this in action by running the test/kerberos
> suite and then running `klist`.) This leads to long hangs, I assume
> while the GSS implementation tries to contact a KDC that no longer
> exists.
> Attached is a patch that initializes a local credentials cache inside
> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
> This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Thanks,

Stephen


signature.asc
Description: PGP signature


Fixing cache pollution in the Kerberos test suite

2021-01-25 Thread Jacob Champion
Hi all,

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.
Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

--Jacob
From c9532e72a762abeef0996ad8df5da9bbdedbccad Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 25 Jan 2021 09:32:44 -0800
Subject: [PATCH] test/kerberos: use a local credentials cache

Previously, the Kerberos test suite pushed credentials into the user's
default credentials cache. This modified any credentials the user
already had, and could cause other psql invocations to misbehave later,
as the GSS implementation attempted to use the globally cached test
credentials.

Use a local credentials cache at tmp_check/krb5cc instead. Clients can
be directed to use this cache via the KRB5CCNAME environment variable.
---
 src/test/kerberos/t/001_auth.pl | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 8625059149..d329cb06e3 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -75,6 +75,10 @@ my $kdc_datadir = "${TestLib::tmp_check}/krb5kdc";
 my $kdc_pidfile = "${TestLib::tmp_check}/krb5kdc.pid";
 my $keytab  = "${TestLib::tmp_check}/krb5.keytab";
 
+# Avoid polluting the global credentials cache by creating our own and pointing
+# psql to it later using KRB5CCNAME.
+my $krb5_cache  = "${TestLib::tmp_check}/krb5cc";
+
 my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
@@ -185,6 +189,9 @@ sub test_access
 {
 	my ($node, $role, $query, $expected_res, $gssencmode, $test_name, $expect_log_msg) = @_;
 
+	# Use our local credentials cache.
+	local $ENV{KRB5CCNAME} = $krb5_cache;
+
 	# need to connect over TCP/IP for Kerberos
 	my ($res, $stdoutres, $stderrres) = $node->psql(
 		'postgres',
@@ -241,6 +248,9 @@ sub test_query
 {
 	my ($node, $role, $query, $expected, $gssencmode, $test_name) = @_;
 
+	# Use our local credentials cache.
+	local $ENV{KRB5CCNAME} = $krb5_cache;
+
 	# need to connect over TCP/IP for Kerberos
 	my ($res, $stdoutres, $stderrres) = $node->psql(
 		'postgres',
@@ -270,7 +280,7 @@ test_access($node, 'test1', 'SELECT true', 2, '',
 	'does not log unauthenticated principal',
 	"(test1:[unknown]) FATAL:  GSSAPI authentication failed");
 
-run_log [ $kinit, 'test1' ], \$test1_password or BAIL_OUT($?);
+run_log [ $kinit, '-c', $krb5_cache, 'test1' ], \$test1_password or BAIL_OUT($?);
 
 test_access($node, 'test1', 'SELECT true', 2, '', 'fails without mapping', '');
 
-- 
2.25.1



Re: The mysterious pg_proc.protrftypes

2021-01-25 Thread Tom Lane
"Joel Jacobson"  writes:
> Attached patch adds "(references pg_type.oid)" to the documentation for 
> pg_proc.protrftypes.

Agreed, pushed.  I also stumbled over a backend core dump while
testing it :-(.  So this whole area seems a bit spongy ...

regards, tom lane




Re: [Proposal] Global temporary tables

2021-01-25 Thread Pavel Stehule
Hi

út 22. 12. 2020 v 4:20 odesílatel wenjing  napsal:

> HI all
>
> I rebased patch, fix OID conflicts, fix some comments.
> Code updates to v41.
>

Please, can you do rebase?

Regards

Pavel


>
> Wenjing
>
>


Re: a verbose option for autovacuum

2021-01-25 Thread Tommy Li
Hi Stephen

> ... can set vacuum options on a table level which autovacuum should
respect,
> such as vacuum_index_cleanup and vacuum_truncate.  For skip locked,
> autovacuum already will automatically release it's attempt to acquire a
> lock if someone backs up behind it for too long.

This is good information, I wasn't aware that autovacuum respected those
settings.
In that case I'd like to focus specifically on the verbose aspect.

My first thought was a new boolean configuration called
"autovacuum_verbose".
I'd want it to behave similarly to autovacuum_vacuum_cost_limit in that it
can be
set globally or on a per-table basis.
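
As a purely illustrative sketch (the name, group and wording below are
placeholders, not actual guc.c contents; the per-table form would presumably
be a boolean reloption next to the existing autovacuum_* reloptions):

/* hypothetical GUC variable */
static bool autovacuum_verbose = false;

/* hypothetical entry for the ConfigureNamesBool[] table in guc.c */
{
    {"autovacuum_verbose", PGC_SIGHUP, AUTOVACUUM,
     gettext_noop("Logs VACUUM VERBOSE-style details for autovacuum runs."),
     NULL
    },
    &autovacuum_verbose,
    false,
    NULL, NULL, NULL
},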

Thoughts?


Tommy


Re: CheckpointLock needed in CreateCheckPoint()?

2021-01-25 Thread Robert Haas
On Mon, Jan 18, 2021 at 5:18 PM Tom Lane  wrote:
> Yeah.  Digging further, it looks like I oversimplified things above:
> we once launched special background-worker-like processes for checkpoints,
> and there could be more than one at a time.

Thanks. I updated the commit message to mention some of the relevant
commits, and then committed this to master.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Snapbuild woes followup

2021-01-25 Thread Heikki Linnakangas

In SnapBuildFinalSnapshot(), we have this comment:

/*
 * c) transition from START to BUILDING_SNAPSHOT.
 *
 * In START state, and a xl_running_xacts record with running xacts is
 * encountered.  In that case, switch to BUILDING_SNAPSHOT state, and
 * record xl_running_xacts->nextXid.  Once all running xacts have 
finished
 * (i.e. they're all >= nextXid), we have a complete catalog snapshot.  
It
 * might look that we could use xl_running_xact's ->xids information to
 * get there quicker, but that is problematic because transactions 
marked
 * as running, might already have inserted their commit record - it's
 * infeasible to change that with locking.
 */


This was added in commit 955a684e040; before that we did in fact use the 
xl_running_xacts list of XIDs, but it was buggy. Commit 955a684e040 
fixed that by waiting for *two* xl_running_xacts records, such that the 
second one doesn't contain any of the XIDs from the first.


To fix the case mentioned in that comment, where a transaction listed in 
xl_running_xacts is in fact already committed or aborted, wouldn't it be 
more straightforward to check each XID to see whether it has in fact already 
committed or aborted? The CLOG is up-to-date here, I believe.
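
Something along these lines, as an untested sketch (assuming the xids array
from the xl_running_xacts record is what we'd filter):

/* Untested sketch: skip transactions that the CLOG already knows have
 * completed, instead of waiting for a later xl_running_xacts record in
 * which they no longer appear. */
for (int i = 0; i < running->xcnt; i++)
{
    TransactionId xid = running->xids[i];

    if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
        continue;           /* already finished, nothing to wait for */

    /* xid is genuinely in progress; track it as we do today */
}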


I bumped into this, after I noticed that read_local_xlog_page() has a 
pretty busy polling loop, with just 1 ms delay to keep it from hogging 
CPU. I tried to find the call sites where we might get into that loop, 
and found that the snapshot building code does that: the 
'delayed_startup' regression test in contrib/test_decoding exercises it. 
In a primary server, SnapBuildWaitSnapshot() inserts a new running-xacts 
record, and then read_local_xlog_page() will poll until that record has 
been flushed. We could add an explicit XLogFlush() there to avoid the 
wait. However, if I'm reading the code correctly, in a standby server, 
we can't write a new running-xacts record so we just wait for one that's 
created periodically by bgwriter in the primary. That can take several 
seconds. Or indefinitely, if the standby isn't connected to the primary 
at the moment. Would be nice to not poll.
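
I.e. in the primary-server path, roughly this (sketch; assuming the LSN
returned by LogStandbySnapshot() is at hand where SnapBuildWaitSnapshot()
writes the record):

/* Sketch: flush the running-xacts record we just wrote so that
 * read_local_xlog_page() sees it immediately instead of polling
 * until the next routine WAL flush. */
XLogRecPtr  recptr = LogStandbySnapshot();

XLogFlush(recptr);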


- Heikki

P.S. There's this in SnapBuildNextPhaseAt():


/*
 * For backward compatibility reasons this has to be stored in the 
wrongly
 * named field.  Will be fixed in next major version.
 */
return builder->was_running.was_xmax;


We could fix that now... Andres, what did you have in mind for a proper 
name?


P.P.S. Two thinkos in comments in snapbuild.c: s/write for/wait for/.




Re: Extensibility of the PostgreSQL wire protocol

2021-01-25 Thread Jan Wieck
Hi Jonah,

On Mon, Jan 25, 2021 at 10:18 AM Jonah H. Harris 
wrote:

> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
>
>> The following is a request for discussion and comments, not a refined
>> proposal accompanied by a working patch.
>>
>
> After implementing this three different ways inside the backend over the
> years, I landed on almost this identical approach for handling the MySQL,
> TDS, MongoDB, and Oracle protocols for NEXTGRES.
>

Could any of that be open sourced? It would be an excellent addition to have
one of those as example code.


Regards, Jan



>
> Initially, each was implemented as a background worker extension which
> had to handle its own networking, passing the fd off to new
> protocol-specific connections, etc. This worked, but duplicated a good
> amount of logic. It would be great to have a standard, loadable way to add
> support for a new protocol.
>
> --
> Jonah H. Harris
>
>

-- 
Jan Wieck


Re: ResourceOwner refactoring

2021-01-25 Thread Robert Haas
On Thu, Jan 21, 2021 at 5:14 AM Heikki Linnakangas  wrote:
> Here you can see that as numsnaps increases, the test becomes slower,
> but then it becomes faster again at 64-66, when it switches to the hash
> table. So 64 seems too much. 32 seems to be the sweet spot here, that's
> where scanning the hash and scanning the array are about the same speed.

That sounds good. I mean, it could be that not all hardware behaves
the same here. But trying to get it into the right ballpark makes
sense.

I also like the fact that this now has some cases where it wins by a
significant margin. That's pretty cool; thanks for working on it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Printing backtrace of postgres processes

2021-01-25 Thread Robert Haas
On Wed, Jan 20, 2021 at 9:24 PM Craig Ringer
 wrote:
> I know lots of this stuff can seem like a pointless sidetrack because the 
> utility of it is not obvious on dev systems or when you're doing your own 
> hands-on expert support on systems you own and operate yourself. These sorts 
> of things really only start to make sense when you're touching many different 
> postgres systems "in the wild" - such as commercial support services, helping 
> people on -general, -bugs or stackoverflow, etc.
>
> I really appreciate your help with it.

Big +1 for all that from me, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Identify missing publications from publisher while create/alter subscription.

2021-01-25 Thread vignesh C
On Mon, Jan 25, 2021 at 3:07 PM Dilip Kumar  wrote:
>
> On Mon, Jan 25, 2021 at 2:48 PM Bharath Rupireddy
>  wrote:
> >
> > On Mon, Jan 25, 2021 at 2:42 PM Dilip Kumar 
wrote:
> > >
> > > On Mon, Jan 25, 2021 at 1:10 PM vignesh C  wrote:
> > > >
> > > > On Thu, Jan 21, 2021 at 10:21 PM Bharath Rupireddy
> > > >  wrote:
> > > > >
> > > > > On Thu, Jan 21, 2021 at 6:56 PM vignesh C 
wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Creating/altering subscription is successful when we specify a
> > > > > > publication which does not exist in the publisher. I felt we
should
> > > > > > throw an error in this case, that will help the user to check
if there
> > > > > > is any typo in the create subscription command or to create the
> > > > > > publication before the subscription is created.
> > > > > > If the above analysis looks correct, then please find a patch
that
> > > > > > checks if the specified publications are present in the
publisher and
> > > > > > throws an error if any of the publications is missing in the
> > > > > > publisher.
> > > > > > Thoughts?
> > > > >
> > > > > I was having similar thoughts (while working on  the logical
> > > > > replication bug on alter publication...drop table behaviour) on
why
> > > > > create subscription succeeds without checking the publication
> > > > > existence. I checked in documentation, to find if there's a strong
> > > > > reason for that, but I couldn't. Maybe it's because of the
principle
> > > > > "first let users create subscriptions, later the publications can
be
> > > > > created on the publisher system", similar to this behaviour
> > > > > "publications can be created without any tables attached to it
> > > > > initially, later they can be added".
> > > > >
> > > > > Others may have better thoughts.
> > > > >
> > > > > If we do check publication existence for CREATE SUBSCRIPTION, I
think
> > > > > we should also do it for ALTER SUBSCRIPTION ... SET PUBLICATION.
> > > > >
> > > > > I wonder, why isn't dropping a publication from a list of
publications
> > > > > that are with subscription is not allowed?
> > > > >
> > > > > Some comments so far on the patch:
> > > > >
> > > > > 1) I see most of the code in the new function
check_publications() and
> > > > > existing fetch_table_list() is the same. Can we have a generic
> > > > > function, with maybe a flag to separate out the logic specific for
> > > > > checking publication and fetching table list from the publisher.
> > > >
> > > > I have made the common code between the check_publications and
> > > > fetch_table_list into a common function
> > > > get_appended_publications_query. I felt the rest of the code is
better
> > > > off kept as it is.
> > > > The Attached patch has the changes for the same and also the change
to
> > > > check publication exists during alter subscription set publication.
> > > > Thoughts?
> > > >
> > >
> > > So basically, the create subscription will throw an error if the
> > > publication does not exist.  So will you throw an error if we try to
> > > drop the publication which is subscribed by some subscription?  I mean
> > > basically, you are creating a dependency that if you are creating a
> > > subscription then there must be a publication that is not completely
> > > insane but then we will have to disallow dropping the publication as
> > > well.  Am I missing something?
> >
> > Do you mean DROP PUBLICATION non_existent_publication;?
> >
> > Or
> >
> > Do you mean when we drop publications from a subscription?
>
> I mean it doesn’t seem right to disallow to create the subscription if
> the publisher doesn't exist, and my reasoning was even though the
> publisher exists while creating the subscription you might drop it
> later right?.  So basically, now also we can create the same scenario
> that a subscription may exist for the publication which does not
> exist.
>

I would like to defer on documentation for this.
I feel we should have behavior similar to that of publication tables, as given
below; then it will be consistent and easier for the users:

This is the behavior in the case of tables:

Step 1: PUBLISHER SIDE:
create table t1(c1 int);
create table t2(c1 int);
CREATE PUBLICATION mypub1 for table t1,t2;
-- All the above commands succeed

Step 2: SUBSCRIBER SIDE:
-- Creating the subscription without creating the tables results in an error:

CREATE SUBSCRIPTION mysub1 CONNECTION 'dbname=source_rep host=localhost
user=vignesh port=5432' PUBLICATION mypub1;
ERROR:  relation "public.t2" does not exist
create table t1(c1 int);
create table t2(c1 int);

CREATE SUBSCRIPTION mysub1 CONNECTION 'dbname=source_rep host=localhost
user=vignesh port=5432' PUBLICATION mypub1;

postgres=# select * from pg_subscription;
  oid  | subdbid | subname | subowner | subenabled | subbinary | substream
|   subconninfo   | subslotname |
subsynccommit | subpublications

Re: Identify missing publications from publisher while create/alter subscription.

2021-01-25 Thread vignesh C
On Mon, Jan 25, 2021 at 5:18 PM japin  wrote:
>
>
> On Mon, 25 Jan 2021 at 17:18, Bharath Rupireddy 
>  wrote:
> > On Mon, Jan 25, 2021 at 2:42 PM Dilip Kumar  wrote:
> >>
> >> On Mon, Jan 25, 2021 at 1:10 PM vignesh C  wrote:
> >> >
> >> > On Thu, Jan 21, 2021 at 10:21 PM Bharath Rupireddy
> >> >  wrote:
> >> > >
> >> > > On Thu, Jan 21, 2021 at 6:56 PM vignesh C  wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > Creating/altering subscription is successful when we specify a
> >> > > > publication which does not exist in the publisher. I felt we should
> >> > > > throw an error in this case, that will help the user to check if 
> >> > > > there
> >> > > > is any typo in the create subscription command or to create the
> >> > > > publication before the subscription is created.
> >> > > > If the above analysis looks correct, then please find a patch that
> >> > > > checks if the specified publications are present in the publisher and
> >> > > > throws an error if any of the publications is missing in the
> >> > > > publisher.
> >> > > > Thoughts?
> >> > >
> >> > > I was having similar thoughts (while working on  the logical
> >> > > replication bug on alter publication...drop table behaviour) on why
> >> > > create subscription succeeds without checking the publication
> >> > > existence. I checked in documentation, to find if there's a strong
> >> > > reason for that, but I couldn't. Maybe it's because of the principle
> >> > > "first let users create subscriptions, later the publications can be
> >> > > created on the publisher system", similar to this behaviour
> >> > > "publications can be created without any tables attached to it
> >> > > initially, later they can be added".
> >> > >
> >> > > Others may have better thoughts.
> >> > >
> >> > > If we do check publication existence for CREATE SUBSCRIPTION, I think
> >> > > we should also do it for ALTER SUBSCRIPTION ... SET PUBLICATION.
> >> > >
> >> > > I wonder, why isn't dropping a publication from a list of publications
> >> > > that are with subscription is not allowed?
> >> > >
> >> > > Some comments so far on the patch:
> >> > >
> >> > > 1) I see most of the code in the new function check_publications() and
> >> > > existing fetch_table_list() is the same. Can we have a generic
> >> > > function, with maybe a flag to separate out the logic specific for
> >> > > checking publication and fetching table list from the publisher.
> >> >
> >> > I have made the common code between the check_publications and
> >> > fetch_table_list into a common function
> >> > get_appended_publications_query. I felt the rest of the code is better
> >> > off kept as it is.
> >> > The Attached patch has the changes for the same and also the change to
> >> > check publication exists during alter subscription set publication.
> >> > Thoughts?
> >> >
> >>
> >> So basically, the create subscription will throw an error if the
> >> publication does not exist.  So will you throw an error if we try to
> >> drop the publication which is subscribed by some subscription?  I mean
> >> basically, you are creating a dependency that if you are creating a
> >> subscription then there must be a publication that is not completely
> >> insane but then we will have to disallow dropping the publication as
> >> well.  Am I missing something?
> >
> > Do you mean DROP PUBLICATION non_existent_publication;?
> >
> > Or
> >
> > Do you mean when we drop publications from a subscription? If yes, do
> > we have a way to drop a publication from the subscription? See below
> > one of my earlier questions on this.
> > "I wonder, why isn't dropping a publication from a list of
> > publications that are with subscription is not allowed?"
> > At least, I see no ALTER SUBSCRIPTION ... DROP PUBLICATION mypub1 or
> > something similar?
> >
>
> Why we do not support ALTER SUBSCRIPTION...ADD/DROP PUBLICATION?  When we
> have multiple publications in subscription, but I want to add/drop a single
> publication, it is conveient.  The ALTER SUBSCRIPTION...SET PUBLICATION...
> should supply the completely publications.
>
> Sorry, this question is unrelated with this subject.

Please start a new thread for this, let's discuss this separately.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-25 Thread Robert Haas
On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
 wrote:
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.

Maybe we should leave the existing function pg_is_wal_replay_paused()
alone and add a new one with the name you suggest that returns text.
That would create less burden for tool authors.
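
A minimal sketch of what that could look like, reusing the tri-state and
strings from the patch under discussion (untested; the function name is just
the one suggested above):

Datum
pg_get_wal_replay_pause_state(PG_FUNCTION_ARGS)
{
    const char *state = "not paused";

    /* Map the internal tri-state onto a user-facing string. */
    switch (GetRecoveryPauseState())
    {
        case RECOVERY_NOT_PAUSED:
            state = "not paused";
            break;
        case RECOVERY_PAUSE_REQUESTED:
            state = "pause requested";
            break;
        case RECOVERY_PAUSED:
            state = "paused";
            break;
    }

    PG_RETURN_TEXT_P(cstring_to_text(state));
}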

-- 
Robert Haas
EDB: http://www.enterprisedb.com




cleaning up a few CLOG-related things

2021-01-25 Thread Robert Haas
I attach a series of proposed patches to slightly improve some minor
things related to the CLOG code.

0001 - Always call StartupCLOG() just after initializing
ShmemVariableCache. Right now, the hot_standby=off code path does this
at end of recovery, and the hot_standby=on code path does it at the
beginning of recovery. It's better to do this in only one place
because (a) it's simpler, (b) StartupCLOG() is trivial so trying to
postpone it in the hot_standby=off case has no value, and (c) it
allows for 0002 and therefore 0003, which make things even simpler.

0002 - In clog_redo(), don't set XactCtl->shared->latest_page_number.
The value that is being set here is actually the oldest page we're not
truncating, not the newest page that exists, so it's a bogus value
(except when we're truncating all but one page). The reason it's like
this now is to avoid having SimpleLruTruncate() see an uninitialized
value that might trip a sanity check, but after 0001 that won't be
possible, so we can just let the sanity check do its thing.

0003 - In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
After we stop doing 0002 we don't need to do this either, because the
only thing this can be doing for us is correcting the error introduced
by the code which 0002 removes. Relying on the results of replaying
(authoritative) CLOG/EXTEND records seems better than relying on our
(approximate) value of nextXid at end of recovery.

0004 - In StartupCLOG(), correct an off-by-one error. Currently, if
the nextXid is exactly a multiple of the number of CLOG entries that
fit on a page, then the value we compute for
XactCtl->shared->latest_page_number is higher than it should be by 1.
That's because nextXid represents the first XID that hasn't yet been
allocated, not the last one that gets allocated. So, the CLOG page
gets created when nextXid advances from the first value on the page to
the second value on the page, not when it advances from the last value
on the previous page to the first value on the current page.
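
In code form, the idea of 0004 is roughly this (sketch only, not the patch
itself):

/* Sketch: nextXid is the next XID that will be used, but
 * latest_page_number must reflect the last XID that has already been
 * used, so retreat by one before computing the page number. */
FullTransactionId nextXid = ShmemVariableCache->nextXid;
TransactionId xid;

FullTransactionIdRetreat(&nextXid);
xid = XidFromFullTransactionId(nextXid);
XactCtl->shared->latest_page_number = TransactionIdToPage(xid);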

Note that all of 0001-0003 result in a net removal of code. 0001 comes
out to more lines total because of the comment changes, but fewer
executable lines.

I don't plan to back-patch any of this because, AFAICS, an incorrect
value of XactCtl->shared->latest_page_number has no real consequences.
The SLRU code uses latest_page_number for just two purposes. First,
the latest page is never evicted; but that's just a question of
performance, not correctness, and the performance impact is small.
Second, the sanity check in SimpleLruTruncate() uses it. The present
code can make the value substantially inaccurate during recovery, but
only in a way that can make the sanity check pass rather than failing,
so it's not going to really bite anybody except perhaps if they have a
corrupted cluster where they would have liked the sanity check to
catch some problem. When not in recovery, the value can be off by at
most one. I am not sure whether there's a theoretical risk of this
making SimpleLruTruncate()'s sanity check fail when it should have
passed, but even if there is the chances must be extremely remote.

Some of the other SLRUs have similar issues as a result of
copy-and-paste work over the years. I plan to look at tidying that
stuff up, too. However, I wanted to post (and probably commit) these
patches first, partly to get some feedback, and also because all the
cases are a little different and I want to make sure to do a proper
analysis of each one.

Any review very much appreciated.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0003-In-TrimCLOG-don-t-reset-XactCtl-shared-latest_pag.patch
Description: Binary data


v1-0002-In-clog_redo-don-t-set-XactCtl-shared-latest_page.patch
Description: Binary data


v1-0004-In-StartupCLOG-correct-an-off-by-one-error.patch
Description: Binary data


v1-0001-Move-StartupCLOG-calls-to-just-after-we-initializ.patch
Description: Binary data


Re: Error on failed COMMIT

2021-01-25 Thread Dave Cramer
Apologies, I should have checked again to make sure the patch applied.

This one does and passes tests.

Dave Cramer
www.postgres.rocks


On Mon, 25 Jan 2021 at 09:09, Dave Cramer  wrote:

> Rebased against head
>
> Here's my summary of the long thread above.
>
> This change is in keeping with the SQL spec.
>
> There is an argument (Tom) that says that this will annoy more people than
> it will please. I presume this is due to the fact that libpq behaviour will
> change.
>
> As the author of the JDBC driver, and I believe I speak for other driver
> (NPGSQL for one) authors as well that have implemented the protocol I would
> argue that the current behaviour is more annoying.
>
> We currently have to keep state and determine if COMMIT actually failed or
> it ROLLED BACK. There are a number of async drivers that would also benefit
> from not having to keep state in the session.
>
> Regards,
>
> Dave Cramer
> www.postgres.rocks
>
>
> On Tue, 10 Nov 2020 at 11:53, Dave Cramer 
> wrote:
>
>>
>>
>> On Mon, 9 Nov 2020 at 16:26, Dave Cramer 
>> wrote:
>>
>>>
>>>
>>> On Wed, 30 Sep 2020 at 18:14, Andrew Dunstan <
>>> andrew.duns...@2ndquadrant.com> wrote:
>>>

 On 8/4/20 12:19 PM, Dave Cramer wrote:
 > Attached is the rebased patch for consideration.
 >
 >


 It's a bit sad this has been hanging around so long without attention.


 The previous discussion seems to give the patch a clean bill of health
 for most/all of the native drivers. Are there any implications for libpq
 based drivers such as DBD::Pg and psycopg2? How about for ecpg?


 cheers


 andrew


 --
 Andrew Dunstanhttps://www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>>>
>>>
>>> Attached is a rebased patch with fixes for the isolation tests
>>>
>>>
>>
>>>
>>> Dave Cramer
>>> www.postgres.rocks
>>>
>>


0001-Throw-error-and-rollback-on-a-failed-transaction-ins.patch
Description: Binary data


Re: New IndexAM API controlling index vacuum strategies

2021-01-25 Thread Zhihong Yu
Hi,
bq. We can mention in the commit log that since the commit changes
MaxHeapTuplesPerPage the encoding in gin posting list is also changed.

Yes - this is fine.

Thanks

On Mon, Jan 25, 2021 at 12:28 AM Masahiko Sawada 
wrote:

> (Please avoid top-posting on the mailing lists[1]: top-posting breaks
> the logic of a thread.)
>
> On Tue, Jan 19, 2021 at 12:02 AM Zhihong Yu  wrote:
> >
> > Hi, Masahiko-san:
>
> Thank you for reviewing the patch!
>
> >
> > For v2-0001-Introduce-IndexAM-API-for-choosing-index-vacuum-s.patch :
> >
> > For blvacuumstrategy():
> >
> > +   if (params->index_cleanup == VACOPT_TERNARY_DISABLED)
> > +   return INDEX_VACUUM_STRATEGY_NONE;
> > +   else
> > +   return INDEX_VACUUM_STRATEGY_BULKDELETE;
> >
> > The 'else' can be omitted.
>
> Yes, but I'd prefer to leave it as it is because it's more readable
> without any performance side effect that we return BULKDELETE if index
> cleanup is enabled.
>
> >
> > Similar comment for ginvacuumstrategy().
> >
> > For v2-0002-Choose-index-vacuum-strategy-based-on-amvacuumstr.patch :
> >
> > If index_cleanup option is specified neither VACUUM command nor
> > storage option
> >
> > I think this is what you meant (by not using passive voice):
> >
> > If index_cleanup option specifies neither VACUUM command nor
> > storage option,
> >
> > - * integer, but you can't fit that many items on a page. 11 ought to be
> more
> > + * integer, but you can't fit that many items on a page. 13 ought to be
> more
> >
> > It would be nice to add a note why the number of bits is increased.
>
> I think that it might be better to mention such update history in the
> commit log rather than in the source code. Because most readers are
> likely to be interested in why 12 bits for offset number is enough,
> rather than why this value has been increased. In the source code
> comment, we describe why 12 bits for offset number is enough. We can
> mention in the commit log that since the commit changes
> MaxHeapTuplesPerPage the encoding in gin posting list is also changed.
> What do you think?
>
> > For choose_vacuum_strategy():
> >
> > +   IndexVacuumStrategy ivstrat;
> >
> > The variable is only used inside the loop. You can use
> vacrelstats->ivstrategies[i] directly and omit the variable.
>
> Fixed.
>
> >
> > +   int ndeaditems_limit = (int) ((freespace / sizeof(ItemIdData)) *
> 0.7);
> >
> > How was the factor of 0.7 determined ? Comment below only mentioned
> 'safety factor' but not how it was chosen.
> > I also wonder if this factor should be exposed as GUC.
>
> Fixed.
>
> >
> > +   if (nworkers > 0)
> > +   nworkers--;
> >
> > Should log / assert be added when nworkers is <= 0 ?
>
> Hmm I don't think so. As far as I read the code, there is no
> possibility that nworkers can be lower than 0 (we always increment it)
> and actually, the code works fine even if nworkers is a negative
> value.
>
> >
> > + * XXX: allowing to fill the heap page with only line pointer seems a
> overkill.
> >
> > 'a overkill' -> 'an overkill'
> >
>
> Fixed.
>
> The above comments are incorporated into the latest patch I just posted[2].
>
> [1] https://en.wikipedia.org/wiki/Posting_style#Top-posting
> [2]
> https://www.postgresql.org/message-id/CAD21AoCS94vK1fs-_%3DR5J3tp2DsZPq9zdcUu2pk6fbr7BS7quA%40mail.gmail.com
>
>
>
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


Re: Is Recovery actually paused?

2021-01-25 Thread Yugo NAGATA
On Mon, 25 Jan 2021 14:53:18 +0530
Dilip Kumar  wrote:
 
> I have changed as per other functions for consistency.

Thank you for updating the patch. Here are a few comments:


(1)

-   SetRecoveryPause(true);
+   SetRecoveryPause(RECOVERY_PAUSE_REQUESTED);
 
ereport(LOG
(errmsg("recovery has paused"),
 errdetail("If recovery is unpaused, 
the server will shut down."),
 errhint("You can then restart the 
server after making the necessary configuration changes.")));
 
-   while (RecoveryIsPaused())
+   while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
{
HandleStartupProcInterrupts();

This fix would be required for code added by the following commit. 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=15251c0a60be76eedee74ac0e94b433f9acca5af

Due to this, the recovery could get paused after the configuration
change in the primary. However, after applying this patch,
pg_is_wal_replay_paused  returns "pause requested" although it should
return "paused".

To fix this, we must pass RECOVERY_PAUSED to SetRecoveryPause() instead
of RECOVERY_PAUSE_REQUESTED. Or, we can call CheckAndSetRecoveryPause()
in the loop like recoveryPausesHere(), but this seems redundant.


(2)
-   while (RecoveryIsPaused())
+   while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
{
+
HandleStartupProcInterrupts();

Though it is trivial, I think the new line after the brace is unnecessary.


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: mkid reference

2021-01-25 Thread Bruce Momjian
On Sun, Jan 24, 2021 at 02:20:58PM +0100, Magnus Hagander wrote:
> > I found man pages for mkid online --- it's apparently a ctags-like
> > code indexing tool, not something for patches.  So maybe Bruce still
> > uses it, or maybe not.  But as long as we've also got make_ctags and
> > make_etags in there, I don't have a problem with leaving make_mkid.
> >
> > make_diff, on the other hand, certainly looks like technology whose
> > time has passed.  I wonder about pgtest, too.
> 
> I'll go kill make_diff then -- quicker than fixing the docs of it.
> 
> As for pgtest, that one looks a bit interesting as well -- but it's
> been patched on as late as 9.5 and in 2018, so it seems at least Bruce
> uses it :)

Yes, that is how I noticed the ecpg/preproc.y warning this past weekend.

> While at it, what point is "codelines" adding?

That is the script I use to generate code line counts when comparing
releases.  I thought it should be in the tree so others can reproduce my
numbers.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: mkid reference

2021-01-25 Thread Bruce Momjian
On Fri, Jan 22, 2021 at 01:07:36PM -0500, Tom Lane wrote:
> > There's also src/tools/make_mkid which use this mkid tool.  +1 for removing.
> > If anything, it seems better replaced by extended documentation on the 
> > existing
> > wiki article [0] on how to use "git format-patch".
> 
> I found man pages for mkid online --- it's apparently a ctags-like
> code indexing tool, not something for patches.  So maybe Bruce still
> uses it, or maybe not.  But as long as we've also got make_ctags and

Yes, I do still use it, so I thought having a script to generate its
index files might be helpful to someone.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Jsonpath ** vs lax mode

2021-01-25 Thread Alexander Korotkov
On Thu, Jan 21, 2021 at 4:35 PM Alvaro Herrera  wrote:
> On 2021-Jan-21, Alexander Korotkov wrote:
>
> > Requiring strict mode for ** is a solution, but probably too restrictive...
> >
> > What do you think about making just subsequent accessor after ** not
> > to unwrap arrays.  That would be a bit tricky to implement, but
> > probably that would better satisfy the user needs.
>
> Hmm, why is it too restrictive?  If the user needs to further drill into
> the JSON, can't they chain json_path_query calls, specifying (or
> defaulting to) lax mode for the part doesn't include the ** expression?

For sure, there are some workarounds.  But I don't think all the
lax-mode queries involving ** are affected.  So, it might happen that
we force users to use strict mode or a chained call even if it's not
necessary.  I'm inclined to just fix the doc and wait to see if there are
more complaints :)

--
Regards,
Alexander Korotkov




Re: Jsonpath ** vs lax mode

2021-01-25 Thread Alexander Korotkov
On Thu, Jan 21, 2021 at 12:38 PM Thomas Kellerer  wrote:
> Alexander Korotkov schrieb am 20.01.2021 um 18:13:
> > We have a bug report which says that jsonpath ** operator behaves strangely 
> > in the lax mode [1].
> That report was from me ;)
>
> Thanks for looking into it.
>
> > At first sight, we may just say that lax mode just sucks and
> > counter-intuitive results are expected.  But at the second sight, the
> > lax mode is used by default and current behavior may look too
> > surprising.
>
> I personally would be fine with the manual stating that the Postgres extension
> to the JSONPath processing that allows a recursive lookup using ** requires 
> strict
> mode to work properly.
>
> It should probably be documented in chapter 9.16.2 "The SQL/JSON Path 
> Language",
> maybe with a little warning in the description of jsonb_path_query** and in
> chapter 8.14.16 as well (or at least that's were I would expect such a 
> warning)

Thank you for reporting :)

Yeah, documenting the current behavior is a "must have".  Even if
we find an appropriate behavior change, I don't think it would
be backpatchable.  But we need to backpatch the documentation for
sure.  So, let's start by fixing the docs.

--
Regards,
Alexander Korotkov




Re: Phrase search vs. multi-lexeme tokens

2021-01-25 Thread Alexander Korotkov
Hi, Neil!

On Mon, Jan 25, 2021 at 11:45 AM Neil Chen  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Greetings,
>
> Although I am not an expert in this field, I carefully read the full-text 
> search section in the document. I think the change is surprising, but yes, it 
> is correct.
> I found that your patch didn't modify the regress/excepted/tsearch.out. So I 
> updated it and carried out the regression test. It passed. Also, I manually 
> executed some test cases, all of which were OK.

Thank you for looking into this.  Yes, I've adjusted tsearch.sql
regression tests to provide reasonable exercises for the new logic,
but forgot to add tsearch.out to the patch.

BTW, you mentioned you read the documentation.  Do you think it needs
to be adjusted according to the patch?

--
Regards,
Alexander Korotkov




Re: Extensibility of the PostgreSQL wire protocol

2021-01-25 Thread Jonah H. Harris
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:

> The following is a request for discussion and comments, not a refined
> proposal accompanied by a working patch.
>

After implementing this three different ways inside the backend over the
years, I landed on almost this identical approach for handling the MySQL,
TDS, MongoDB, and Oracle protocols for NEXTGRES.

Initially, each was implemented as a background worker extension which had
to handle its own networking, passing the fd off to new protocol-specific
connections, etc. This worked, but duplicated a good amount of logic. It
would be great to have a standard, loadable way to add support for a new
protocol.

-- 
Jonah H. Harris


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Bharath Rupireddy
On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
 wrote:
> I will provide the updated patch set soon.

Attaching v17 patch set, please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v17-0001-postgres_fdw-function-to-discard-cached-connecti.patch
Description: Binary data


v17-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch
Description: Binary data


v17-0003-postgres_fdw-server-level-option-keep_connection.patch
Description: Binary data


Extensibility of the PostgreSQL wire protocol

2021-01-25 Thread Jan Wieck
The following is a request for discussion and comments, not a refined
proposal accompanied by a working patch.

As recently publicly announced Amazon Web Services is working on Babelfish,
a set of extensions that will allow PostgreSQL to be compatible with other
database systems. One part of this will be an extension that allows
PostgreSQL to listen on a secondary port and process a different wire
protocol. The first extension we are creating in this direction is handling
of the Tabular Data Stream (TDS), used by Sybase and Microsoft SQL-Server
databases. It is more efficient to build an extension that can handle the
TDS protocol inside the backend than to create a proxy process that
translates from TDS to the libpq protocol and back.

Creating the necessary infrastructure in the postmaster and backend will
open up more possibilities, that are not tied to our compatibility efforts.
Possible use cases for wire protocol extensibility include the development
of a completely new, not backwards compatible PostgreSQL protocol or
extending the existing wire protocol with things like 3rd party connection
pool specific features (like transfer of file descriptors between pool and
working backend for example).

Our current plan is to create a new set of API calls and hooks that allow
to register additional wire protocols. The existing backend libpq
implementation will be modified to register itself using the new API. This
will serve as a proof of concept as well as ensure that the API definition
is not slanted towards a specific protocol. It is also similar to the way
table access methods and compression methods are added.

A wire protocol extension will be a standard PostgreSQL dynamic loadable
extension module. The wire protocol extensions to load will be listed in
the shared_preload_libraries GUC. The extension's Init function will
register a hook function to be called where the postmaster is currently
creating the libpq server sockets. This hook callback will then create the
server sockets and register them for monitoring via select(2) in the
postmaster main loop, using a new API function. Part of the registration
information is a set of callback functions to invoke for accepting and
authenticating incoming connections and for error reporting, as well as a
function that will implement the TCOP loop for the protocol. Ongoing work on
the TDS
protocol has shown us that different protocols make it desirable to have
separate implementations of the TCOP loop. The TCOP function will return
only after the connection has been terminated. Fortunately half the
interface already exists since the sending of result sets is implemented
via callback functions that are registered as the dest receiver, which
works pretty well in our current code.
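
To make the shape of this a little more concrete, here is a purely
hypothetical sketch of the registration structure such an API could pass
around (none of these names exist in core today; they only mirror the
callbacks described above):

/* Hypothetical sketch only -- not an existing API. */
typedef struct WireProtocolHandler
{
    const char *name;                   /* e.g. "tds" */

    /* create and register listen sockets for this protocol */
    int         (*create_sockets) (pgsocket *socks, int maxsocks);

    /* accept and authenticate an incoming connection */
    int         (*accept_conn) (pgsocket server_sock, Port *port);
    int         (*authenticate) (Port *port);

    /* protocol-specific error reporting and main command loop */
    void        (*report_error) (Port *port, const char *msg);
    void        (*tcop_loop) (Port *port); /* returns when the connection ends */
} WireProtocolHandler;

/* Hypothetical: called from _PG_init() of a shared_preload_libraries module. */
extern void RegisterWireProtocolHandler(const WireProtocolHandler *handler);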


Regards, Jan

-- 
Jan Wieck
Principal Database Engineer
Amazon Web Services


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiko Sawada
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda  wrote:
>
> Hi, thanks for the reviews.
>
> I updated the attached patch.

Thank you for updating the patch!

> The summary of the changes is following.
>
> 1. fix document
>
> I followed another view's comments.
>
>
> 2. refactor issue_xlog_fsync()
>
> I removed "sync_called" variables, narrowed the "duration" scope and
> change the switch statement to if statement.

Looking at the code again: if we check whether an fsync was really
called when calculating the I/O time, I think it's better to do that
check before starting the measurement.

bool issue_fsync = false;

if (enableFsync &&
(sync_method == SYNC_METHOD_FSYNC ||
 sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
 sync_method == SYNC_METHOD_FDATASYNC))
{
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
issue_fsync = true;
}
(snip)
if (issue_fsync)
{
if (track_wal_io_timing)
{
instr_time  duration;

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
}
WalStats.m_wal_sync++;
}

So I prefer either the above, which is a modified version of the
original approach, or the idea I proposed before that doesn't introduce
a new local variable. But I'm not going to insist on that.

>
>
> 3. make wal-receiver report WAL statistics
>
> I add the code to collect the statistics for a written operation
> in XLogWalRcvWrite() and to report stats in WalReceiverMain().
>
> Since WalReceiverMain() can loop fast, to avoid loading stats collector,
> I add "force" argument to the pgstat_send_wal function. If "force" is
> false, it can skip reporting until at least 500 msec since it last
> reported. WalReceiverMain() almost calls pgstat_send_wal() with "force"
> as false.

 void
-pgstat_send_wal(void)
+pgstat_send_wal(bool force)
 {
/* We assume this initializes to zeroes */
static const PgStat_MsgWal all_zeroes;
+   static TimestampTz last_report = 0;

+   TimestampTz now;
WalUsagewalusage;

+   /*
+* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+* msec since we last sent one or specified "force".
+*/
+   now = GetCurrentTimestamp();
+   if (!force &&
+   !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+   return;
+
+   last_report = now;

Hmm, I don't think it's good to use PGSTAT_STAT_INTERVAL for this
purpose since it is used as a minimum time for stats file updates. If
we want an interval, I think we should define another one. Also, with
the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
even if track_wal_io_timing is off, which is not good. On the other
hand, I agree with your concern that the wal receiver should not send
the stats every time it receives WAL records. So an idea could be to
send the wal stats when finishing the current WAL segment file and
when the timeout fires in the main loop. That way we can guarantee that
the wal stats on a replica are updated at least every time a WAL
segment file is finished while actively receiving WAL records, and
every NAPTIME_PER_CYCLE in other cases.
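
As a rough sketch of those call sites (illustrative only; the
segment-boundary condition and the use of the "force" flag here are my
interpretation, not actual code):

/* In the wal receiver, once a WAL segment file has been completed:
 * always report, so a replica's stats lag by at most one segment
 * while actively streaming. */
pgstat_send_wal(true);

/* In WalReceiverMain()'s wait loop, when the NAPTIME_PER_CYCLE timeout
 * fires without new data: report opportunistically; this can be a cheap
 * no-op when nothing has been accumulated. */
pgstat_send_wal(false);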

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: adding wait_start column to pg_locks

2021-01-25 Thread Fujii Masao




On 2021/01/22 18:11, Fujii Masao wrote:



On 2021/01/22 14:37, torikoshia wrote:

On 2021-01-21 12:48, Fujii Masao wrote:


Thanks for updating the patch! I think that this is really useful feature!!


Thanks for reviewing!


I have two minor comments.

+  
+   wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of "lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.

+   Lock acquisition wait start time. NULL if
+   lock acquired.



Agreed.

I also changed the variable name "wait_start" in struct PGPROC and
LockInstanceData to "waitStart" for the same reason.



There seems the case where the wait start time is NULL even when "grant"
is false. It's better to add note about that case into the docs? For example,
I found that the wait start time is NULL while the startup process is waiting
for the lock. Is this only that case?


Thanks, this is because I set 'waitstart' in the following
condition.

   ---src/backend/storage/lmgr/proc.c
   > 1250 if (!InHotStandby)

As far as considering this, I guess startup process would
be the only case.

I also think that in case of startup process, it seems possible
to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I
did it in the attached patch.


This change seems to cause "waitstart" to be reset every time
ResolveRecoveryConflictWithLock() is called in the do-while loop.
I guess this is not acceptable. Right?

To avoid that issue, IMO the following change is better. Thought?

-   else if (log_recovery_conflict_waits)
+   else
     {
+   TimestampTz now = GetCurrentTimestamp();
+
+   MyProc->waitStart = now;
+
     /*
  * Set the wait start timestamp if logging is enabled and in 
hot
  * standby.
  */
-   standbyWaitStart = GetCurrentTimestamp();
+    if (log_recovery_conflict_waits)
+    standbyWaitStart = now
     }

This change causes the startup process to call GetCurrentTimestamp()
additionally even when log_recovery_conflict_waits is disabled. Which
might decrease the performance of the startup process, but that performance
degradation can happen only when the startup process waits in
ACCESS EXCLUSIVE lock. So if this my understanding right, IMO it's almost
harmless to call GetCurrentTimestamp() additionally in that case. Thought?


According to the off-list discussion with you, this should not happen because 
ResolveRecoveryConflictWithDatabase() sets MyProc->waitStart only when it's not 
set yet (i.e., = 0). That's good. So I'd withdraw my comment.

+   if (MyProc->waitStart == 0)
+   MyProc->waitStart = now;

+   MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);

Another comment: doesn't the change of MyProc->waitStart need the lock table's 
partition lock? If yes, we could do that by moving LWLockRelease(partitionLock) to 
just after the change of MyProc->waitStart, but that would lengthen the time the 
lwlock is held. So maybe we need another way to do that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Error on failed COMMIT

2021-01-25 Thread Dave Cramer
Rebased against head

Here's my summary of the long thread above.

This change is in keeping with the SQL spec.

There is an argument (Tom) that says that this will annoy more people than
it will please. I presume this is due to the fact that libpq behaviour will
change.

As the author of the JDBC driver, and I believe I speak for other driver
(NPGSQL for one) authors as well that have implemented the protocol I would
argue that the current behaviour is more annoying.

We currently have to keep state and determine if COMMIT actually failed or
it ROLLED BACK. There are a number of async drivers that would also benefit
from not having to keep state in the session.

Regards,

Dave Cramer
www.postgres.rocks


On Tue, 10 Nov 2020 at 11:53, Dave Cramer  wrote:

>
>
> On Mon, 9 Nov 2020 at 16:26, Dave Cramer 
> wrote:
>
>>
>>
>> On Wed, 30 Sep 2020 at 18:14, Andrew Dunstan <
>> andrew.duns...@2ndquadrant.com> wrote:
>>
>>>
>>> On 8/4/20 12:19 PM, Dave Cramer wrote:
>>> > Attached is the rebased patch for consideration.
>>> >
>>> >
>>>
>>>
>>> It's a bit sad this has been hanging around so long without attention.
>>>
>>>
>>> The previous discussion seems to give the patch a clean bill of health
>>> for most/all of the native drivers. Are there any implications for libpq
>>> based drivers such as DBD::Pg and psycopg2? How about for ecpg?
>>>
>>>
>>> cheers
>>>
>>>
>>> andrew
>>>
>>>
>>> --
>>> Andrew Dunstanhttps://www.2ndQuadrant.com
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>> Attached is a rebased patch with fixes for the isolation tests
>>
>>
>
>>
>> Dave Cramer
>> www.postgres.rocks
>>
>


0001-Throw-error-and-rollback-on-a-failed-transaction-ins.patch
Description: Binary data


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Bharath Rupireddy
On Mon, Jan 25, 2021 at 7:20 PM Fujii Masao  wrote:
> On 2021/01/25 19:28, Bharath Rupireddy wrote:
> > On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao  
> > wrote:
> >>> Yes, if required backends can establish the connection again. But my
> >>> worry is this - a non-super user disconnecting all or a given
> >>> connection created by a super user?
> >>
> >> Yes, I was also worried about that. But I found that there are other 
> >> similar cases, for example,
> >>
> >> - a cursor that superuser declared can be closed by non-superuser (set by 
> >> SET ROLE or SET SESSION AUTHORIZATION) in the same session.
> >> - a prepared statement that superuser created can be deallocated by 
> >> non-superuser in the same session.
> >>
> >> This makes me think that it's OK even for non-superuser to disconnect the 
> >> connections established by superuser in the same session. For now I've not 
> >> found any real security issue by doing that yet. Thought? Am I missing 
> >> something?
> >
> > Oh, and added to that list is dblink_disconnect(). I don't know
> > whether there's any security risk if we allow non-superusers to
> > discard the super users connections.
>
> I guess that's ok because superuser and nonsuperuser are running in the same 
> session. That is, since this is the case where superuser switches to 
> nonsuperuser intentionally, interactions between them is also intentional.
>
> OTOH, if a nonsuperuser in one session could affect a superuser in another 
> session that way, that would be problematic. That is why, for example, 
> pg_stat_activity disallows a nonsuperuser from seeing the query that a 
> superuser in another session is running.

Hmm, that makes sense.
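
Just to make the same-session case concrete, the scenario being discussed is
roughly the following (a sketch only; the foreign table and role names here
are made up):

-- as a superuser, run a query that creates a cached postgres_fdw connection
SELECT count(*) FROM ft1;
-- switch to a non-superuser within the same session
SET ROLE regress_nosuper;
-- the non-superuser can now discard the connection created above
SELECT postgres_fdw_disconnect_all();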

> > In this case, the super users
> > will just have to remake the connection.
> >
> >>> For now I'm thinking that it might be better to add a restriction like 
> >>> pg_terminate_backend()'s at first and relax it later if possible. But I'd 
> >>> like to hear more opinions about this.
> >>
> >> I agree. If required we can lift it later, once users start using
> >> these functions. Maybe we can have a comment near the superuser checks in
> >> disconnect_cached_connections saying that we can lift this in the future?
> >
> > Maybe we can do the opposite of the above, that is, not do any
> > superuser checks in the disconnect functions for now, and add them
> > later if some users complain?
>
> +1

Thanks, will send the updated patch set soon.

> > We can leave a comment there that "As of
> > now we don't see any security risks if a non-super user disconnects
> > the connections made by super users. If required, non-supers can be
> > disallowed to disconnect the connections" ?
>
> Yes. Also we should note that that's ok because they are in the same session.

I will add this comment in disconnect_cached_connections so that we
don't lose track of it.

I will provide the updated patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Identify missing publications from publisher while create/alter subscription.

2021-01-25 Thread Bharath Rupireddy
On Mon, Jan 25, 2021 at 5:18 PM japin  wrote:
> > Do you mean when we drop publications from a subscription? If yes, do
> > we have a way to drop a publication from the subscription? See below
> > one of my earlier questions on this.
> > "I wonder, why isn't dropping a publication from a list of
> > publications that are with subscription is not allowed?"
> > At least, I see no ALTER SUBSCRIPTION ... DROP PUBLICATION mypub1 or
> > something similar?
> >
>
> Why do we not support ALTER SUBSCRIPTION...ADD/DROP PUBLICATION?  When we
> have multiple publications in a subscription but want to add/drop a single
> publication, it would be convenient.  ALTER SUBSCRIPTION...SET PUBLICATION...
> requires supplying the complete list of publications.

Looks like the way to drop/add a publication from/to the list of
publications in a subscription requires users to specify the full list of
publications that currently exist, plus or minus the publication that needs
to be added/dropped:

CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub1, mypub2, mypu3, mypub4, mypub5;
postgres=# select subpublications from pg_subscription;
   subpublications
-------------------------------------
 {mypub1,mypub2,mypu3,mypub4,mypub5}
(1 row)

Say, I want to drop mypub4:

ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypu3, mypub5;
postgres=# select subpublications from pg_subscription;
   subpublications
------------------------------
 {mypub1,mypub2,mypu3,mypub5}

Say, I want to add mypub4 and mypub6:
ALTER SUBSCRIPTION mysub1 SET PUBLICATION mypub1, mypub2, mypu3,
mypub5, mypub4, mypub6;
postgres=# select subpublications from pg_subscription;
  subpublications
--------------------------------------------
 {mypub1,mypub2,mypu3,mypub5,mypub4,mypub6}
(1 row)

It will be good to have something like:

ALTER SUBSCRIPTION mysub1 ADD PUBLICATION mypub1, mypub3; which will add
the publications to the subscription if they were not added previously.

ALTER SUBSCRIPTION mysub1 DROP PUBLICATION mypub1, mypub3; which will
drop the publications from the subscription if they exist in the
subscription's list of publications.

But I'm really not sure why the above syntax was not added earlier. We
may be missing something here.

> Sorry, this question is unrelated to this subject.

Yes, IMO it can definitely be discussed in another thread. It will be
good to get a separate opinion for this.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Fujii Masao




On 2021/01/25 19:28, Bharath Rupireddy wrote:

On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao  wrote:

Yes, if required backends can establish the connection again. But my
worry is this - a non-super user disconnecting all or a given
connection created by a super user?


Yes, I was also worried about that. But I found that there are other similar 
cases, for example,

- a cursor that superuser declared can be closed by non-superuser (set by SET 
ROLE or SET SESSION AUTHORIZATION) in the same session.
- a prepared statement that superuser created can be deallocated by 
non-superuser in the same session.

This makes me think that it's OK even for non-superuser to disconnect the 
connections established by superuser in the same session. For now I've not 
found any real security issue by doing that yet. Thought? Am I missing 
something?


Oh, and added to that list is dblink_disconnect(). I don't know
whether there's any security risk if we allow non-superusers to
discard the super users connections.


I guess that's ok because superuser and nonsuperuser are running in the same 
session. That is, since this is the case where superuser switches to 
nonsuperuser intentionally, interactions between them are also intentional.

OTOH, if a nonsuperuser in one session could affect a superuser in another 
session that way, that would be problematic. That is why, for example, 
pg_stat_activity disallows a nonsuperuser from seeing the query that a 
superuser in another session is running.



In this case, the super users
will just have to remake the connection.


For now I'm thinking that it might be better to add a restriction like 
pg_terminate_backend()'s at first and relax it later if possible. But I'd like 
to hear more opinions about this.


I agree. If required we can lift it later, once users start using
these functions. Maybe we can have a comment near the superuser checks in
disconnect_cached_connections saying that we can lift this in the future?


Maybe we can do the opposite of the above, that is, not do any
superuser checks in the disconnect functions for now, and add them
later if some users complain?


+1


We can leave a comment there that "As of
now we don't see any security risks if a non-super user disconnects
the connections made by super users. If required, non-supers can be
disallowed to disconnect the connections" ?


Yes. Also we should note that that's ok because they are in the same session.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Is Recovery actually paused?

2021-01-25 Thread Bharath Rupireddy
On Mon, Jan 25, 2021 at 2:53 PM Dilip Kumar  wrote:
> I have changed as per other functions for consistency.

Thanks for the v7 patch. Here are some quick comments on it:

[1] I think we need to change the return value from boolean to text in the
documentation:
  pg_is_wal_replay_paused
  pg_is_wal_replay_paused () boolean

[2] Do we intentionally ignore the return value of the function below? If
yes, can we change the return type to void and update the function
comment? If we do care about the return value, shouldn't we use it?

static bool recoveryApplyDelay(XLogReaderState *record);
+recoveryApplyDelay(xlogreader);

[3] Although it's not strictly necessary, I think it would be good to
have an example of the new output of pg_is_wal_replay_paused in the
documentation, something like the one below for brin_page_type:


test=# SELECT brin_page_type(get_raw_page('brinidx', 0));
 brin_page_type
----------------
 meta
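
i.e. for pg_is_wal_replay_paused it could be something like the following
(only a sketch; the text values shown are placeholders for whatever the patch
finally returns):

postgres=# SELECT pg_is_wal_replay_paused();
 pg_is_wal_replay_paused
--------------------------
 not paused
(1 row)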


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: New IndexAM API controlling index vacuum strategies

2021-01-25 Thread Masahiko Sawada
On Mon, Jan 25, 2021 at 5:19 PM Masahiko Sawada  wrote:
>
> On Thu, Jan 21, 2021 at 11:24 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan  wrote:
> > >
> > > On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada  
> > > wrote:
> > > > After more thought, I think that ambulkdelete needs to be able to
> > > > refer the answer to amvacuumstrategy. That way, the index can skip
> > > > bulk-deletion when lazy vacuum doesn't vacuum heap and it also doesn’t
> > > > want to do that.
> > >
> > > Makes sense.
> > >
> > > BTW, your patch has bitrot already. Peter E's recent pageinspect
> > > commit happens to conflict with this patch. It might make sense to
> > > produce a new version that just fixes the bitrot, so that other people
> > > don't have to deal with it each time.
> > >
> > > > I’ve attached the updated version patch that includes the following 
> > > > changes:
> > >
> > > Looks good. I'll give this version a review now. I will do a lot more
> > > soon. I need to come up with a good benchmark for this, that I can
> > > return to again and again during review as needed.
> >
> > Thank you for reviewing the patches.
> >
> > >
> > > Some feedback on the first patch:
> > >
> > > * Just so you know: I agree with you about handling
> > > VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
> > > think that it's better to do that there, even though this choice may
> > > have some downsides.
> > >
> > > * Can you add some "stub" sgml doc changes for this? Doesn't have to
> > > be complete in any way. Just a placeholder for later, that has the
> > > correct general "shape" to orientate the reader of the patch. It can
> > > just be a FIXME comment, plus basic mechanical stuff -- details of the
> > > new amvacuumstrategy_function routine and its signature.
> > >
> >
> > The 0002 patch had the doc update (I mistakenly included it in the 0002
> > patch). Is that update what you meant?
> >
> > > Some feedback on the second patch:
> > >
> > > * Why do you move around IndexVacuumStrategy in the second patch?
> > > Looks like a rebasing oversight.
> >
> > Check.
> >
> > >
> > > * Actually, do we really need the first and second patches to be
> > > separate patches? I agree that the nbtree patch should be a separate
> > > patch, but dividing the first two sets of changes doesn't seem like it
> > > adds much. Did I miss some something?
> >
> > I separated the patches since I used to have separate patches when
> > adding other index AM options required by parallel vacuum. But I
> > agreed to merge the first two patches.
> >
> > >
> > > * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> > > MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> > > the patch:
> > >
> > > diff --git a/src/include/access/htup_details.h
> > > b/src/include/access/htup_details.h
> > > index 7c62852e7f..038e7cd580 100644
> > > --- a/src/include/access/htup_details.h
> > > +++ b/src/include/access/htup_details.h
> > > @@ -563,17 +563,18 @@ do { \
> > >  /*
> > >   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that 
> > > can
> > >   * fit on one heap page.  (Note that indexes could have more, because 
> > > they
> > > - * use a smaller tuple header.)  We arrive at the divisor because each 
> > > tuple
> > > - * must be maxaligned, and it must have an associated line pointer.
> > > + * use a smaller tuple header.)  We arrive at the divisor because each 
> > > line
> > > + * pointer must be maxaligned.
> > > *** SNIP ***
> > >  #define MaxHeapTuplesPerPage\
> > > -((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> > > -(MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData
> > > +((int) ((BLCKSZ - SizeOfPageHeaderData) / 
> > > (MAXALIGN(sizeof(ItemIdData)
> > >
> > > It's true that ItemIdData structs (line pointers) are aligned, but
> > > they're not MAXALIGN()'d. If they were then the on-disk size of line
> > > pointers would generally be 8 bytes, not 4 bytes.
> >
> > You're right. Will fix.
> >
> > >
> > > * Maybe it would be better if you just changed the definition such
> > > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > > no other changes? (Some variant of this suggestion might be better,
> > > not sure.)
> > >
> > > For some reason that feels a bit safer: we still have an "imaginary
> > > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > > much less than the current 3 MAXALIGN() quantums (i.e. what
> > > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > > that this alternative approach will be noticeably less effective
> > > within vacuumlazy.c?
> > >
> > > Note that you probably understand the issue with MaxHeapTuplesPerPage
> > > for vacuumlazy.c better than I do currently. I'm still trying to
> > > understand your choices, and to understand what is really important
> > > here.
> >
> > Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
> 
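
For reference, with the default 8kB block size the three definitions
discussed above bound the number of heap tuples per page quite differently.
A rough sketch of the arithmetic, assuming BLCKSZ = 8192,
SizeOfPageHeaderData = 24, MAXALIGN(SizeofHeapTupleHeader) = 24,
sizeof(ItemIdData) = 4 and MAXIMUM_ALIGNOF = 8:

SELECT (8192 - 24) / (24 + 4) AS current_definition,    -- 291
       (8192 - 24) / (8 + 4)  AS one_maxalign_quantum,  -- 680
       (8192 - 24) / 8        AS patch_definition;      -- 1021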

Re: Extensions not dumped when --schema is used

2021-01-25 Thread Guillaume Lelarge
Hi,

Le sam. 23 mai 2020 à 14:53, Guillaume Lelarge  a
écrit :

> Le mer. 20 mai 2020 à 16:39, Tom Lane  a écrit :
>
>> Guillaume Lelarge  writes:
>> > Le mer. 20 mai 2020 à 11:26, Daniel Gustafsson  a
>> écrit :
>> >>  The question is what --extensions should do: only dump any
>> >> extensions that objects in the schema depend on; require a pattern and
>> only
>> >> dump matching extensions; dump all extensions (probably not) or
>> something
>> >> else?
>>
>> > Actually, "dump all extensions" (#3) would make sense to me, and has my
>> > vote.
>>
>> I think that makes no sense at all.  By definition, a dump that's been
>> restricted with --schema, --table, or any similar switch is incomplete
>> and may not restore on its own.  Typical examples include foreign key
>> references to tables in other schemas, views using functions in other
>> schemas, etc etc.  I see no reason for extension dependencies to be
>> treated differently from those cases.
>>
>>
> Agreed.
>
> In any use of selective dump, it's the user's job to select a set of
>> objects that she wants dumped (or restored).  Trying to second-guess that
>> is mostly going to make the feature less usable for power-user cases.
>>
>>
> Agreed, though right now he has no way to do this for extensions.
>
> As a counterexample, what if you want the dump to be restorable on a
>> system that doesn't have all of the extensions available on the source?
>> You carefully pick out the tables that you need, which don't require the
>> unavailable extensions ... and then pg_dump decides you don't know what
>> you're doing and includes all the problematic extensions anyway.
>>
>>
> That's true.
>
> I could get behind an "--extensions=PATTERN" switch to allow selective
>> addition of extensions to a selective dump, but I don't want to see us
>> overruling the user's choices about what to dump.
>>
>>
> With all your comments, I can only agree to your views. I'll try to work
> on this anytime soon.
>
>
"Anytime soon" was a long long time ago, and I eventually completely forgot
this, sorry. As nobody worked on it yet, I took a shot at it. See attached
patch.

I don't know if I should add this right away in the commit fest app. If
yes, I guess it should go on the next commit fest (2021-03), right?
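
With the patch applied, the intended usage is something along these lines (a
sketch with made-up database and extension names):

pg_dump -d mydb --extension=hstore --extension='postgis*' -f mydb.dump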


-- 
Guillaume.
commit 97c41b05b4aa20f4187aedbed79d67874d2cbbbd
Author: Guillaume Lelarge 
Date:   Mon Jan 25 14:25:53 2021 +0100

Add a --extension flag to dump specific extensions

Version 1.

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bcbb7a25fb..95d45fabfb 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -215,6 +215,38 @@ PostgreSQL documentation
   
  
 
+ 
+  -e pattern
+  --extension=pattern
+  
+   
+Dump only extensions matching pattern.  When this option is not
+specified, all non-system extensions in the target database will be
+dumped.  Multiple extensions can be selected by writing multiple
+-e switches.  The pattern parameter is interpreted as a
+pattern according to the same rules used by
+psql's \d commands (see
+), so multiple extensions can also
+be selected by writing wildcard characters in the pattern.  When using
+wildcards, be careful to quote the pattern if needed to prevent the
+shell from expanding the wildcards.
+   
+
+   
+
+ When -e is specified,
+ pg_dump makes no attempt to dump any other
+ database objects that the selected extension(s) might depend upon.
+ Therefore, there is no guarantee that the results of a
+ specific-extension dump can be successfully restored by themselves
+ into a clean database.
+
+   
+  
+ 
+
  
   -E encoding
   --encoding=encoding
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 798d14580e..3c1203e85c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -123,6 +123,9 @@ static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
 static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
 static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 
+static SimpleStringList extension_include_patterns = {NULL, NULL};
+static SimpleOidList extension_include_oids = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -151,6 +154,10 @@ static void expand_schema_name_patterns(Archive *fout,
 		SimpleStringList *patterns,
 		SimpleOidList *oids,
 		bool strict_names);
+static void expand_extension_name_patterns(Archive *fout,
+			SimpleStringList *patterns,
+			SimpleOidList *oids,
+			bool strict_names);
 static void expand_foreign_server_name_patterns(Archive *fout,
 SimpleStringList *patterns,
 SimpleOidList *oids);
@@ -334,6 +341,7 @@ 

Re: Columns correlation and adaptive query optimization

2021-01-25 Thread Konstantin Knizhnik

Hello,

Thank you for review.
My answers are inside.


On 21.01.2021 15:30, Yugo NAGATA wrote:

Hello,

On Thu, 26 Mar 2020 18:49:51 +0300
Konstantin Knizhnik  wrote:


Attached please find new version of the patch with more comments and
descriptions added.

Adaptive query optimization is very interesting feature for me, so I looked
into this patch.  Here are some comments and questions.

(1)
This patch needs rebase because clauselist_selectivity was modified to improve
estimation of OR clauses.


Rebased version is attached.


(2)
If I understand correctly, your proposal consists of the following two features.

1. Add a feature to auto_explain that creates an extended statistic 
automatically
if an error on estimated rows number is large.

2. Improve rows number estimation of join results by considering functional
dependencies between vars in the join qual if the qual has more than one 
clauses,
and also functional dependencies between a var in the join qual and vars in 
quals
of the inner/outer relation.

As you said, these two parts are independent each other, so one feature will 
work
even if we don't assume the other.  I wonder it would be better to split the 
patch
again, and register them to commitfest separately.


I agree with you that these are two almost unrelated changes, although 
without the clausesel patch the additional statistics cannot improve query planning.

But I already have too many patches at commitfest.
Maybe it will be enough to split this patch into two?



(3)
+   DefineCustomBoolVariable("auto_explain.suggest_only",
+            "Do not create statistic but just record in WAL suggested create statistics statement.",
+            NULL,
+            &auto_explain_suggest_on,

To imply that this parameter is related to add_statistics_threshold, it seems
better to me to use a more closely related name like add_statistics_suggest_only.

Also, additional documentation for the new parameters is required.


Done.



(4)
+   /*
+* Prevent concurrent access to extended statistic table
+*/
+   stat_rel = table_open(StatisticExtRelationId, 
AccessExclusiveLock);
+   slot = table_slot_create(stat_rel, NULL);
+   scan = table_beginscan_catalog(stat_rel, 2, entry);
(snip)
+   table_close(stat_rel, AccessExclusiveLock);
+   }

When I tested the auto_explain part, I got the following WARNING.

  WARNING:  buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, 
flags=0x8300, refcount=1 2)
  WARNING:  buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, 
flags=0x8300, refcount=1 1)
  WARNING:  relcache reference leak: relation "pg_statistic_ext" not closed
  WARNING:  TupleDesc reference leak: TupleDesc 0x7fa439266338 (12029,-1) still 
referenced
  WARNING:  Snapshot reference leak: Snapshot 0x55c332c10418 still referenced

To suppress this, I think we need table_endscan(scan) and
ExecDropSingleTupleTableSlot(slot) before finishing this function.


Thank you for noticing the problem, fixed.




(6)
+   elog(NOTICE, "Auto_explain suggestion: 
CREATE STATISTICS %s %s FROM %s", stat_name, create_stat_stmt, rel_name);

We should use ereport instead of elog for log messages.


Changed.
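
For what it's worth, the suggested statement ends up having this general
shape (the table, column and statistics object names here are hypothetical;
the real suggestion may also carry an explicit statistics kind such as
(dependencies)):

CREATE STATISTICS t1_a_b ON a, b FROM t1;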



(7)
+   double dep = 
find_var_dependency(root, innerRelid, var, clauses_attnums);
+   if (dep != 0.0)
+   {
+   s1 *= dep + (1 - dep) * 
s2;
+   continue;
+   }

I found the following comment of clauselist_apply_dependencies():

   * we actually combine selectivities using the formula
   *
   *  P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

so, shouldn't we use the same formula in this patch? That is,

   s1 *= dep + (1-dep) * s2             (if s1 <= s2)
   s1 *= dep * (s2/s1) + (1-dep) * s2   (otherwise)

Makes sense.


(8)
+/*
+ * Try to find dependency between variables.
+ * var: variables whose dependencies are considered
+ * join_vars: list of variables used in other clauses
+ * This function returns the strongest dependency and some subset of variables 
from the same relation
+ * or 0.0 if no dependency was found.
+ */
+static double
+var_depends_on(PlannerInfo *root, Var* var, List* clause_vars)
+{

The comment mentions join_vars but the actual argument name is clauses_vars,
so it needs unification.


Fixed.



(9)
Currently, it only considers functional dependency statistics. Can we also
consider the multivariate MCV list, and would that be useful?



Right now auto_explain create statistic 

Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-25 Thread Greg Nancarrow
On Mon, Jan 25, 2021 at 10:40 PM Hou, Zhijie  wrote:
>
> Hi,
>
> When reading the code of rel_max_parallel_hazard_for_modify in 0001.
>
> I thought there are so many places call table_close().
> Personally, It's a little confused to me.
>
> Do you think it's better to do the table_open/close outside of 
> rel_max_parallel_hazard_for_modify ?
>
> Like:
>
> static bool rel_max_parallel_hazard_for_modify(Relation rel,
>CmdType command_type,
>max_parallel_hazard_context 
> *context);
> ...
> Relation relation = table_open(rte->relid, NoLock);
> (void) rel_max_parallel_hazard_for_modify(relation, 
> parse->commandType, );
> table_close(relation, NoLock);
>
>
> And it seems we do not need the lockmode param with the above definition.
>
>

Yeah, the repeated cleanup at the point of return is a bit ugly.
It could be solved by changing the function to do cleanup at a common
return point, but I agree with you that in this case it could simply
be done outside the function.
Thanks, I'll make that change.

Regards,
Greg Nancarrow
Fujitsu Australia




Add SQL function for SHA1

2021-01-25 Thread Michael Paquier
Hi all,

SHA-1 is now an option available for cryptohashes, and as with the
existing set of SHA-2 functions, I don't really see a reason why we
should not have a SQL function for SHA1.  Attached is a patch doing
that.

The same code pattern was repeated 4 times on HEAD for the SHA-2
functions for the bytea -> bytea hashing, so I have refactored the
whole thing while integrating the new function, shaving some code from
cryptohashfuncs.c.

Thoughts?
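
For example, the new function is used like this (the digest shown is the
standard SHA-1 test vector for "abc"; bytea output in hex format):

=# SELECT sha1('abc'::bytea);
                    sha1
--------------------------------------------
 \xa9993e364706816aba3e25717850c26c9cd0d89d
(1 row)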
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..e6f22a5873 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7317,6 +7317,9 @@
 { oid => '2321', descr => 'MD5 hash',
   proname => 'md5', proleakproof => 't', prorettype => 'text',
   proargtypes => 'bytea', prosrc => 'md5_bytea' },
+{ oid => '8935', descr => 'SHA-1 hash',
+  proname => 'sha1', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'bytea', prosrc => 'sha1_bytea' },
 { oid => '3419', descr => 'SHA-224 hash',
   proname => 'sha224', proleakproof => 't', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'sha224_bytea' },
diff --git a/src/backend/utils/adt/cryptohashfuncs.c b/src/backend/utils/adt/cryptohashfuncs.c
index d99485f4c6..5bdc2da531 100644
--- a/src/backend/utils/adt/cryptohashfuncs.c
+++ b/src/backend/utils/adt/cryptohashfuncs.c
@@ -15,6 +15,7 @@
 
 #include "common/cryptohash.h"
 #include "common/md5.h"
+#include "common/sha1.h"
 #include "common/sha2.h"
 #include "utils/builtins.h"
 
@@ -68,6 +69,74 @@ md5_bytea(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text(hexsum));
 }
 
+/*
+ * Internal routine to compute a cryptohash with the given bytea input.
+ */
+static inline bytea *
+cryptohash_internal(bytea *input, int digest_len,
+	pg_cryptohash_type type)
+{
+	const uint8 *data;
+	const char  *typestr;
+	size_t		len;
+	pg_cryptohash_ctx *ctx;
+	unsigned char *buf;
+	bytea	   *result;
+
+	switch (type)
+	{
+		case PG_MD5:
+			typestr = "MD5";
+			break;
+		case PG_SHA1:
+			typestr = "SHA1";
+			break;
+		case PG_SHA224:
+			typestr = "SHA224";
+			break;
+		case PG_SHA256:
+			typestr = "SHA256";
+			break;
+		case PG_SHA384:
+			typestr = "SHA384";
+			break;
+		case PG_SHA512:
+			typestr = "SHA512";
+			break;
+	}
+
+	buf = palloc0(digest_len);
+	len = VARSIZE_ANY_EXHDR(input);
+	data = (unsigned char *) VARDATA_ANY(input);
+
+	ctx = pg_cryptohash_create(type);
+	if (pg_cryptohash_init(ctx) < 0)
+		elog(ERROR, "could not initialize %s context", typestr);
+	if (pg_cryptohash_update(ctx, data, len) < 0)
+		elog(ERROR, "could not update %s context", typestr);
+	if (pg_cryptohash_final(ctx, buf) < 0)
+		elog(ERROR, "could not finalize %s context", typestr);
+	pg_cryptohash_free(ctx);
+
+	result = palloc(digest_len + VARHDRSZ);
+	SET_VARSIZE(result, digest_len + VARHDRSZ);
+	memcpy(VARDATA(result), buf, digest_len);
+
+	return result;
+}
+
+/*
+ * SHA-1
+ */
+Datum
+sha1_bytea(PG_FUNCTION_ARGS)
+{
+	bytea   *result = cryptohash_internal(PG_GETARG_BYTEA_PP(0),
+		  SHA1_DIGEST_LENGTH,
+		  PG_SHA1);
+	PG_RETURN_BYTEA_P(result);
+}
+
 
 /*
  * SHA-2 variants
@@ -76,115 +145,35 @@ md5_bytea(PG_FUNCTION_ARGS)
 Datum
 sha224_bytea(PG_FUNCTION_ARGS)
 {
-	bytea	   *in = PG_GETARG_BYTEA_PP(0);
-	const uint8 *data;
-	size_t		len;
-	pg_cryptohash_ctx *ctx;
-	unsigned char buf[PG_SHA224_DIGEST_LENGTH];
-	bytea	   *result;
-
-	len = VARSIZE_ANY_EXHDR(in);
-	data = (unsigned char *) VARDATA_ANY(in);
-
-	ctx = pg_cryptohash_create(PG_SHA224);
-	if (pg_cryptohash_init(ctx) < 0)
-		elog(ERROR, "could not initialize %s context", "SHA224");
-	if (pg_cryptohash_update(ctx, data, len) < 0)
-		elog(ERROR, "could not update %s context", "SHA224");
-	if (pg_cryptohash_final(ctx, buf) < 0)
-		elog(ERROR, "could not finalize %s context", "SHA224");
-	pg_cryptohash_free(ctx);
-
-	result = palloc(sizeof(buf) + VARHDRSZ);
-	SET_VARSIZE(result, sizeof(buf) + VARHDRSZ);
-	memcpy(VARDATA(result), buf, sizeof(buf));
-
+	bytea   *result = cryptohash_internal(PG_GETARG_BYTEA_PP(0),
+		  PG_SHA224_DIGEST_LENGTH,
+		  PG_SHA224);
 	PG_RETURN_BYTEA_P(result);
 }
 
 Datum
 sha256_bytea(PG_FUNCTION_ARGS)
 {
-	bytea	   *in = PG_GETARG_BYTEA_PP(0);
-	const uint8 *data;
-	size_t		len;
-	pg_cryptohash_ctx *ctx;
-	unsigned char buf[PG_SHA256_DIGEST_LENGTH];
-	bytea	   *result;
-
-	len = VARSIZE_ANY_EXHDR(in);
-	data = (unsigned char *) VARDATA_ANY(in);
-
-	ctx = pg_cryptohash_create(PG_SHA256);
-	if (pg_cryptohash_init(ctx) < 0)
-		elog(ERROR, "could not initialize %s context", "SHA256");
-	if (pg_cryptohash_update(ctx, data, len) < 0)
-		elog(ERROR, "could not update %s context", "SHA256");
-	if (pg_cryptohash_final(ctx, buf) < 0)
-		elog(ERROR, "could not finalize %s context", "SHA256");
-	pg_cryptohash_free(ctx);
-
-	result = palloc(sizeof(buf) + VARHDRSZ);
-	SET_VARSIZE(result, sizeof(buf) + VARHDRSZ);
-	memcpy(VARDATA(result), buf, sizeof(buf));
-
+	bytea   *result = 

Re: libpq debug log

2021-01-25 Thread 'Alvaro Herrera'
On 2021-Jan-25, tsunakawa.ta...@fujitsu.com wrote:

> Iwata-san,

[...]

> Considering these and the compilation error Kirk-san found, I'd like
> you to do more self-review before I resume this review.

Kindly note that these errors are all mine.

Thanks for the review

-- 
Álvaro Herrera                39°49'30"S 73°17'W
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-25 Thread Justin Pryzby
On Mon, Jan 25, 2021 at 05:07:29PM +0900, Michael Paquier wrote:
> On Fri, Jan 22, 2021 at 05:07:02PM +0300, Alexey Kondratov wrote:
> > I have updated patches accordingly and also simplified tablespaceOid checks
> > and assignment in the newly added SetRelTableSpace(). Result is attached as
> > two separate patches for an ease of review, but no objections to merge them
> > and apply at once if everything is fine.
...
> +SELECT relname FROM pg_class
> +WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE 
> spcname='regress_tblspace');
> [...]
> +-- first, check a no-op case
> +REINDEX (TABLESPACE pg_default) INDEX regress_tblspace_test_tbl_idx;
> +REINDEX (TABLESPACE pg_default) TABLE regress_tblspace_test_tbl;
> Reindexing means that the relfilenodes are changed, so the tests
> should track the original and new relfilenodes and compare them, no?
> In short, this set of regression tests does not verify whether a
> REINDEX actually happens, and it may read as if a reindex does not
> happen at all for those tests.  For single units, these could be
> saved in a variable and compared afterwards.  create_index.sql does
> that a bit with REINDEX SCHEMA for a set of relations.

You might also check my "CLUSTER partitioned" patch for another way to do that.

https://www.postgresql.org/message-id/20210118183459.GJ8560%40telsasoft.com
https://www.postgresql.org/message-id/attachment/118126/v6-0002-Implement-CLUSTER-of-partitioned-table.patch
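
For single objects, a \gset-based check also works; roughly (a sketch reusing
the object names from the quoted tests):

SELECT relfilenode AS oldnode FROM pg_class
  WHERE relname = 'regress_tblspace_test_tbl_idx' \gset
REINDEX (TABLESPACE regress_tblspace) INDEX regress_tblspace_test_tbl_idx;
SELECT relfilenode <> :oldnode AS relfilenode_changed FROM pg_class
  WHERE relname = 'regress_tblspace_test_tbl_idx';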

-- 
Justin




Re: [PATCH] Automatic HASH and LIST partition creation

2021-01-25 Thread Pavel Borisov
>
> I don't think we can use \d+ on a temporary table here, because the
> backend ID appears in the namespace, which is causing a failure on one
> of the CI OSes due to nondeterminism:
>
> CREATE TEMP TABLE temp_parted (a char) PARTITION BY LIST (a)
> CONFIGURATION (VALUES IN ('a') DEFAULT PARTITION temp_parted_default);
> \d+ temp_parted
> - Partitioned table "pg_temp_3.temp_parted"
> + Partitioned table "pg_temp_4.temp_parted"
>

I've updated the tests accordingly. PFA version 4.
As none of the recent proposals to modify the syntax were seconded by
anyone, I have set the entry back to its previous Ready-for-committer CF status.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


0001-Auto-generated-HASH-and-LIST-partitions-v4.patch
Description: Binary data


Re: Identify missing publications from publisher while create/alter subscription.

2021-01-25 Thread japin


On Mon, 25 Jan 2021 at 17:18, Bharath Rupireddy 
 wrote:
> On Mon, Jan 25, 2021 at 2:42 PM Dilip Kumar  wrote:
>>
>> On Mon, Jan 25, 2021 at 1:10 PM vignesh C  wrote:
>> >
>> > On Thu, Jan 21, 2021 at 10:21 PM Bharath Rupireddy
>> >  wrote:
>> > >
>> > > On Thu, Jan 21, 2021 at 6:56 PM vignesh C  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > Creating/altering subscription is successful when we specify a
>> > > > publication which does not exist in the publisher. I felt we should
>> > > > throw an error in this case, that will help the user to check if there
>> > > > is any typo in the create subscription command or to create the
>> > > > publication before the subscription is created.
>> > > > If the above analysis looks correct, then please find a patch that
>> > > > checks if the specified publications are present in the publisher and
>> > > > throws an error if any of the publications is missing in the
>> > > > publisher.
>> > > > Thoughts?
>> > >
>> > > I was having similar thoughts (while working on  the logical
>> > > replication bug on alter publication...drop table behaviour) on why
>> > > create subscription succeeds without checking the publication
>> > > existence. I checked in documentation, to find if there's a strong
>> > > reason for that, but I couldn't. Maybe it's because of the principle
>> > > "first let users create subscriptions, later the publications can be
>> > > created on the publisher system", similar to this behaviour
>> > > "publications can be created without any tables attached to it
>> > > initially, later they can be added".
>> > >
>> > > Others may have better thoughts.
>> > >
>> > > If we do check publication existence for CREATE SUBSCRIPTION, I think
>> > > we should also do it for ALTER SUBSCRIPTION ... SET PUBLICATION.
>> > >
>> > > I wonder, why isn't dropping a publication from a list of publications
>> > > that are with subscription is not allowed?
>> > >
>> > > Some comments so far on the patch:
>> > >
>> > > 1) I see most of the code in the new function check_publications() and
>> > > existing fetch_table_list() is the same. Can we have a generic
>> > > function, with maybe a flag to separate out the logic specific for
>> > > checking publication and fetching table list from the publisher.
>> >
>> > I have made the common code between the check_publications and
>> > fetch_table_list into a common function
>> > get_appended_publications_query. I felt the rest of the code is better
>> > off kept as it is.
>> > The Attached patch has the changes for the same and also the change to
>> > check publication exists during alter subscription set publication.
>> > Thoughts?
>> >
>>
>> So basically, the create subscription will throw an error if the
>> publication does not exist.  So will you throw an error if we try to
>> drop the publication which is subscribed by some subscription?  I mean
>> basically, you are creating a dependency that if you are creating a
>> subscription then there must be a publication. That is not completely
>> insane, but then we will have to disallow dropping the publication as
>> well.  Am I missing something?
>
> Do you mean DROP PUBLICATION non_existent_publication;?
>
> Or
>
> Do you mean when we drop publications from a subscription? If yes, do
> we have a way to drop a publication from the subscription? See below
> one of my earlier questions on this.
> "I wonder, why isn't dropping a publication from a list of
> publications that are with subscription is not allowed?"
> At least, I see no ALTER SUBSCRIPTION ... DROP PUBLICATION mypub1 or
> something similar?
>

Why do we not support ALTER SUBSCRIPTION...ADD/DROP PUBLICATION?  When we
have multiple publications in a subscription but want to add/drop a single
publication, it would be convenient.  ALTER SUBSCRIPTION...SET PUBLICATION...
requires supplying the complete list of publications.

Sorry, this question is unrelated to this subject.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Single transaction in the tablesync worker?

2021-01-25 Thread Peter Smith
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila  wrote:
>
> On Mon, Jan 25, 2021 at 8:03 AM Peter Smith  wrote:
> >
> > Hi Amit.
> >
> > PSA the v19 patch for the Tablesync Solution1.
> >
>
> I see one race condition in this patch where we try to drop the origin
> via apply process and DropSubscription. I think it can lead to the
> error "cache lookup failed for replication origin with oid %u". The
> same problem can happen via the exposed API pg_replication_origin_drop, but
> probably because it is not used concurrently, nobody has faced this
> issue. I think for the purposes of this patch we can try to suppress
> such an error either via try..catch, or by adding missing_ok argument
> to replorigin_drop API, or we can just add to comments that such a
> race exists.

OK. This has been isolated to a common function called from 3 places.
The potential race ERROR is suppressed by TRY/CATCH.
Please see the code of the latest patch [v20].

> Additionally, we should try to start a new thread about the
> existence of this problem in pg_replication_origin_drop. What do you
> think?

OK. It is on my TODO list.


[v20] 
https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-01-25 Thread Peter Smith
On Mon, Jan 25, 2021 at 2:54 PM Amit Kapila  wrote:
>
> On Mon, Jan 25, 2021 at 8:23 AM Peter Smith  wrote:
> >
> > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila  
> > wrote:
> > >
> > > 2.
> > > @@ -98,11 +102,16 @@
> > >  #include "miscadmin.h"
> > >  #include "parser/parse_relation.h"
> > >  #include "pgstat.h"
> > > +#include "postmaster/interrupt.h"
> > >  #include "replication/logicallauncher.h"
> > >  #include "replication/logicalrelation.h"
> > > +#include "replication/logicalworker.h"
> > >  #include "replication/walreceiver.h"
> > >  #include "replication/worker_internal.h"
> > > +#include "replication/slot.h"
> > >
> > > I don't think the above includes are required. They seem to the
> > > remnant of the previous approach.
> > >
> >
> > OK. Fixed in the latest patch [v19].
> >
>
> You seem to have forgotten to remove #include "replication/slot.h". Check: if
> it is not required, then remove that as well.
>

Fixed in the latest patch [v20].


[v20] 
https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-01-25 Thread Peter Smith
On Mon, Jan 25, 2021 at 1:58 PM Amit Kapila  wrote:
>
> On Sun, Jan 24, 2021 at 12:24 PM Peter Smith  wrote:
> >
> > On Sat, Jan 23, 2021 at 11:26 PM Amit Kapila  
> > wrote:
> > >
> > >
> > > Few comments:
> > > =
> > > 1.
> > > - *   So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
> > > - *   CATCHUP -> SYNCDONE -> READY.
> > > + *   So the state progression is always: INIT -> DATASYNC ->
> > > + *   (sync worker FINISHEDCOPY) -> SYNCWAIT -> CATCHUP -> SYNCDONE -> 
> > > READY.
> > >
> > > I don't think we need to be specific here that sync worker sets
> > > FINISHEDCOPY state.
> > >
> >
> > This was meant to indicate that *only* the sync worker knows about the
> > FINISHEDCOPY state, whereas all the other states are known
> > (assigned and/or used) by *both* kinds of workers. But, I can remove
> > it if you feel that distinction is not useful.
> >
>
> Okay, but I feel you can mention that in the description you have
> added for FINISHEDCOPY state. It looks a bit odd here and the message
> you want to convey is also not that clear.
>

The comment is updated in the latest patch [v20].


[v20] 
https://www.postgresql.org/message-id/CAHut%2BPuNwSujoL_dwa%3DTtozJ_vF%3DCnJxjgQTCmNBkazd8J1m-A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



